parsing complex web pages
jdhunter at ace.bsd.uchicago.edu
Thu Jun 19 05:41:12 CEST 2003
>>>>> "John" == John J Lee <jjl at pobox.com> writes:
John> If it works well for you, why not stick with it?
It does work well -- today I parsed about 1000 tickers' worth of pages
without error. The main reason I was interested in a Python solution is
that I may someday release my Yahoo.Finance modules. I have done
several of Yahoo's pages now (research, profile, historical) and I
figure others may find this useful someday. I would rather send them
out as pure Python than as 'requires lynx'. This isn't too big a
deal, though, since lynx runs on a lot of platforms.
It did cause me to wonder, though, whether some good Python html->text
converters that render the html as text (i.e., preserve visual layout)
were lurking out there beneath my radar.
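
For illustration only, here is a minimal sketch of what a pure-Python
html->text pass might look like, assuming the standard library's
html.parser module. The names TextRenderer and html_to_text are
hypothetical, and this is not a substitute for lynx: it does not try to
reproduce visual layout. It only puts each block element or table row on
its own line, with table cells separated by tabs.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Hypothetical helper: one output line per block element or table row."""

    # Tags treated as line boundaries (an assumption; extend as needed).
    BLOCK_TAGS = {"p", "br", "div", "tr", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.lines = []     # finished output lines
        self.cells = [[]]   # text pieces of the current line, grouped by cell

    def flush(self):
        # Collapse runs of whitespace inside each cell, then join cells with tabs.
        cells = [" ".join("".join(parts).split()) for parts in self.cells]
        line = "\t".join(cells).strip()
        if line:
            self.lines.append(line)
        self.cells = [[]]

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.flush()
        elif tag in ("td", "th"):
            self.cells.append([])   # start a new table cell

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self.cells[-1].append(data)


def html_to_text(html):
    """Return the text of an HTML string, one block or table row per line."""
    parser = TextRenderer()
    parser.feed(html)
    parser.close()
    parser.flush()          # emit any trailing text
    return "\n".join(parser.lines)
```

Called on a small quote table, html_to_text returns one tab-separated
line per row, which is easy to split further. Anything that depends on
column alignment or nested tables would still need a real renderer.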