parsing complex web pages

John Hunter jdhunter at ace.bsd.uchicago.edu
Thu Jun 19 05:41:12 CEST 2003


>>>>> "John" == John J Lee <jjl at pobox.com> writes:

    John> If it works well for you, why not stick with it?

It does work well -- today I parsed about 1000 tickers' worth of pages
without error.  The main reason I was interested in a python solution
is that I may eventually release my Yahoo.Finance modules.  I have done
several of yahoo's pages now (research, profile, historical) and I
figure others may find this useful someday.  I would rather send them
out as pure python than as 'requires lynx'.  This isn't too big a
deal, though, since lynx runs on a lot of platforms.

It did cause me to wonder, though, whether some good python html->text
converters that render the html as text (i.e., preserve visual layout)
were lurking out there beneath my radar screen.
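
To make "preserve visual layout" concrete, here is a rough pure-python
sketch of the simplest case: printing the cells of an html table as
aligned text columns.  It is only an illustration, not a replacement
for lynx.  It assumes well-formed markup with closing </td> and </tr>
tags and no nested tables, it uses the standard library's html.parser
module, and the names TableText and table_to_text are made up for this
example.

```python
from html.parser import HTMLParser


class TableText(HTMLParser):
    """Collect the cells of every <tr> so they can be printed as columns."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a table cell is ignored in this sketch.
        if self._cell is not None:
            self._cell.append(data)


def table_to_text(html):
    """Return the table cells in html as left-aligned text columns."""
    parser = TableText()
    parser.feed(html)
    parser.close()
    rows = parser.rows
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of columns.
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    lines = []
    for row in rows:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)
```

Calling table_to_text(page_source) on a page with a single simple
table gives one line of text per table row, with each column padded to
its widest cell.  Real pages with unclosed tags or nested tables would
need more work than this.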

John Hunter




