HTML DOM parser?

adfgvx adfgvx at free.fr
Thu Jul 31 16:23:53 EDT 2003


Try tidy. There are two Python wrappers: mxtidy and utidy. The latter is
more recent and uses the new tidylib. But tidy will only repair a bad HTML
page and convert it to XML or XHTML output; you then have to load that
output as a DOM with another parser. Personally, I use pyRXP.
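
A minimal sketch of that two-step pipeline, under some assumptions: it
calls the `tidy` command-line program (assumed to be installed and on the
PATH) instead of the mxtidy or utidy wrappers, and it loads the result with
the standard library's xml.dom.minidom rather than pyRXP, only so that the
example needs no third-party packages. The function names tidy_to_xhtml and
link_targets are made up for illustration.

```python
import subprocess
from xml.dom import minidom


def tidy_to_xhtml(html):
    """Repair messy HTML by piping it through the external `tidy` program.

    Assumes a `tidy` executable is on the PATH. `-numeric` asks tidy to
    emit numeric character references, so the XML parser does not trip
    over HTML-only named entities such as &nbsp;.
    """
    result = subprocess.run(
        ["tidy", "-quiet", "-asxhtml", "-numeric", "-utf8",
         "--show-warnings", "no"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 0 when clean and 1 when it only issued warnings;
    # anything else means it could not produce usable output.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8")


def link_targets(xhtml):
    """Load well-formed XHTML as a DOM and return every <a href> value."""
    dom = minidom.parseString(xhtml)
    return [
        a.getAttribute("href")
        for a in dom.getElementsByTagName("a")
        if a.hasAttribute("href")
    ]
```

With tidy installed, something like link_targets(tidy_to_xhtml(raw_html))
would list the links of a scraped page; link_targets on its own works on any
string that is already well-formed XML.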

Bruno Lienard


"Paul Rubin" <http://phr.cx@NOSPAM.invalid> wrote in message news:
7x7k5y5wfh.fsf_-_ at ruckus.brouhaha.com...
> Is there an HTML DOM parser available for Python?  Preferably one that
> does a reasonable job with the crappy HTML out there on real web
> pages, that doesn't get upset about unterminated tables and stuff like
> that.  Many extra points if it understands Javascript.  Application is
> a screen scraping web robot.  Thanks.





