Parsing broken HTML via Mozilla

Paul Wright -$P-W$- at noctua.org.uk
Tue Aug 10 22:38:00 CEST 2004


In article <mailman.1413.1092080863.5135.python-list at python.org>, Walter
Dörwald wrote:
> I'm trying to parse broken HTML with several Python tools.
> Unfortunately none of them works 100% reliably. Problems are e.g.
> nested comments, bare "&" in URLs and "<" in text (e.g. "if foo <
> bar") etc.

Not a Mozilla solution, but I hear good things about Beautiful Soup:
http://www.crummy.com/software/BeautifulSoup/
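
For anyone who wants to see what tolerant parsing of the cases quoted
above looks like, here is a minimal sketch. It is a stand-in, not
Beautiful Soup's own API: it assumes a current Python 3 and uses only the
standard library's html.parser module. The class name
LinkAndTextCollector, the helper parse_broken, and the sample markup are
hypothetical, made up for illustration. It exercises a bare "&" in a URL
and a "<" in text; nested comments are not covered.

```python
from html.parser import HTMLParser


class LinkAndTextCollector(HTMLParser):
    """Collect href values and text while tolerating malformed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []        # href values, in document order
        self.text_parts = []   # text chunks, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # A stray "<" that does not start a tag arrives here as plain text.
        self.text_parts.append(data)


def parse_broken(html):
    """Hypothetical helper: return (links, text) for possibly broken HTML."""
    parser = LinkAndTextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.links, "".join(parser.text_parts)


# Sample input with a "<" in running text and a bare "&" in a URL.
BROKEN = '<p>if foo < bar</p><a href="page?x=1&y=2">next</a>'
links, text = parse_broken(BROKEN)
print(links)
print(text)
```

The parser does not raise on either problem: the stray "<" is reported as
ordinary text and the unescaped "&" stays in the href value. Beautiful
Soup goes further by building a navigable tree, which this sketch does not
attempt.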

-- 
Paul Wright | http://pobox.com/~pw201 | http://blog.noctua.org.uk/
Reply address is valid but discards mail with attachments: send plain text only


