HTMLParser rejects real-life tagsoup

Gerhard Häring gerhard.haering at gmx.de
Mon Feb 10 19:09:20 EST 2003


Rene Pijlman wrote:
> I've been using the HTMLParser module to process external web
> pages that I don't control. HTMLParser seems to be rather strict
> [...]
> Any suggestions on how to handle this? [...]

I'd try tidying up the HTML first:
http://www.lemburg.com/files/python/mxTidy.html
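
A minimal sketch of the tidy-first idea, not mxTidy's actual API. It assumes a modern Python 3, where the old HTMLParser module lives on as html.parser. The `tidy` argument is a hypothetical hook: any callable that takes the raw markup as a string and returns a cleaned-up string, which is where a tidying library would be plugged in. `LinkCollector` and `extract_links` are illustrative names only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw_html, tidy=None):
    """Optionally clean the markup, then parse it.

    `tidy` is a hypothetical hook (str -> str) standing in for
    whatever clean-up tool is used; it is not a real library call.
    """
    cleaned = tidy(raw_html) if tidy is not None else raw_html
    parser = LinkCollector()
    parser.feed(cleaned)
    parser.close()
    return parser.links
```

The clean-up step is kept separate from the parser subclass so that it can be swapped or dropped without touching the handler code.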

Gerhard
-- 
Favourite database:             http://www.postgresql.org/
Favourite programming language: http://www.python.org/
Combine the two:                http://pypgsql.sf.net/
Embedded database for Python:   http://pysqlite.sf.net/



