HTMLParser rejects real-life tagsoup
Gerhard Häring
gerhard.haering at gmx.de
Mon Feb 10 19:09:20 EST 2003
Rene Pijlman wrote:
> I've been using the HTMLParser module to process external web
> pages that I don't control. HTMLParser seems to be rather strict
> [...]
> Any suggestions on how to handle this? [...]
I'd try tidying up the HTML first:
http://www.lemburg.com/files/python/mxTidy.html
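
As a rough illustration of the tidy-first idea (this is not mxTidy, whose API is not shown in this thread), here is a minimal sketch. It assumes a current Python 3, where the standard library's html.parser.HTMLParser tolerates malformed markup instead of raising. The names SoupTidier, VOID and tidy are made up for this example, and attributes are dropped to keep it short; a real tidy tool does far more.

```python
import html
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SoupTidier(HTMLParser):
    """Hypothetical helper: re-emit tag soup with balanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of cleaned-up markup
        self.stack = []  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped here for brevity (an assumption of
        # this sketch, not something a real tidy tool would do).
        self.out.append("<%s>" % tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.out.append(html.escape(data, quote=False))


def tidy(markup):
    """Return markup with every opened element properly closed."""
    parser = SoupTidier()
    parser.feed(markup)
    parser.close()
    # Close whatever is still open at the end of the input.
    while parser.stack:
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)
```

For instance, tidy("<p><b>bold<i>both</b>text</p></span>") closes the dangling <i> before </b> and drops the stray </span>, giving <p><b>bold<i>both</i></b>text</p>, which a stricter parser should then accept.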
Gerhard
--
Favourite database: http://www.postgresql.org/
Favourite programming language: http://www.python.org/
Combine the two: http://pypgsql.sf.net/
Embedded database for Python: http://pysqlite.sf.net/