trying to parse non valid html documents with HTMLParser
florent.newsgroups at kynesthesy.org
Wed Aug 3 11:44:17 CEST 2005
> AFAIK not with HTMLParser or htmllib. You might try (if you haven't done
> so yet) htmllib and see which parser is more forgiving.
Thanks, I'll try htmllib.
Otherwise, I have found a workaround. Feeding data to the HTMLParser in
chunks, obtained from the string with string.split("<"), lets me lose
only one tag at a time when an exception is raised!
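
Here is a rough sketch of that idea, written for Python 3's html.parser
(the module was named HTMLParser in Python 2). TagCollector and
feed_in_chunks are hypothetical names of mine, not part of the library.
It also assumes that parser.reset() is enough to recover after a failure,
and it catches a broad Exception because the exact exception type depends
on the Python version.

```python
from html.parser import HTMLParser  # Python 2 spelled this module "HTMLParser"


class TagCollector(HTMLParser):
    """Hypothetical subclass that just records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def feed_in_chunks(parser, text):
    """Feed text one '<'-delimited chunk at a time.

    Returns the list of chunks that had to be dropped because the
    parser raised while processing them.
    """
    dropped = []
    pieces = text.split("<")
    # split() swallows the separator, so put "<" back on every piece
    # except the first (which is whatever preceded the first tag).
    chunks = [pieces[0]] + ["<" + piece for piece in pieces[1:]]
    for chunk in chunks:
        try:
            parser.feed(chunk)
        except Exception:
            # Assumption: discarding the buffered input is acceptable here.
            dropped.append(chunk)
            parser.reset()
    return dropped


if __name__ == "__main__":
    collector = TagCollector()
    lost = feed_in_chunks(collector, "<p>Hello<b>world</b></p>")
    print(collector.tags, lost)
```

Note that a dropped chunk contains the offending tag plus any text up to
the next "<", so slightly more than the tag itself can be lost. Recent
Python 3 releases of html.parser are also much more tolerant than the
2005 HTMLParser, so the except branch may rarely trigger there.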