trying to parse invalid HTML documents with HTMLParser

florent florent.newsgroups at kynesthesy.org
Wed Aug 3 11:44:17 CEST 2005


> AFAIK not with HTMLParser or htmllib. You might try (if you haven't done
> yet) htmllib and see, which parser is more forgiving.

Thanks, I'll try htmllib.
Otherwise, I found a workaround: feeding the data to HTMLParser in
chunks obtained with string.split("<") means I lose only one tag at a
time when an exception is raised!
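
For the record, here is a rough sketch of that chunked feeding, written against the html.parser module of current Python 3 rather than the 2005 module. TagCollector and feed_in_chunks are names made up for this example. Note that today's HTMLParser is much more lenient and HTMLParseError no longer exists there, so the sketch simply catches Exception and assumes that resetting the parser is an acceptable way to recover.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical example parser: records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def feed_in_chunks(parser, html):
    """Feed html to parser one "<"-delimited chunk at a time.

    Returns the list of chunks that raised an exception and were dropped.
    """
    pieces = html.split("<")
    # split() removes the "<" separators, so put one back on every piece
    # except the first (which is the text before the first tag, if any).
    chunks = [pieces[0]] + ["<" + piece for piece in pieces[1:]]

    skipped = []
    for chunk in chunks:
        if not chunk:
            continue
        try:
            parser.feed(chunk)
        except Exception:
            # Assumption: dropping this chunk and clearing the parser's
            # internal buffer is good enough; only this chunk is lost.
            skipped.append(chunk)
            parser.reset()
    try:
        parser.close()
    except Exception:
        # Leftover unparsable data at the end of input is ignored.
        pass
    return skipped


if __name__ == "__main__":
    collector = TagCollector()
    dropped = feed_in_chunks(collector, "<p>some <b>bold</b> text</p>")
    print(collector.tags, dropped)
```

The broad "except Exception" is deliberate here, since which exception a given parser version raises on bad markup is not something this sketch tries to pin down.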


