[Expat-discuss] parsing error
Fred Drake
fdrake at acm.org
Mon Oct 29 13:19:57 CET 2007
On Oct 29, 2007, at 6:43 AM, Nikolai Koudelia wrote:
> The problem is that the material may not be correct. It may look
> like this:
...
> When expat parser reaches </brokentag>, it throws an exception and
> stops parsing. Is there a way to handle situation like that? Some
> option telling expat to skip broken closing tags? Or should I repair
> the material before parsing? Last one could be quite tricky, because
> expat could not be used for that... Any ideas?
XML parsers aren't forgiving the way HTML parsers should be, and
that's a specific goal. If you're interested in tolerating any ol'
HTML, use an HTML parser. There are a number of those available in
Python as well (htmllib, BeautifulSoup, lxml.html).
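A minimal sketch of the difference, assuming Python 3. The material in the original question was elided, so the sample document below is hypothetical. htmllib existed only in Python 2, so the sketch substitutes the stdlib html.parser module; BeautifulSoup and lxml.html are third-party packages and are not used here. Expat rejects the stray end tag and stops, while the HTML parser simply reports it and keeps going:

```python
import xml.parsers.expat
from html.parser import HTMLParser

# Hypothetical sample: "</brokentag>" closes a tag that was never opened.
BROKEN = "<doc><p>some text</brokentag></p></doc>"


def parse_with_expat(data):
    """Return None on success, or (message, line, column) of the first error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # Expat stops at the first well-formedness error; the document is
        # not XML, so there is nothing to "skip" and continue from.
        return (xml.parsers.expat.ErrorString(err.code), err.lineno, err.offset)
    return None


class TagCollector(HTMLParser):
    """Records start tags, end tags and text; unmatched end tags are just reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_with_html_parser(data):
    """Feed the whole string to a tolerant HTML parser and return its events."""
    collector = TagCollector()
    collector.feed(data)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print("expat:", parse_with_expat(BROKEN))
    print("html.parser:", parse_with_html_parser(BROKEN))
```

What to do with the stray end tag (ignore it, close open elements, etc.) is then up to the application, which is exactly the kind of guessing an XML parser is designed not to do.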
-Fred
--
Fred Drake <fdrake at acm.org>