[Expat-discuss] parsing error

Fred Drake fdrake at acm.org
Mon Oct 29 13:19:57 CET 2007


On Oct 29, 2007, at 6:43 AM, Nikolai Koudelia wrote:
> The problem is that the material may not be correct. It may look  
> like this:
...
> When expat parser reaches </brokentag>, it throws an exception and
> stops parsing. Is there a way to handle situation like that? Some
> option telling expat to skip broken closing tags? Or should I repair
> the material before parsing? Last one could be quite tricky, because
> expat could not be used for that... Any ideas?

XML parsers aren't forgiving the way HTML parsers should be, and  
that's a specific goal.  If you're interested in tolerating any ol'  
HTML, use an HTML parser.  There are a number of those available in
Python as well (htmllib, BeautifulSoup, lxml.html).
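
A minimal sketch of the difference, using only the Python 3 standard
library. The input string is a hypothetical stand-in for the elided
example above, and html.parser is used here as the current stdlib
counterpart to htmllib (which exists only in Python 2). The helper names
parse_strict, parse_lenient and TagCollector are made up for
illustration.

```python
from html.parser import HTMLParser
from xml.parsers import expat

# Hypothetical input: a stray closing tag that matches no open element.
BROKEN = "<root><item>text</brokentag></root>"


def parse_strict(data):
    """Parse with expat; return None on success or the error message."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        return str(exc)
    return None


class TagCollector(HTMLParser):
    """Record every start tag, end tag and text run the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_lenient(data):
    """Parse with html.parser, which does not check that tags match."""
    collector = TagCollector()
    collector.feed(data)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print("expat:", parse_strict(BROKEN))
    print("html.parser:", parse_lenient(BROKEN))
```

expat stops at the mismatched tag, as the XML specification requires,
while html.parser simply reports </brokentag> as an end tag and keeps
going. Deciding what to do with the stray tag is then up to the handler.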


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




