[XML-SIG] How to get SAX to parse not well formed HTML doc?

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 18 Jul 2001 01:02:49 +0200


> > I need to parse a bunch of HTML documents, yet the parser is too
> > strict for this task. It stops at constructs that are considered
> > correct by HTML rules, such as unquoted attributes. Can I make the
> > parser more relaxed toward HTML documents?
> 
> You might have more luck using the HTML parser, rather than SAX, which is
> designed for parsing XML.
> 
> The HTML parser is in htmllib and works in much the same way, and it handles
> unquoted attributes without any problems.

Alternatively, you can use xml.parsers.sgmlop in SGML mode.
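
As a rough illustration of the lenient-parser approach suggested above,
here is a minimal sketch. It assumes a Python 3 interpreter, where htmllib
no longer exists, and so it swaps in the standard library's html.parser
module. The names TagCollector and collect_tags are hypothetical, chosen
only for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag with its attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; unquoted values are accepted.
        self.tags.append((tag, attrs))


def collect_tags(html_text):
    """Parse html_text leniently and return the list of start tags seen."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    # Unquoted attribute values, which a strict XML parser would reject.
    print(collect_tags("<a href=foo.html target=_top>link</a><br>"))
```

Unlike a SAX parser, this does not stop at unquoted attributes or at an
unclosed <br>; it simply reports the tags as it finds them.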

Regards,
Martin