stefan_ml at behnel.de
Mon Aug 24 11:54:39 CEST 2009
Dave Angel wrote:
> Stefan Behnel wrote:
>> elsa wrote:
>>> I know how to turn HTML into an ElementTree object
>> I don't. ;)
>> ElementTree doesn't have an HTML parser, so what do you use for parsing?
> Perhaps the OP was referring to XHTML, which should be eligible for
> ElementTree. But could you tell me whether ElementTree is at all
> tolerant of malformed XML? Most HTML and XHTML I encounter in the wild
> is so buggy it's amazing it all works at all.
Well, if the XHTML is "buggy", it's not XHTML at all. XHTML is XML, which
is defined as being well-formed. Any XHTML parser is required to reject
malformed input, and the expat parser that ElementTree uses is (luckily) no
exception.
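
To make the strictness concrete, here is a minimal sketch, assuming a current Python 3 where ElementTree reports expat failures as ET.ParseError. The helper name try_parse and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(markup):
    """Return the root tag for well-formed XML, or a short rejection note."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError as err:
        # expat stops at the first well-formedness error instead of guessing.
        return "rejected: %s" % err


# Hypothetical sample inputs: the second one never closes its <b> element.
well_formed = "<p>one <b>two</b></p>"
malformed = "<p>one <b>two</p>"

print(try_parse(well_formed))
print(try_parse(malformed))
```

The first call should yield the root tag, while the second should come back as a rejection, since no recovery is attempted.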
Regarding malformed HTML: that's not directly supported by ElementTree,
hence my question. You can use ElementSoup to interface with BeautifulSoup,
or elementtidy to interface with tidy, or html5lib with ElementTree as
backend, or you can use lxml instead, which handles malformed HTML (and is
all fast and shiny and ... ;).
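
For the lxml route, a rough sketch might look like the following. It assumes the third-party lxml package is installed (the code falls back to returning None if it is not), and the helper name parse_tag_soup and the sample string are hypothetical.

```python
try:
    import lxml.html  # third-party package, assumed to be installed
except ImportError:
    lxml = None  # lets the sketch degrade gracefully without lxml


def parse_tag_soup(markup):
    """Parse possibly malformed HTML and return its text, or None without lxml."""
    if lxml is None:
        return None
    # lxml's HTML parser recovers from errors such as unclosed tags.
    root = lxml.html.fromstring(markup)
    return root.text_content()


# Hypothetical sample input: the <b> element is never closed.
broken_html = "<p>one <b>two</p>"

print(parse_tag_soup(broken_html))
```

The same input that a strict XML parser rejects is accepted here, because the HTML parser repairs the tree rather than refusing it.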