[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike

John J Lee jjl at pobox.com
Tue Dec 2 14:37:45 EST 2003


On Tue, 2 Dec 2003, Stuart Langridge wrote:

> Simon Willison spoo'd forth:
[...]
> > Is there any way we could get a DOM tree from invalid HTML using pure
> > Python tools? The HTML tools in the Python standard library at the
[...]
> Presumably we could (the existing things, like HtmlLib or microdom do
> it);
[...]

No, they don't.  There's a whole wonderful world <wink> of invalid HTML
out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about.
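A minimal sketch of the kind of markup meant here, assuming modern Python 3's event-based `html.parser` module as a stand-in (it is not sgmllib, which no longer exists in Python 3, and not HtmlLib). The class `NestingChecker` and the helper `check` are hypothetical names used only for illustration. The parser simply reports start and end tags and does not repair anything, so any code that builds a tree from those events has to cope with misnested and unclosed elements itself:

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Hypothetical helper: record end tags that don't close the innermost open element."""

    # Void elements never get an end tag, so they are not pushed on the stack.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []      # names of currently open elements, innermost last
        self.problems = []   # (expected, got) pairs for misnested end tags

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> leaves nothing open.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        expected = self.stack[-1] if self.stack else None
        self.problems.append((expected, tag))
        # Naive recovery: if the tag is open further out, close everything up to it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check(markup):
    """Return (misnested end tags, elements still open at end of input)."""
    parser = NestingChecker()
    parser.feed(markup)
    parser.close()
    return parser.problems, parser.stack
```

For `<b><i>x</b></i>` it reports `[('i', 'b'), (None, 'i')]`. For `<p>one<p>two` it finds no misnested end tag but leaves both `p` elements open, which a DOM builder would have to close implicitly using knowledge of HTML's content model.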



John
