[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike
aquarius-lists at kryogenix.org
Wed Dec 3 05:01:34 EST 2003
John J Lee spoo'd forth:
> On Tue, 2 Dec 2003, Stuart Langridge wrote:
>> Simon Willison spoo'd forth:
>> > Is there any way we could get a DOM tree from invalid HTML using pure
>> > Python tools? The HTML tools in the Python standard library at the
>> Presumably we could (the existing things, like HtmlLib or microdom do
> No, they don't. There's a whole wonderful world <wink> of invalid HTML
> out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about.
Really? What sort of thing do they fail to parse?
If hard data were the filtering criterion you could fit the entire
contents of the Internet on a floppy disk.
-- Cecil Adams