[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike

Stuart Langridge aquarius-lists at kryogenix.org
Wed Dec 3 05:01:34 EST 2003


John J Lee spoo'd forth:
> On Tue, 2 Dec 2003, Stuart Langridge wrote:
>> Simon Willison spoo'd forth:
>> > Is there any way we could get a DOM tree from invalid HTML using pure
>> > Python tools? The HTML tools in the Python standard library at the
>> Presumably we could (the existing things, like HtmlLib or microdom do
>> it);
> 
> No, they don't.  There's a whole wonderful world <wink> of invalid HTML
> out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about.

Really? What sort of thing do they fail to parse?

sil

-- 
If hard data were the filtering criterion you could fit the entire
contents of the Internet on a floppy disk.
	   -- Cecil Adams


