[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike
Stuart Langridge
aquarius-lists at kryogenix.org
Wed Dec 3 05:01:34 EST 2003
John J Lee spoo'd forth:
> On Tue, 2 Dec 2003, Stuart Langridge wrote:
>> Simon Willison spoo'd forth:
>> > Is there any way we could get a DOM tree from invalid HTML using pure
>> > Python tools? The HTML tools in the Python standard library at the
>> Presumably we could (the existing things, like HtmlLib or microdom do
>> it);
>
> No, they don't. There's a whole wonderful world <wink> of invalid HTML
> out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about.
Really? What sort of thing do they fail to parse?
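For illustration, here is a minimal sketch of the kind of pure-Python recovery being asked about. It assumes Python 3's html.parser and xml.dom.minidom rather than the sgmllib/HtmlLib tools named above. SoupToDOM and parse_soup are hypothetical names. The recovery rules in the docstring are simplified assumptions and do not match what any browser or existing library does. The example covers three common kinds of invalid markup: an unclosed <p>, a stray end tag, and misnested inline tags.

```python
from html.parser import HTMLParser
from xml.dom.minidom import getDOMImplementation

# HTML elements that never have content; they are appended but never left open.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class SoupToDOM(HTMLParser):
    """Hypothetical helper: builds an xml.dom.minidom Document from tag soup.

    Simplified recovery rules (assumptions, not browser behaviour):
    - a new <p> implicitly closes the <p> that is still open;
    - an end tag closes the nearest open element with that name, along with
      any elements still open inside it (repairs misnesting like <b><i></b>);
    - an end tag with no matching open element is ignored;
    - anything still open at the end of input is simply left in the tree.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = getDOMImplementation().createDocument(None, "html", None)
        # Stack of currently open elements; the root <html> is always at index 0.
        self.stack = [self.doc.documentElement]

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the root element already exists
        if tag == "p":
            self._close("p")
        element = self.doc.createElement(tag)
        for name, value in attrs:
            # Valueless attributes such as <input disabled> arrive as None.
            element.setAttribute(name, "" if value is None else value)
        self.stack[-1].appendChild(element)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        self._close(tag)

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))

    def _close(self, tag):
        # Search open elements from innermost outward, never popping the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                return
        # No matching open element: treat it as a stray end tag and drop it.


def parse_soup(markup):
    parser = SoupToDOM()
    parser.feed(markup)
    parser.close()
    return parser.doc


doc = parse_soup("<p>One<p>Two <b>bold <i>both</b> italic?</i><br>end")
print(doc.documentElement.toxml())
# prints <html><p>One</p><p>Two <b>bold <i>both</i></b> italic?<br/>end</p></html>
```

Real-world tag soup (implied closing of <li> and <tr>, content misplaced inside tables, and so on) needs many more rules than this. The sketch only shows the shape of a tree builder that never refuses its input.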
sil
--
If hard data were the filtering criterion you could fit the entire
contents of the Internet on a floppy disk.
-- Cecil Adams