Parsing complex web pages safely with htmllib.HTMLParser
akuchlin at ute.mems-exchange.org
Thu Jan 24 16:39:28 CET 2002
In article <mailman.1011873335.21639.python-list at python.org>,
montanaro at tttech.com wrote:
> I'm not sure how XHTML will solve the problem. Instead of broken HTML we'll
> have to contend with broken XHTML. Browser manufacturers will still attempt
> to do something reasonable with syntactically incorrect pages, thus making
> it unlikely that people will fix them...
I don't think so. Mozilla doesn't accept XHTML that isn't well-formed, and
neither does IE. For example, when I point either Mozilla or IE 6 at
http://www.w3schools.com/xml/note_error.xml, I get this page:
    XML Parsing Error: mismatched tag. Expected: </to>.
    Line Number 3, Column 13: <to>Tove</To>
Surprisingly, Opera is the least clear; it reports 'Transmission Stopped',
with no indication that it's actually an XML problem and not a network problem.
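
The same strictness is easy to see outside a browser. What follows is only a
minimal sketch, assuming a present-day Python with xml.etree.ElementTree in the
standard library. The sample document is my own reconstruction of the kind of
error shown above, not the actual contents of note_error.xml, and
well_formedness_error is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Assumed sample input: <to> is closed by </To>, and XML names are
# case-sensitive, so the tags do not match.
BROKEN = "<note>\n<to>Tove</To>\n</note>"
FIXED = "<note>\n<to>Tove</to>\n</note>"


def well_formedness_error(text):
    """Return the parser's error message, or None if text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # The underlying expat parser stops at the first error instead of
        # guessing what the author meant.
        return str(exc)
    return None


if __name__ == "__main__":
    print(well_formedness_error(BROKEN))  # a "mismatched tag" message
    print(well_formedness_error(FIXED))   # None
```

A conforming XML parser has no recovery mode to fall back on, so a page author
sees the failure immediately rather than a best-effort rendering.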
Every time you sound confident nowadays, something terrible seems to
-- Peri, in "Vengeance on Varos"