Parsing complex web pages safely with htmllib.HTMLParser

Thu Jan 24 10:39:28 EST 2002

In article <mailman.1011873335.21639.python-list at python.org>, 
	montanaro at tttech.com wrote:
> I'm not sure how XHTML will solve the problem.  Instead of broken HTML we'll
> have to contend with broken XHTML.  Browser manufacturers will still attempt
> to do something reasonable with syntactically incorrect pages, thus making
> it unlikely that people will fix them...

I don't think so.  Mozilla doesn't accept malformed XHTML, and neither
does IE.  For example, when I point either Mozilla or IE 6 at
http://www.w3schools.com/xml/note_error.xml, I get this page:

	XML Parsing Error: mismatched tag. Expected: </to>.
	Location: http://www.w3schools.com/xml/note_error.xml
	Line Number 3, Column 13:  <to>Tove</To>
	------------^

Surprisingly, Opera is the least clear; it reports 'Transmission Stopped',
with no indication that it's actually an XML problem and not a network one.
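
The same strictness is easy to see from Python, since the expat
bindings in the standard library also refuse mismatched tags.  What
follows is only a rough sketch: the BROKEN_NOTE text is a made-up
stand-in for the w3schools file (I have not copied its real contents),
and check_well_formed is a hypothetical helper name, not something
from any library.

```python
import xml.parsers.expat

# Hypothetical sample: <to> is closed by </To>, as in the error quoted above.
BROKEN_NOTE = """<note>
<to>Tove</To>
<from>Jani</from>
</note>
"""

def check_well_formed(text):
    """Return None if text is well-formed XML, else a short error description."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument tells expat this is the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        # err.code maps to expat's own message; lineno is 1-based.
        message = xml.parsers.expat.ErrorString(err.code)
        return "%s (line %d, column %d)" % (message, err.lineno, err.offset)
    return None

result = check_well_formed(BROKEN_NOTE)
print(result)
```

The parser stops at the first well-formedness error instead of
guessing what the author meant, which is the behaviour the browsers
show for pages served as XML.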

--amk                                                  (www.amk.ca)
Every time you sound confident nowadays, something terrible seems to
happen.
    -- Peri, in "Vengeance on Varos"