[ python-Bugs-1055864 ] HTMLParser not compliant to XHTML spec

Thu Oct 28 06:59:37 CEST 2004

Bugs item #1055864, was opened at 2004-10-27 21:59
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1055864&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Luke Bradley (neptune235)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser not compliant to XHTML spec

Initial Comment:
HTMLParser has a problem: it doesn't seem to comply with
the spec for XHTML. What I am referring to is described
here:
http://www.w3.org/TR/xhtml1/#h-4.8
In a nutshell, HTMLParser doesn't treat data inside
'script' or 'style' elements as #PCDATA, but rather
behaves like an HTML 4 parser even for XHTML documents,
parsing only end tags. As a result, entity references
in JavaScript are not converted as they should be.
XHTML authors writing to spec can expect entities in
script sections of XHTML documents to be converted if
the script is not explicitly escaped as a CDATA
section.

This brings up problem two: sections explicitly escaped
as CDATA are also parsed as HTML 4 'script' and 'style'
sections, so end tags inside them are parsed. My
understanding is that this is bad as well:
http://www.w3.org/TR/2004/REC-xml-20040204/#dt-cdsection
because CDEnd is the only thing that's supposed to be
recognized in a CDATA section, in any XML document.
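
A minimal sketch of the difference described above, assuming a
current Python 3 where the parser lives in html.parser (the
report itself concerns the 2004 HTMLParser module, whose exact
behavior may differ). The class TextByElement and the helper
collect_text are hypothetical names made up for this
illustration. xml.etree.ElementTree stands in for "what an XML
parser would do" with the same script content.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextByElement(HTMLParser):
    """Collect character data, keyed by the innermost open element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []   # currently open elements
        self.text = {}    # tag name -> concatenated character data

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self.stack:
            key = self.stack[-1]
            self.text[key] = self.text.get(key, "") + data


def collect_text(markup):
    """Hypothetical helper: parse markup and return text per element."""
    parser = TextByElement()
    parser.feed(markup)
    parser.close()
    return parser.text


# HTML-style parsing: the entity inside <script> is left alone,
# while the same entity inside <p> is converted.
html_result = collect_text("<script>if (a &lt; b) {}</script><p>a &lt; b</p>")

# XML-style parsing: script content is ordinary #PCDATA, so the
# entity is converted ...
xml_pcdata = ET.fromstring("<script>if (a &lt; b) {}</script>").text

# ... and inside a CDATA section nothing is recognized except the
# closing "]]>", so the "</p>" below is plain text.
xml_cdata = ET.fromstring(
    '<script><![CDATA[if (a < b) { s = "</p>"; }]]></script>'
).text

print(html_result)
print(xml_pcdata)
print(xml_cdata)
```

If this matches what others see, the first print shows the
unconverted entity reference in the script text, which is the
HTML 4 behavior the report objects to for XHTML input.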

----------------------------------------------------------------------
