[XML-SIG] need help: Sax can't read w3 dtds?
and-xml at doxdesk.com
Fri Aug 6 10:06:05 CEST 2004
Luke Bradley <webworldl at yahoo.com> wrote:
> My problem is that when I try to parse XHTML1.1
> documents with pythons SAX implementation, it throws
> an error claiming that there are errors in the W3C's
It's right - there are. Many other parsers won't accept them either. The
(first) error is at line 37 char 20 of
<!ENTITY lt "&#38;&#60;" ><!-- less-than sign, U+003C ISOnum -->
Since character references are decoded once at entity-definition time,
this actually defines the entity lt as containing '&<', which is grossly
ill-formed as well as being incompatible with &lt;'s canonical content.
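To see the decode-once behaviour in isolation, here is a minimal sketch using the expat-based parser in Python's standard library. The entity name "e" and the helper names are made up for illustration; I use "e" rather than "lt" to sidestep any special handling a parser may give the predefined names. It compares a literal value made of two plain character references with the double-escaped form that the XML spec itself uses for lt.

```python
import xml.etree.ElementTree as ET

def expand(literal_value):
    """Declare a hypothetical entity 'e' with the given literal value,
    reference it once in content, and return the resulting text."""
    doc = '<!DOCTYPE r [<!ENTITY e "%s">]><r>&e;</r>' % literal_value
    return ET.fromstring(doc).text

def expands_cleanly(literal_value):
    """True if a reference to the entity parses without a well-formedness error."""
    try:
        expand(literal_value)
        return True
    except ET.ParseError:
        return False

# Double-escaped form: the stored replacement text is '&#60;', which is
# only turned into a literal '<' character when the entity is referenced.
good = "&#38;#60;"

# Two plain character references: both are decoded at definition time,
# so the stored replacement text is the two characters '&<'.
bad = "&#38;&#60;"

print(expands_cleanly(good), expands_cleanly(bad))
```

Note that this only shows what happens when the entity is referenced. Whether a parser objects to the declaration on its own, without any reference, varies between implementations.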
Exactly how much of an error this is in XML is an arguable point, given
that this entity is not actually used after its declaration. However,
parsers that need to report the declared entity content independently of
any references to it (such as DOM implementations) cannot possibly allow it.
This is a bug in XHTML Modularization that makes handling today's XHTML
1.1 with validation a bit of a non-starter (along with all the other
problems connected with XHTML 1.1). Unfortunately, the W3C process has
prevented the error from being fixed before the forthcoming XHTML
Modularization Second Edition.
If you need to handle XHTML 1.1 at the moment, do it without
mailto:and at doxdesk.com