[XML-SIG] need help: Sax can't read w3 dtds?

Andrew Clover and-xml at doxdesk.com
Fri Aug 6 10:06:05 CEST 2004


Luke Bradley <webworldl at yahoo.com> wrote:

> My problem is that when I try to parse XHTML1.1
> documents with pythons SAX implementation, it throws
> an error claiming that there are errors in the W3C's
> DTD's.

It's right - there are. Many other parsers won't accept them either. The 
(first) error is at line 37 char 20 of 
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-special.ent:

   <!ENTITY lt "&#38;&#60;" ><!-- less-than sign, U+003C ISOnum -->

Since character references are decoded once at entity-definition time, 
this actually defines the entity lt as containing '&<', which is grossly 
ill-formed as well as being incompatible with &lt;'s canonical content.
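
To see the decoding step for yourself, here is a minimal sketch using the 
expat bindings from the Python standard library. The entity names 
"broken" and "fixed" and the helper declared_entities are made up for 
illustration (they are not from the W3C files), and "fixed" uses the form 
the XML 1.0 specification gives for declaring lt. It only reports what 
the parser stores for each declaration; it never references the entities.

```python
from xml.parsers import expat


def declared_entities(document: bytes) -> dict:
    """Return {entity name: stored replacement text} for internal entities."""
    found = {}
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # value is None for external entities; keep internal ones only.
        if value is not None:
            found[name] = value

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


# Hypothetical test document: "broken" copies the declaration style from
# xhtml-special.ent, "fixed" escapes only the ampersand.
doc = (
    b'<!DOCTYPE r [\n'
    b'<!ENTITY broken "&#38;&#60;">\n'
    b'<!ENTITY fixed "&#38;#60;">\n'
    b']>\n'
    b'<r/>'
)

entities = declared_entities(doc)
for name, value in entities.items():
    print(name, repr(value))
```

The "broken" declaration ends up holding a bare ampersand followed by a 
bare less-than sign, while "fixed" holds a character reference that is 
only resolved when the entity is actually used.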

Exactly how much of an error this is in XML is an arguable point, given 
that this entity is not actually used after its declaration. However 
parsers that need to report the declared entity content independently of 
their references (such as DOM implementations) cannot possibly allow it.

This is a bug in XHTML Modularization that makes handling today's XHTML 
1.1 with validation a bit of a non-starter (along with all the other 
problems connected with XHTML 1.1). Unfortunately W3C process has 
prevented the error from being fixed before the forthcoming XHTML 
Modularization Second Edition.

If you need to handle XHTML 1.1 at the moment, do it without 
validation/external entities.
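
As a rough sketch of what that looks like with the standard library's SAX 
driver (expat, which is non-validating anyway): switch the external 
entity features off explicitly so the DTD is never fetched. The handler 
class ElementNames, the helper parse_without_dtd and the sample document 
are only illustrative names, not anything from your code.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class ElementNames(ContentHandler):
    """Collects element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_dtd(data: bytes) -> list:
    parser = xml.sax.make_parser()
    # Do not load external general or parameter entities, so the
    # XHTML 1.1 DTD and its modules are never read.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


# Illustrative document; note that it uses no named entities.
sample = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\n'
    b'  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><title>t</title></head>'
    b'<body><p>hi</p></body></html>'
)

names = parse_without_dtd(sample)
print(names)
```

The catch is that with the DTD unread, named entities such as &nbsp; are 
not defined for the parser, so documents relying on them need separate 
handling.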

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

