[XML-SIG] Corrected list of packages handling XML 1.1

Thu Sep 1 19:59:09 CEST 2005

On Thu, 2005-09-01 at 12:50 +0200, Walter Dörwald wrote:
> Ken Beesley wrote:
> 
> > My apologies to Fredrik Lundh of Pythonware for the omission of 
> > ElementTree+sgmlop in my recent listing of Python-XML packages that 
> > handle XML 1.1. The list (that I'm aware of) currently includes:
> > 
> > 1. pxdom by Andrew Clover
> >    (http://www.doxdesk.com/software/py/pxdom.html,
> >    http://www.doxdesk.com/file/software/py/pxdom.py)
> > 2. pyLTXML from the Univ. of Edinburgh
> >    (http://www.ltg.ed.ac.uk/software/xml,
> >    http://www.ltg.ed.ac.uk/software/gpl_xml.html,
> >    http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)
> > 3. elementtree library from Pythonware
> >    (http://effbot.org/zone/element.htm,
> >    http://effbot.org/zone/element-index.htm)
> > 
> > If I've forgotten anyone, please help me complete the list.
>  > [...]
> 
> XIST (http://www.livinglogic.de/Python/xist) handles XML 1.1 charrefs 
> when a parser is used that does it. (XIST uses sgmlop by default, so it 
> works by default). When serializing XML those charrefs are always 
> supported. See the following snippet:
> 
>  >>> from ll.xist import parsers, presenters
>  >>> from ll.xist.ns import html
>  >>> e = parsers.parseString("<body>this is a backspace: &#x0008;</body>")
>  >>> print e.asrepr(presenters.CodePresenter())
> ll.xist.xsc.Frag(
>     ll.xist.ns.html.body(
>        'this is a backspace: \x08'
>     )
> )
>  >>> print e.asBytes()
> <body>this is a backspace: &#8;</body>

This conversation is really becoming surreal.  People, please, it's very
simple: supporting the range of character references defined in XML 1.1
is not, repeat *NOT*, the same thing as being an XML 1.1 parser.

If I have software that parses "<a>b</a>" that does not mean I have an
XML 1.0 parser.  If that software also accepts "<a>b</c>", then it is
obviously not such.
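
To make the point concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree module (my choice for illustration; it is
not one of the packages under discussion). The helper name
is_well_formed is hypothetical. A conforming parser has to reject the
second input, however happily it accepts the first.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the expat-backed parser accepts text, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Any well-formedness violation surfaces as ParseError.
        return False
    return True


print(is_well_formed("<a>b</a>"))  # matching start and end tags
print(is_well_formed("<a>b</c>"))  # mismatched end tag
```

Accepting good input proves very little; what a parser refuses is the
more telling half of conformance.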

Any software that accepts "<body>this is a backspace: &#x0008;</body>"
is neither a compliant XML 1.0 parser nor a compliant XML 1.1 parser.
All XML 1.1 documents *must have an XML declaration* according to the
strict stipulation of the spec.  If an XML 1.1 parser encounters a
document without an XML declaration, it *must* assume that it is an XML
1.0 document, at which point it would *have to* stop with a fatal error
when it encounters &#x0008;, because U+0008 is not a legal character in
XML 1.0.  Period.  There is no negotiation here.
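
The rule can be sketched in a few lines of Python. This is a simplified
illustration, not a parser: the regular expressions and the helper names
(declared_version, charref_allowed, illegal_charrefs) are my own
assumptions, and the XML declaration is matched only loosely. The
character ranges are the Char productions of XML 1.0 and XML 1.1.

```python
import re

# Loose match for an XML declaration at the very start of the document.
_XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])(.*?)\1')
# Decimal or hexadecimal numeric character references.
_CHARREF = re.compile(r'&#(?:x([0-9a-fA-F]+)|([0-9]+));')


def declared_version(doc):
    """Return "1.1" only if the document declares it; otherwise "1.0"."""
    m = _XML_DECL.match(doc)
    return "1.1" if m and m.group(2) == "1.1" else "1.0"


def charref_allowed(cp, version):
    """Check a code point against the Char production for that version."""
    common = (0x20 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
              or 0x10000 <= cp <= 0x10FFFF)
    if version == "1.1":
        # XML 1.1 additionally admits #x1-#x1F (but never #x0).
        return common or 0x1 <= cp <= 0x1F
    # XML 1.0 admits only tab, LF and CR below #x20.
    return common or cp in (0x9, 0xA, 0xD)


def illegal_charrefs(doc):
    """List the referenced code points that must cause a fatal error."""
    version = declared_version(doc)
    bad = []
    for hex_digits, dec_digits in _CHARREF.findall(doc):
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not charref_allowed(cp, version):
            bad.append(cp)
    return bad


no_decl = "<body>this is a backspace: &#x0008;</body>"
with_decl = '<?xml version="1.1"?>' + no_decl
print(illegal_charrefs(no_decl))    # treated as XML 1.0
print(illegal_charrefs(with_decl))  # treated as XML 1.1
```

The same character reference is an error in the first document and
legal in the second; the only difference is the XML declaration.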

Therefore, as far as I can tell, neither the ET/sgmlop trick nor XIST
is an XML 1.1 parser.  I cannot speak for LTXML or pxdom, but knowing
the authors, I would guess that they are indeed compliant XML 1.1
parsers.

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html