[XML-SIG] Corrected list of packages handling XML 1.1

Mon Sep 5 02:34:54 CEST 2005

On Fri, 2005-09-02 at 17:54 +0200, Ken Beesley wrote:
> Uche Ogbuji wrote
> 
> >------------------------------
> >
> >Message: 2
> >Date: Thu, 01 Sep 2005 11:59:09 -0600
> >From: Uche Ogbuji <Uche.Ogbuji at fourthought.com>
> >Subject: Re: [XML-SIG] Corrected list of packages handling XML 1.1
> >To: Walter Dörwald <walter at livinglogic.de>
> >Cc: xml-sig at python.org, Ken Beesley <ken.beesley at xrce.xerox.com>
> >Message-ID: <1125597549.14255.347.camel at borgia>
> >Content-Type: text/plain; charset=ISO-8859-15
> >
> >On Thu, 2005-09-01 at 12:50 +0200, Walter Dörwald wrote:
> >  
> >
> >>Ken Beesley wrote:
> >>
> >>    
> >>
> >>>My apologies to Fredrik Lundh of Pythonware for the omission of 
> >>>ElementTree+sgmlop in my recent listing of Python-XML packages that 
> >>>handle XML 1.1. The list (that I'm aware of) currently includes:
> >>>
> >>>1. pxdom by Andrew Clover (http://www.doxdesk.com/software/py/pxdom.html, 
> >>>   http://www.doxdesk.com/file/software/py/pxdom.py)
> >>>2. pyLTXML from the Univ. of Edinburgh (http://www.ltg.ed.ac.uk/software/xml, 
> >>>   http://www.ltg.ed.ac.uk/software/gpl_xml.html, 
> >>>   http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)
> >>>3. elementtree library from Pythonware (http://effbot.org/zone/element.htm, 
> >>>   http://effbot.org/zone/element-index.htm)
> >>>
> >>>If I've forgotten anyone, please help me complete the list.
> >>>      
> >>>
> >> > [...]
> >>
> >>XIST (http://www.livinglogic.de/Python/xist) handles XML 1.1 charrefs 
> >>when a parser is used that does it. (XIST uses sgmlop by default, so it 
> >>works by default). When serializing XML those charrefs are always 
> >>supported. See the following snippet:
> >>
> >> >>> from ll.xist import parsers, presenters
> >> >>> from ll.xist.ns import html
> >> >>> e = parsers.parseString("<body>this is a backspace: &#x0008;</body>")
> >> >>> print e.asrepr(presenters.CodePresenter())
> >>ll.xist.xsc.Frag(
> >>    ll.xist.ns.html.body(
> >>       'this is a backspace: \x08'
> >>    )
> >>)
> >> >>> print e.asBytes()
> >><body>this is a backspace: &#8;</body>
> >>    
> >>
> >
> >This conversation is really becoming surreal.  People, please, it's very
> >simple: supporting the range of character references defined in XML 1.1
> >is not, repeat *NOT*, the same thing as being an XML 1.1 parser.
> >
> >If I have software that parses "<a>b</a>" that does not mean I have an
> >XML 1.0 parser.  If that software also accepts "<a>b</c>", then it is
> >obviously not such.
> >
> >Any software that accepts "<body>this is a backspace: &#x0008;</body>"
> >is neither a compliant XML 1.0 parser nor a compliant XML 1.1 parser.
> >All XML 1.1 documents *must have an XML declaration* according to the
> >strict stipulation of the spec.  If an XML 1.1 parser encounters a
> >document without an XML declaration, it *must* assume that it is an XML
> >1.0 document, at which point it would *have to* stop with a fatal error
> >when it encounters &#x0008;.  Period.  There is no negotiation here.
> >
> >Therefore, as far as I can tell, neither the ET/sgmlop trick nor XIST
> >is an XML 1.1 parser.  I cannot speak for LTXML or pxdom, but knowing
> >the authors, I would guess that they are indeed compliant XML 1.1
> >parsers.
> >
> >
> >  
> >
> What Mr. Ogbuji states about "being an XML 1.1 parser" and
> "being a compliant XML 1.0 parser [or] a compliant XML 1.1
> parser" is of course correct.  However, with respect, I believe
> that he misses the point and claims of the list.
> 
> I posted a list of packages "handling XML 1.1", and Walter Dörwald
> helpfully added XIST as a package that "handles XML 1.1 charrefs
> when a parser [like sgmlop] is used that does it".  Neither one of
> us claimed that all the listed packages (and especially not the ones
> using an underlying sgmlop parser) were "XML 1.1 parsers".  Perhaps
> my terminology is confusing, but what I meant by "handling XML 1.1"
> is this:
> 
>          "Handle XML 1.1" = able to process a valid XML 1.1
>                document without throwing up and quitting.
> 
> Sgmlop (http://effbot.org/zone/sgmlop-index.htm) is admittedly
> non-validating and tolerant:  "The *sgmlop* parser is tolerant, and
> happily accepts XML-like data that are not well-formed. If you need
> strictness, use another parser."
> 
> In my own work, I do in fact use a second parser, separating the
> validation from the processing:
> 
> 1.  I prepare XML documents containing some control characters that are
> valid only in XML 1.1.  I always mark the file with <?xml version="1.1"?>.
> 
> 2.  I then validate the documents using a Relax NG schema and the Jing
> validating parser, which knows the difference between XML-1.0-valid and
> XML-1.1-valid. 
> 
> 3.  I then need to "handle" or "process" my
> already-known-to-be-XML-1.1-valid documents, to map them non-trivially
> into a different XML 1.1 language.
> Despite the fact that ElementTree+sgmlop or XIST+sgmlop
> cannot be "compliant XML 1.1 parsers", their ability to "handle" an
> already-known-to-be-XML-1.1-valid document is valuable to me, and perhaps
> to others who want to work with XML 1.1 documents.
> 
> ******
> That was the point of posting the list of "packages handling XML 1.1".
> If there's a better term than "handle XML 1.1", then please inform me,
> and I'll try to use it.

Do you really think this hair-splitting will not confuse users?

You might as well include grep, emacs, and less in your list because all of
these will "process a valid XML 1.1 document without throwing up and
quitting".
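
To make the difference concrete, here is a minimal sketch (Python 3,
standard library only; the helper name is my own and not part of any
package mentioned in this thread). It feeds the documents discussed above
to expat, the XML 1.0 parser behind xml.etree.ElementTree. A real parser
refuses the backspace reference rather than merely not choking on it:

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml10(text):
    """Return True if expat (an XML 1.0 parser) accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Any well-formedness violation is a fatal error for the parser.
        return False
    return True


samples = [
    "<a>b</a>",                                    # well-formed
    "<a>b</c>",                                    # mismatched end tag
    "<body>this is a backspace: &#x0008;</body>",  # charref outside XML 1.0 Char
]

for doc in samples:
    print(repr(doc), is_well_formed_xml10(doc))
```

The exact error message depends on the expat version, so the sketch only
reports accept or reject.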

Sheesh.  And to think that the point of XML was to avoid such madness in
the first place.

Think I'm giving you a hard time?  You should probably hope that no one
in the W3C decides to take your list too seriously.
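
For anyone who wants the rule rather than the argument, it is small enough
to sketch. The following is an illustration only, assuming Python 3; the
function names are hypothetical, and it checks a single well-formedness
constraint (a character reference must match the Char production of the
declared version) with regular expressions instead of parsing anything:

```python
import re

# Hypothetical helpers for illustration; this is not a parser.
_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])(1\.[0-9]+)\1')
_CHARREF = re.compile(r'&#(?:x([0-9a-fA-F]+)|([0-9]+));')


def declared_version(text):
    """Return the declared version, or "1.0" when there is no declaration."""
    m = _DECL.match(text)
    return m.group(2) if m else "1.0"


def is_char(cp, version):
    """Check a code point against the Char production of XML 1.0 or 1.1."""
    if version == "1.1":
        low_ok = 0x1 <= cp <= 0xD7FF
    else:
        # Anything not declared as 1.1 is held to the XML 1.0 rules.
        low_ok = cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
    return low_ok or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def illegal_charrefs(text):
    """List the character references that are fatal errors for this document."""
    version = declared_version(text)
    bad = []
    for m in _CHARREF.finditer(text):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if not is_char(cp, version):
            bad.append(m.group(0))
    return bad
```

A document with no declaration is treated as 1.0, so the backspace
reference is reported unless the file starts with <?xml version="1.1"?>.
The regex also looks inside comments and CDATA sections, which a real
parser would not, so take this as a demonstration of the rule and nothing
more.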

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org