[XML-SIG] Corrected list of packages handling XML 1.1
Uche.Ogbuji at fourthought.com
Mon Sep 5 02:34:54 CEST 2005
On Fri, 2005-09-02 at 17:54 +0200, Ken Beesley wrote:
> Uche Ogbuji wrote
> >Message: 2
> >Date: Thu, 01 Sep 2005 11:59:09 -0600
> >From: Uche Ogbuji <Uche.Ogbuji at fourthought.com>
> >Subject: Re: [XML-SIG] Corrected list of packages handling XML 1.1
> >To: Walter Dörwald <walter at livinglogic.de>
> >Cc: xml-sig at python.org, Ken Beesley <ken.beesley at xrce.xerox.com>
> >Message-ID: <1125597549.14255.347.camel at borgia>
> >Content-Type: text/plain; charset=ISO-8859-15
> >On Thu, 2005-09-01 at 12:50 +0200, Walter Dörwald wrote:
> >>Ken Beesley wrote:
> >>>My apologies to Fredrik Lundh of Pythonware for the omission of
> >>>ElementTree+sgmlop in my recent listing of Python-XML packages that
> >>>handle XML 1.1. The list (that I'm aware of) currently includes:
> >>>1. pxdom by Andrew Clover (http://www.doxdesk.com/software/py/pxdom.html,
> >>>http://www.doxdesk.com/file/software/py/pxdom.py)
> >>>2. pyLTXML from the Univ. of Edinburgh (http://www.ltg.ed.ac.uk/software/xml,
> >>>http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)
> >>>3. elementtree library from Pythonware (http://effbot.org/zone/element.htm,
> >>>http://effbot.org/zone/element-index.htm)
> >>>If I've forgotten anyone, please help me complete the list.
> >> > [...]
> >>XIST (http://www.livinglogic.de/Python/xist) handles XML 1.1 charrefs
> >>when a parser is used that does it. (XIST uses sgmlop by default, so it
> >>works by default). When serializing XML those charrefs are always
> >>supported. See the following snippet:
> >> >>> from ll.xist import parsers, presenters
> >> >>> from ll.xist.ns import html
> >> >>> e = parsers.parseString("<body>this is a backspace: &#8;</body>")
> >> >>> print e.asrepr(presenters.CodePresenter())
> >> ll.xist.ns.html.body(
> >> 'this is a backspace: \x08'
> >> )
> >> >>> print e.asBytes()
> >><body>this is a backspace: &#8;</body>
> >This conversation is really becoming surreal. People, please, it's very
> >simple: supporting the range of character references defined in XML 1.1
> >is not, repeat *NOT*, the same thing as being an XML 1.1 parser.
> >If I have software that parses "<a>b</a>" that does not mean I have an
> >XML 1.0 parser. If that software also accepts "<a>b</c>", then it is
> >obviously not such.
> >Any software that accepts "<body>this is a backspace: &#8;</body>"
> >is neither a compliant XML 1.0 parser nor a compliant XML 1.1 parser.
> >All XML 1.1 documents *must have an XML declaration* according to the
> >strict stipulation of the spec. If an XML 1.1 parser encounters a
> >document without an XML declaration, it *must* assume that it is an XML
> >1.0 document, at which point it would *have to* stop with a fatal error
> >when it encounters &#8;. Period. There is no negotiation here.
> >Therefore, as far as I can tell, neither the ET/sgmlop trick nor XIST
> >is an XML 1.1 parser. I cannot speak for LTXML or pxdom, but knowing
> >the authors, I would guess that they are indeed compliant XML 1.1
> >parsers.
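The version rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the packages under discussion: the names declared_version, charref_is_legal and stdlib_parser_accepts are hypothetical, the declaration match is a deliberate simplification, and the character ranges are the Char productions of XML 1.0 and XML 1.1 as I read them. The last helper relies on the expat-based parser in Python's standard library, which implements XML 1.0 only.

```python
import re
import xml.etree.ElementTree as ET

# Simplified match for an XML declaration at the very start of a document.
# (Assumption: no byte order mark, and "version" is the first pseudo-attribute.)
_XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])(1\.[0-9]+)\1""")

# A document with no XML declaration, containing a reference to U+0008.
SAMPLE = "<body>this is a backspace: &#8;</body>"


def declared_version(document):
    """Return the declared XML version, or "1.0" if there is no declaration."""
    match = _XML_DECL.match(document)
    return match.group(2) if match else "1.0"


def charref_is_legal(codepoint, version):
    """Check a character reference against the Char production of a version."""
    common = (0x20 <= codepoint <= 0xD7FF
              or 0xE000 <= codepoint <= 0xFFFD
              or 0x10000 <= codepoint <= 0x10FFFF)
    if version == "1.1":
        # XML 1.1 also allows references to U+0001 through U+001F.
        return common or 0x1 <= codepoint <= 0x1F
    # XML 1.0 allows only tab, line feed and carriage return below U+0020.
    return common or codepoint in (0x9, 0xA, 0xD)


def stdlib_parser_accepts(document):
    """Return True if the standard library's expat-based parser accepts it."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    version = declared_version(SAMPLE)
    print(version, charref_is_legal(0x8, version), stdlib_parser_accepts(SAMPLE))
```

Under these assumptions the sample is treated as XML 1.0, the reference to U+0008 is illegal there, and a conforming XML 1.0 parser refuses the document. The same reference would pass charref_is_legal only for a document that declares version 1.1.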
> What Mr. Ogbuji states about "being an XML 1.1 parser" and
> "being a compliant XML 1.0 parser [or] a compliant XML 1.1
> parser" is of course correct. However, with respect, I believe
> that he misses the point and claims of the list.
> I posted a list of packages "handling XML 1.1", and Walter Dörwald
> helpfully added XIST as a package that "handles XML 1.1 charrefs
> when a parser [like sgmlop] is used that does it". Neither one of
> us claimed that all the listed packages (and especially not the ones
> using an underlying sgmlop parser) were "XML 1.1 parsers". Perhaps
> my terminology is confusing, but what I meant by "handling XML 1.1"
> is this:
> "Handle XML 1.1" = able to process a valid XML 1.1
> document without throwing up and quitting.
> Sgmlop (http://effbot.org/zone/sgmlop-index.htm) is admittedly
> non-validating and tolerant: "The *sgmlop* parser is tolerant, and
> happily accepts XML-like data that are not well-formed. If you need
> strictness, use another parser."
> In my own work, I do in fact use a second parser, separating the
> validation from the processing:
> 1. I prepare XML documents containing some control characters that are
> valid only in XML 1.1. I always mark the file <?xml version="1.1"?>
> 2. I then validate the documents using a Relax NG schema and the Jing
> validating parser, which knows the difference between XML-1.0-valid and
> XML-1.1-valid documents.
> 3. I then need to "handle" or "process" my
> documents, to map them non-trivially into a different XML 1.1 language.
> Despite the fact that ElementTree+sgmlop or XIST+sgmlop
> cannot be "compliant XML 1.1 parsers", their ability to "handle" an
> already-known-to-be-XML-1.1-valid document is valuable to me, and perhaps
> to others who want to work with XML 1.1 documents.
> That was the point of posting the list of "packages handling XML 1.1".
> If there's a better term than "handle XML 1.1", then please inform me,
> and I'll try to use it.
Do you really think this hair-splitting will not confuse users?
You might as well list grep, emacs, and less in your list because all of
these will "process a valid XML 1.1 document without throwing up and
quitting".
Sheesh. And to think that the point of XML was to avoid such madness in
the first place.
Think I'm giving you a hard time? You should probably hope that no one
in the W3C decides to take your list too seriously.
Uche Ogbuji Fourthought, Inc.
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html