[XML-SIG] Canonical XML and attribute order
uche.ogbuji at fourthought.com
Thu Feb 9 23:30:55 CET 2006
Fredrik Lundh wrote:
> Dan Gunter wrote:
>> Right, in general XML processors don't care about attribute order (I
>> don't know much about canonicalization but that does sound like the
>> obvious exception).
> http://www.w3.org/TR/xml-c14n says to sort lexicographically on
> (namespace uri, local tag).
> (which, of course, is exactly what ET's default writer does)
I just want to clarify that there is a lot more to canonicalization than
that. There's surely no problem with adopting conventions from
Canonical XML, but it doesn't really make sense to treat that spec as an
authority in snippets. Either you have Canonical XML or you don't.
FYI if you do want Canonical XML, you can use PyXML's c14n module, or
you can use PyGenx to generate XML:
PyGenx is based on Genx, which always creates Canonical XML.
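To make concrete the point that canonicalization covers more than attribute sorting, here is a minimal sketch. It assumes a Python version (3.8 or later) whose standard library provides xml.etree.ElementTree.canonicalize. That function is not one of the tools named in this message, and it implements C14N 2.0 rather than the 1.0 spec linked above, so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET

# The same element written in a non-canonical lexical form: attributes out
# of order, mixed quote styles, extra whitespace, and an empty-element tag.
raw = """<doc  b='2'   a="1" />"""

# canonicalize() returns the canonical serialization as a string.
canonical = ET.canonicalize(raw)
print(canonical)
```

Besides reordering the attributes, the canonical form normalizes the quoting and whitespace inside the tag and expands the empty-element tag into a start/end pair, which is why sorting attributes alone does not give you Canonical XML.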
Side note: I have a c14n module I've put together for Amara, and it's
intended for the next release. It's based on 4Suite's fast SAX parser,
contrasting PyXML's, which is DOM-based (PyGenx is expat based, and thus
Ob c14n reference: http://www.ibm.com/developerworks/xml/library/x-c14n/
All that having been said, the OP is looking to address a common problem
among makers of XML authoring tools--the need to respect the user's
choice of attribute order and other such lexical details. It's not
really useful to repeat over and over that the XML spec states that
attribute order is not considered significant in determining the
conformance of a parser. And it's very unfair to state that the OP is
somehow fudging the grand name of "XML". Just as a fun exercise in
monkey-wrench throwing, if you read carefully enough, there's the
little-known fact that XML 1.0 doesn't require parsers to report child
elements in any particular order, either.
It's more useful to say that most XML parsers do choose to ignore
attribute order, because they are based on an abstract information
model of XML (such as the Infoset, the XPath data model or the like)
rather than the lexical form of the entities. For this reason most XML
editing tools rely on either specialized raw text frameworks, or a
hybrid of raw text with XML events (more usually the latter). This does
not mean that they are not XML processors, but just that they do choose
to preserve details that the XML spec does not *require* them to
preserve. The OP's best bet is to reuse another engine that already
gets this right, although I admit that I don't know of one available for
Python. I certainly do not write such tools, but my colleague Simon
St.Laurent did have a go at such a generic tool for Java.
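The difference between the abstract model and the lexical form can be sketched in Python. This is an illustration under stated assumptions, not a recommendation of an editing engine: the sample documents and the attribute_order helper are hypothetical, and the sketch relies on the ordered_attributes flag of the standard library's expat binding to report attributes in document order.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

doc_a = '<item id="1" name="x"/>'
doc_b = '<item name="x" id="1"/>'

# Abstract-model view: attributes are a mapping, so the two documents
# compare equal even though they were written in a different order.
same_model = ET.fromstring(doc_a).attrib == ET.fromstring(doc_b).attrib


def attribute_order(xml_text):
    """Return (element name, attribute names in document order) pairs."""
    seen = []

    def start(name, attrs):
        # With ordered_attributes set, attrs is a flat list:
        # [name1, value1, name2, value2, ...] in document order.
        seen.append((name, attrs[0::2]))

    parser = expat.ParserCreate()
    parser.ordered_attributes = True
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


print(same_model)
print(attribute_order(doc_a))
print(attribute_order(doc_b))
```

A tool that wants to write attributes back in the user's order would have to record this kind of event-level detail, since the mapping in the first half of the sketch has already discarded it.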
Ob XML and information ordering reference:
Uche Ogbuji Fourthought, Inc.