[XML-SIG] Status of XML 1.1 processing in Python?
Uche.Ogbuji at fourthought.com
Mon Aug 29 19:58:46 CEST 2005
On Wed, 2005-08-24 at 13:27 +0200, Ken Beesley wrote:
> Many thanks to Fredrik Lundh, Fred Drake and Daniel Veillard
> for information on the status of XML 1.1 processing in Python.
> I'll do my best to do some testing and report back.
> Why I need XML 1.1 characters
> In case anyone is interested, my goal is to facilitate
> the definition of new Unicode input methods for Mac OS X.
> Apple already supplies a very human-UNfriendly XML
> language for defining new input methods. I have defined a
> new human-friendly XML language and need to convert my
> human-friendly XML files automatically to Apple's human-
> UNfriendly XML.
> The basic idea of input methods is that they
> intercept incoming key events, or sequences of key events, and
> map them into Unicode-character outputs that are sent to
> the destination,
> e.g. to the buffer of a Unicode text editor. Some of these
> Unicode output characters are control characters that are
> invalid in XML 1.0 but valid in XML 1.1. (I.e. when you
> press appropriate "control" keys on your keyboard, the output
> to the application is naturally a "control character".)
> If you define a new OS X input method in Apple's current
> XML format, the XML file contains control characters that
> are valid only in XML 1.1. The underlying (mystery) Apple
> parser that processes
> that XML file does _not_ choke on the control characters,
> so this processor is assuming the XML 1.1 character set,
> even if the XML file is overtly marked version="1.0". That's
> a no-no, of course; if the file is marked version="1.0", then
> any kosher XML processor should refuse to parse/process
> the file if it contains control characters not valid in XML 1.0.
> My human-friendly XML language is defined in Relax NG,
> and when I specify version="1.1", the files validate as they
> should using Jing. (If I change the attribute to version="1.0", then
> Jing properly refuses to validate the files because of the invalid
> control characters.) So far so good.
> But then when I try to write a Python script to parse
> the human-friendly XML language and convert it (very non-trivially)
> to the human-unfriendly XML language defined by Apple,
> the Python script (if limited to XML 1.0 processing) chokes
> as soon as it sees the offending control characters. Sigh.
> Hence my need for a Python XML parsing/processing module
> that handles XML 1.1 characters when the file is
> appropriately marked version="1.1".
Interesting, and thanks for taking the time to state your case so clearly.
XML 1.1 does allow more control chars than 1.0, but some are still
banned, so I don't think you have a comprehensive solution here.
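To make the 1.0 versus 1.1 difference concrete, here is a minimal Python
sketch of the Char productions from the two specs. The function names are
mine, for illustration only. Note that XML 1.1 still excludes U+0000
outright, and the C0 controls it newly admits may appear only as character
references such as &#x1B;, not as literal characters.

```python
def allowed_in_xml10(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def allowed_in_xml11(cp):
    """True if code point cp matches the XML 1.1 Char production.

    U+0000 is still excluded, as are surrogates, U+FFFE and U+FFFF.
    """
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Control characters that are banned under both versions.
still_banned = [cp for cp in range(0x20)
                if not allowed_in_xml10(cp) and not allowed_in_xml11(cp)]
```

With these definitions, ESC (U+001B) is rejected by the 1.0 check and
accepted by the 1.1 check, while NUL is rejected by both.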
My suggested solution would be something Mike Brown and I have often
discussed: mapping the illegal characters to the Unicode private use
area (PUA), and then back, as needed. Python should make this an easy
solution. You can also use special elements or processing instructions
to encode the problem characters.
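A rough sketch of the PUA mapping idea, written for current Python 3 with
the standard library's ElementTree. Everything named here (PUA_BASE, the
function names, the "output" element) is hypothetical, and it assumes the
block starting at U+E000 is not otherwise used in your data. It covers only
the C0 controls that XML 1.0 forbids; extend the table if you need others.

```python
import xml.etree.ElementTree as ET

# Assumption: U+E000..U+E01F is free for our private use.
PUA_BASE = 0xE000

# C0 control code points that XML 1.0 forbids (all except tab, LF, CR).
ILLEGAL = [cp for cp in range(0x20) if cp not in (0x9, 0xA, 0xD)]

# Translation tables for str.translate: code point -> code point.
TO_PUA = {cp: PUA_BASE + cp for cp in ILLEGAL}
FROM_PUA = {pua: cp for cp, pua in TO_PUA.items()}


def escape_controls(text):
    """Replace XML-1.0-illegal control characters with PUA stand-ins."""
    return text.translate(TO_PUA)


def unescape_controls(text):
    """Restore the original control characters from their PUA stand-ins."""
    return text.translate(FROM_PUA)


def roundtrip(raw):
    """Write raw into an attribute, serialize, reparse, and recover it."""
    elem = ET.Element("output")            # hypothetical element name
    elem.set("value", escape_controls(raw))
    xml_text = ET.tostring(elem, encoding="unicode")
    parsed = ET.fromstring(xml_text)       # an ordinary XML 1.0 parse
    return unescape_controls(parsed.get("value"))
```

Calling roundtrip("\x1b[A") should hand back the same string, escape
character included, even though the parser in the middle only ever sees
well-formed XML 1.0. The cost is that every consumer of the file has to
know about the convention.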
I do not suggest relying on XML 1.1 in the way you propose, because
uptake for it is slow, and not only in the Expat world. It's a pretty
controversial spec (like every second-gen spec the W3C produces, it
Even if libxml folks and the Expat folks actually commit to XML 1.1
support, it will probably be a little while in coming, so I suggest a
more general workaround, such as I've suggested within the bounds of XML
Uche Ogbuji Fourthought, Inc.
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html