[XML-SIG] Corrected list of packages handling XML 1.1
ken.beesley at xrce.xerox.com
Thu Sep 1 11:47:04 CEST 2005
My apologies to Fredrik Lundh of Pythonware for the omission of
ElementTree+sgmlop in my recent listing of Python-XML packages that
handle XML 1.1. The list (that I'm aware of) currently includes:

1. pxdom by Andrew Clover
   (http://www.doxdesk.com/software/py/pxdom.html,
   http://www.doxdesk.com/file/software/py/pxdom.py)
2. pyLTXML from the Univ. of Edinburgh
   (http://www.ltg.ed.ac.uk/software/xml,
   http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)
3. elementtree library from Pythonware
   (http://effbot.org/zone/element.htm,
   http://effbot.org/zone/element-index.htm)

If I've forgotten anyone, please help me complete the list. I'm still a
Python-XML beginner, and any omissions are unintentional. Thanks again
to all those who have provided such tools.

Ken

Fredrik Lundh <fredrik at pythonware.com> wrote:
fwiw, as the following snippet illustrates, ET+sgmlop can read files with
1.1-style character references, but the ET serializer doesn't encode such
characters on the way out. this script
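
    from elementtree import ElementTree, SgmlopXMLTreeBuilder
    from StringIO import StringIO

    file = StringIO("<test>this is a backspace: &#8;</test>")
    doc = ElementTree.parse(file, SgmlopXMLTreeBuilder.TreeBuilder())
    root = doc.getroot()

    print repr(root.text)
    print repr(ElementTree.tostring(root))

prints

    'this is a backspace: \x08'
    '<test>this is a backspace: \x08</test>'

which isn't entirely correct; the backspace should be written back out
as a character reference, not as a raw control character.
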
fixing this in ElementTree is pretty straightforward; just tweak the
RE, and make sure _encode_entity is called for all cdata sections.
you can also use the following brute-force runtime patch:
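
    # patch the ET serializer (works with 1.2.X, may break beyond that)
    import re
    from elementtree import ElementTree

    # escape markup characters, control characters and non-ASCII text
    escape = re.compile(u'[&<>\"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')
    # make the new pattern the default for _encode_entity
    ElementTree._encode_entity.func_defaults = (escape,)
    # route all character data through _encode_entity
    ElementTree._escape_cdata = lambda a, b: ElementTree._encode_entity(a)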
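
For readers who want to see the escaping idea on its own, outside ElementTree, here is a minimal standalone sketch. It is not ElementTree's code and not the patch above. It is written for Python 3, the function name escape_xml11_text is made up for illustration, and the character set is taken from the RestrictedChar production of the XML 1.1 specification, so unlike the patch it leaves tab and non-ASCII text untouched.

```python
import re

# Markup characters plus the XML 1.1 RestrictedChar ranges
# (#x1-#x8, #xB-#xC, #xE-#x1F, #x7F-#x84, #x86-#x9F).
# Tab, LF, CR and NEL (#x85) are deliberately left alone.
_NEEDS_ESCAPE = re.compile(
    "[&<>\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]"
)

_NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def escape_xml11_text(text):
    """Return text with markup and restricted characters escaped.

    Hypothetical helper for illustration only.
    """
    def replace(match):
        char = match.group(0)
        # Named entity for markup characters, numeric reference otherwise.
        return _NAMED.get(char) or "&#%d;" % ord(char)
    return _NEEDS_ESCAPE.sub(replace, text)


print(escape_xml11_text("this is a backspace: \x08"))
# prints: this is a backspace: &#8;
```

A serializer that passes element text through a function like this would write the backspace back out as a character reference rather than as a raw control character.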