[XML-SIG] Corrected list of packages handling XML 1.1

Ken Beesley ken.beesley at xrce.xerox.com
Thu Sep 1 11:47:04 CEST 2005

My apologies to Fredrik Lundh of Pythonware for the omission of
ElementTree+sgmlop in my recent listing of Python-XML packages that
handle XML 1.1. The list (that I'm aware of) currently includes:

1. pxdom by Andrew Clover
   (http://www.doxdesk.com/software/py/pxdom.html,
   http://www.doxdesk.com/file/software/py/pxdom.py)

2. pyLTXML from the Univ. of Edinburgh
   (http://www.ltg.ed.ac.uk/software/xml,
   http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)

3. elementtree library from Pythonware
   (http://effbot.org/zone/element.htm,
   http://effbot.org/zone/element-index.htm)

If I've forgotten anyone, please help me complete the list. I'm still
a Python-XML beginner, and any omissions are unintentional.

Thanks again to all those who have provided such tools.

Ken

Fredrik Lundh <fredrik at pythonware.com> wrote:

fwiw, as the following snippet illustrates, ET+sgmlop can read files with
1.1-style character references, but the ET serializer doesn't encode such
characters on the way out. this script

from elementtree import ElementTree, SgmlopXMLTreeBuilder
from StringIO import StringIO

file = StringIO("<test>this is a backspace: &#x0008;</test>")

doc = ElementTree.parse(file, SgmlopXMLTreeBuilder.TreeBuilder())

root = doc.getroot()

print repr(root.text)
print repr(ElementTree.tostring(root))

prints

'this is a backspace: \x08'
'<test>this is a backspace: \x08</test>'

which isn't entirely correct.

fixing this in ElementTree is pretty straightforward; just tweak the
RE, and make sure _encode_entity is called for all cdata sections.
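
As a rough illustration of the idea (not the actual ElementTree code), the
escaping step could look something like the stand-alone sketch below. It
assumes Python 3 and uses only the standard re module; the helper name
escape_xml11_text and the exact character class are made up for this
example. The class covers the markup characters plus the C0 control
characters other than tab, LF, and CR, which XML 1.1 accepts only as
character references.

```python
import re

# Markup characters plus the C0 controls (except tab, LF, CR) that
# XML 1.1 permits only in the form of character references.
_ESCAPE_RE = re.compile(r'[&<>"\x01-\x08\x0b\x0c\x0e-\x1f]')

# Characters that have a predefined named entity.
_NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_xml11_text(text):
    """Hypothetical helper: escape markup and control characters in text."""
    def replace(match):
        ch = match.group()
        # Prefer the named entity; otherwise emit a numeric reference.
        return _NAMED.get(ch) or "&#%d;" % ord(ch)
    return _ESCAPE_RE.sub(replace, text)


print(escape_xml11_text("this is a backspace: \x08"))
```

For the string from the script above this gives
'this is a backspace: &#8;', so the backspace goes out as a character
reference instead of a raw control byte.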

you can also use the following brute-force runtime patch:

# patch the ET serializer (works with 1.2.X, may break beyond that)
import re
from elementtree import ElementTree
escape = re.compile(u'[&<>\"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')
ElementTree._encode_entity.func_defaults = (escape,)
ElementTree._escape_cdata = lambda a, b: ElementTree._encode_entity(a)
# end

