[XML-SIG] Status of XML 1.1 processing in Python?
Fredrik Lundh
fredrik at pythonware.com
Wed Aug 31 00:57:51 CEST 2005
I wrote:
>> In a few sentences, could some kind soul summarize the
>> status of XML 1.1 processing using Python XML modules?
>
> I haven't done any extensive testing, but I'm quite sure that sgmlop
> 1.1 supports it.
fwiw, as the following snippet illustrates, ET+sgmlop can read files with
1.1-style character references, but the ET serializer doesn't encode such
characters on the way out. this script
from elementtree import ElementTree, SgmlopXMLTreeBuilder
from StringIO import StringIO
file = StringIO("<test>this is a backspace: &#8;</test>")
doc = ElementTree.parse(file, SgmlopXMLTreeBuilder.TreeBuilder())
root = doc.getroot()
print repr(root.text)
print repr(ElementTree.tostring(root))
prints
'this is a backspace: \x08'
'<test>this is a backspace: \x08</test>'
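which isn't entirely correct: XML 1.1 allows a backspace only as a
character reference, so the serializer should have written &#8; here
instead of the raw \x08 byte.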
fixing this in ElementTree is pretty straightforward; just tweak the
RE, and make sure _encode_entity is called for all cdata sections.
you can also use the following brute-force runtime patch:
# patch the ET serializer (works with 1.2.X, may break beyond that)
import re
from elementtree import ElementTree
escape = re.compile(u'[&<>\"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')
ElementTree._encode_entity.func_defaults = (escape,)
ElementTree._escape_cdata = lambda a, b: ElementTree._encode_entity(a)
# end
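
for reference, here is a minimal standalone sketch of the escaping step the
patched serializer ends up applying to text. it uses the same character
class as the patch above, but it is not the ElementTree code itself: the
names escape_text, _ESCAPE and _NAMED are made up for illustration, and it
is written for a current Python rather than the 1.2.X-era module.

```python
import re

# Characters to escape when writing text: XML markup characters, the
# control characters matched by the patch above, and everything non-ASCII.
# (hypothetical helper, not part of ElementTree)
_ESCAPE = re.compile('[&<>"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]')

# Markup characters get their predefined entities; everything else that
# matches is written as a numeric character reference.
_NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_text(text):
    """Return text with special characters replaced by references."""
    def replace(match):
        ch = match.group()
        return _NAMED.get(ch) or "&#%d;" % ord(ch)
    return _ESCAPE.sub(replace, text)


print(escape_text("this is a backspace: \x08"))
```

note that the class also covers tab (\x09), so tabs come out as character
references too; that is harmless, just more verbose than necessary.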
</F>