[XML-SIG] Which python DOM to use - need EntityReference
Dave Kuhlman
dkuhlman at rexx.com
Mon Jan 1 23:27:24 CET 2007
On Mon, Jan 01, 2007 at 03:09:50PM -0500, Fox, David wrote:
> I've written a Python module which uses xml.dom from PyXML 0.8.4 to generate
> XML documents according to a particular schema (VoiceXML SRGS).
>
> Now I've seen a comment on SourceForge (pyxml.sourceforge.net) that PyXML
> is no longer maintained
> (http://sourceforge.net/tracker/index.php?func=detail&aid=1562266&group_id=6473&atid=106473).
>
> 1. Is this true?
>
> 2. Is there a recommended DOM replacement?
>
> I've looked at xml.dom.minidom from the standard Python library, but it doesn't
> support EntityReference, which means that I can't escape apostrophes as
> &apos; (which is allowed but not required by XML, but is required by the
> VoiceXML SRGS standard). If I include an apostrophe in a text node, it
> doesn't get escaped, whereas if I include &apos; it gets turned into
> "&amp;apos;".
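The minidom behavior described above is easy to reproduce. This is a minimal sketch, assuming Python 3's standard-library xml.dom.minidom rather than PyXML; the element name item is only for illustration. minidom escapes & and < in text but leaves apostrophes alone, and it has no way to know that a literal "&apos;" was meant as an entity reference:

```python
from xml.dom.minidom import Document

# Build a one-element document with a single text node, as in the question.
doc = Document()
item = doc.createElement("item")
doc.appendChild(item)

# A bare apostrophe is written out unescaped.
item.appendChild(doc.createTextNode("don't"))
bare = item.toxml()
print(bare)  # <item>don't</item>

# A literal "&apos;" is just text to minidom, so its ampersand gets escaped.
item.replaceChild(doc.createTextNode("don&apos;t"), item.firstChild)
literal = item.toxml()
print(literal)  # <item>don&amp;apos;t</item>
```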
I've read good things about ElementTree. I use it and like it.
It is aware of entities and does process them, although I do not
know enough about them to say whether they're handled correctly.
I believe that it un-escapes &apos; on the way in (parsing), but
does not escape apostrophes when writing them out. Also, if I set
the text of a node to text containing an entity reference,
ElementTree seems to have the behavior that you do *not* want:
it escapes the ampersand. Here is a small test, where r is the
root element in the document d:
In [19]: r.text = "aaa&apos;bbb<ccc"
In [20]: d.write(sys.stdout)
<start>aaa&amp;apos;bbb&lt;ccc</start>
In [21]:
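The same checks can be reproduced outside IPython. This is a rough sketch, assuming Python 3's bundled xml.etree.ElementTree (the session above presumably used the separate elementtree package under Python 2, so details may differ); encoding="unicode" makes tostring return a str instead of bytes:

```python
import xml.etree.ElementTree as ET

# Parsing: the predefined entity &apos; comes back as a plain apostrophe.
root = ET.fromstring("<start>aaa&apos;bbb</start>")
parsed_text = root.text
print(parsed_text)  # aaa'bbb

# Writing: the apostrophe is left as-is, not turned back into &apos;.
written = ET.tostring(root, encoding="unicode")
print(written)  # <start>aaa'bbb</start>

# Text that already contains an entity reference is taken literally,
# so & is escaped to &amp; (and < to &lt;).
root.text = "aaa&apos;bbb<ccc"
escaped = ET.tostring(root, encoding="unicode")
print(escaped)  # <start>aaa&amp;apos;bbb&lt;ccc</start>
```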
There is also lxml, which implements the same API as ElementTree,
but requires installation of the libxml2 and libxslt C libraries.
For what it's worth, you can find out more about both here:
http://effbot.org/zone/element-index.htm
http://codespeak.net/lxml/
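For the original goal of getting &apos; into the output, one workaround that avoids EntityReference altogether is to build the tree with ElementTree and serialize it yourself, escaping apostrophes in text. This is a minimal sketch, assuming Python 3; serialize, escape_text and escape_attr are hypothetical helpers (not part of ElementTree), and the serializer deliberately ignores namespaces, comments and processing instructions. It relies on xml.sax.saxutils.escape, which always escapes &, < and > and applies an optional dictionary of extra replacements afterwards:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_text(data):
    """Hypothetical helper: escape character data, also turning ' into &apos;."""
    # escape() always handles &, < and >; the dict adds extra replacements.
    return escape(data, {"'": "&apos;"})


def escape_attr(data):
    """Hypothetical helper: escape a value written inside double quotes."""
    return escape(data, {"'": "&apos;", '"': "&quot;"})


def serialize(elem):
    """Hypothetical minimal serializer: no namespaces, comments or PIs."""
    parts = ["<", elem.tag]
    for name, value in elem.attrib.items():
        parts.append(' %s="%s"' % (name, escape_attr(value)))
    if len(elem) == 0 and not elem.text:
        parts.append("/>")  # empty element
    else:
        parts.append(">")
        if elem.text:
            parts.append(escape_text(elem.text))
        for child in elem:
            parts.append(serialize(child))  # each child writes its own tail
        parts.append("</%s>" % elem.tag)
    if elem.tail:
        parts.append(escape_text(elem.tail))
    return "".join(parts)


# Usage: a tiny SRGS-like fragment (element names are illustrative only).
rule = ET.Element("rule", id="greeting")
item = ET.SubElement(rule, "item")
item.text = "don't"
out = serialize(rule)
print(out)  # <rule id="greeting"><item>don&apos;t</item></rule>
```

Because the extra replacements run after & has already been escaped, the & in each inserted &apos; is not escaped a second time, which is exactly the problem seen with minidom and ElementTree above.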
Dave
--
Dave Kuhlman
http://www.rexx.com/~dkuhlman