[XML-SIG] Which python DOM to use - need EntityReference

Dave Kuhlman dkuhlman at rexx.com
Mon Jan 1 23:27:24 CET 2007


On Mon, Jan 01, 2007 at 03:09:50PM -0500, Fox, David wrote:
> I've written a Python module which uses xml.dom from PyXML 0.8.4 to generate
> XML documents according to a particular schema (VoiceXML SRGS).  
> 
> Now I've seen a comment on SourceForge (pyxml.sourceforge.net) that PyXML
> is no longer maintained
> (http://sourceforge.net/tracker/index.php?func=detail&aid=1562266&group_id=6473&atid=106473).
> 
> 1.  Is this true?
> 
> 2.  Is there a recommended DOM replacement?  
> 
> I've looked at xml.dom.minidom from the standard Python library, but it doesn't
> support EntityReference, which means that I can't escape apostrophes as
> &apos; (which is allowed but not required by XML, but is required by the
> VoiceXML SRGS standard).  If I include an apostrophe in a text node, it
> doesn't get escaped, whereas if I include &apos; it gets turned into
> "&amp;apos;".

I've read good things about ElementTree.  I use it and like it.

It is aware of and does process entities, although I do not know
enough about them to say whether they're handled correctly.  I
believe that it un-escapes &apos; on the way in (parsing), but does
not escape apostrophes when writing them out.  Also, if I set the
text of a node to text containing an entity reference, ElementTree
seems to have the behavior that you do *not* want: it escapes the
ampersand.  Here is a small test -- r is the root element of the
document d:

    In [19]: r.text = "aaa&apos;bbb<ccc"
    In [20]: d.write(sys.stdout)
    <start>aaa&amp;apos;bbb&lt;ccc</start>In [21]:
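
The same test can be written as a self-contained script. This is only a
sketch, and it assumes the standard-library xml.etree.ElementTree of a
current Python 3 (the encoding="unicode" argument is not available in
older releases). The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Text that already contains an entity reference: the "&" is treated as
# ordinary character data and is escaped when the tree is serialized.
root = ET.Element("start")
root.text = "aaa&apos;bbb<ccc"
with_entity = ET.tostring(root, encoding="unicode")

# A literal apostrophe in element text is written out unchanged.
root.text = "it's"
with_apostrophe = ET.tostring(root, encoding="unicode")

# On the way in, the parser resolves &apos; to a plain apostrophe.
parsed_text = ET.fromstring("<start>it&apos;s</start>").text

print(with_entity)
print(with_apostrophe)
print(parsed_text)
```

The first line printed shows the doubly escaped &amp;apos;, the second
shows the apostrophe left as-is, and the third shows the parsed text
with a plain apostrophe.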


There is also lxml, which implements the same API as ElementTree,
but requires installation of libxml2.

For what it's worth, you can find out about them here:

    http://effbot.org/zone/element-index.htm
    http://codespeak.net/lxml/
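
Coming back to the original requirement (apostrophes written as
&apos;): one possible workaround, if the serializer will not do it, is
to write the tree out by hand and escape the text with
xml.sax.saxutils.escape, which accepts a dictionary of extra
replacements. The sketch below is an assumption about what would be
acceptable, not a tested SRGS generator: serialize() is a hypothetical
helper, and it ignores namespaces, comments and processing
instructions.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Extra replacement applied by saxutils.escape after it has already
# escaped "&", "<" and ">", so the "&" in "&apos;" is not escaped again.
APOS = {"'": "&apos;"}


def serialize(elem):
    """Hypothetical helper: serialize an Element, writing ' as &apos; in text."""
    # quoteattr() picks a suitable quote character and escapes the value.
    attrs = "".join(" %s=%s" % (name, quoteattr(value))
                    for name, value in elem.items())
    parts = ["<%s%s>" % (elem.tag, attrs), escape(elem.text or "", APOS)]
    for child in elem:
        parts.append(serialize(child))
        # Text following a child element is stored in the child's tail.
        parts.append(escape(child.tail or "", APOS))
    parts.append("</%s>" % elem.tag)
    return "".join(parts)


# Small usage example with made-up element names.
rule = ET.Element("rule", {"id": "r1"})
rule.text = "it's"
item = ET.SubElement(rule, "item")
item.text = "a < b"
item.tail = " don't"
output = serialize(rule)
print(output)
```

Parsing the result again should give back the plain apostrophes, so
the document content is unchanged; only the serialized form differs.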

Dave

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman

