[Python-Dev] Elementtree and Namespaces in 2.5

Chris S chrisspen at gmail.com
Fri Aug 11 17:15:25 CEST 2006


I'm happy to see Elementtree being considered for inclusion with 2.5.
However, before committing to this decision, there's an issue
regarding its namespace handling that should be addressed. Although
Elementtree is in most respects an excellent XML parser, a huge gotcha
that makes it unsuitable for many applications lies in the way it
discards the namespace prefixes declared in a document and writes
generated ones (ns0, ns1, ...) in their place.

For example, given:

<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <h:table>
    <h:tr align="center">
     <h:td>Author</h:td><h:td>Price</h:td>
     <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
    <h:tr align="left">
     <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
     <h:td><xdc:price>31.98</xdc:price></h:td>
     <h:td><xdc:pages>352</xdc:pages></h:td>
     <h:td><xdc:date>1998/01</xdc:date></h:td>
    </h:tr>
   </h:table>
  </xdc:bookreview>
 </h:body>
</h:html>

Elementtree would rewrite this as:

<ns0:html xmlns:ns0="http://www.w3.org/HTML/1998/html4">
 <ns0:head><ns0:title>Book Review</ns0:title></ns0:head>
 <ns0:body>
  <ns1:bookreview xmlns:ns1="http://www.xml.com/books">
   <ns1:title>XML: A Primer</ns1:title>
   <ns0:table>
    <ns0:tr align="center">
     <ns0:td>Author</ns0:td><ns0:td>Price</ns0:td>
     <ns0:td>Pages</ns0:td><ns0:td>Date</ns0:td></ns0:tr>
    <ns0:tr align="left">
     <ns0:td><ns1:author>Simon St. Laurent</ns1:author></ns0:td>
     <ns0:td><ns1:price>31.98</ns1:price></ns0:td>
     <ns0:td><ns1:pages>352</ns1:pages></ns0:td>
     <ns0:td><ns1:date>1998/01</ns1:date></ns0:td>
    </ns0:tr>
   </ns0:table>
  </ns1:bookreview>
 </ns0:body>
</ns0:html>
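
The behaviour can be reproduced with a short script. This is only a
minimal sketch: it assumes Python 3 and the xml.etree.ElementTree module
name, and uses a shortened copy of the document above. Where the xmlns
declarations end up in the output may differ between releases.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example document (assumption: trimmed for brevity).
SOURCE = """\
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
  </xdc:bookreview>
 </h:body>
</h:html>
"""

root = ET.fromstring(SOURCE)

# After parsing, each tag carries the namespace URI in braces;
# the prefix used in the source ("h") is not kept on the element.
print(root.tag)  # {http://www.w3.org/HTML/1998/html4}html

# On output, a prefix has to be chosen again, so generated names appear.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The output is namespace-equivalent to the input, since every element
still maps to the same URI, but any consumer that depends on the
literal prefixes "h" and "xdc" will no longer find them.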

There's been some discussion in comp.lang.python about this
behaviour (http://groups.google.com/group/comp.lang.python/browse_thread/thread/31b2e9f4a8f7338c/363f46513fb8de04?&rnum=3&hl=en)
and while most users and the W3C Canonical XML spec
(http://www.w3.org/TR/2001/REC-xml-c14n-20010315#NoNSPrefixRewriting)
agree this feature is actually a bug, Fredrik Lundh has refused to fix
this problem. Of course, this is his right. Unfortunately,
Elementtree's design makes a work-around rather awkward. Therefore, we
might want to rethink inclusion of Elementtree in the stdlib, or at
least patch the stdlib's version of Elementtree to produce output
more in line with the W3C standard.
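
As a rough illustration of what prefix-preserving output could look
like, the sketch below uses register_namespace, a function that exists
in later releases of xml.etree.ElementTree but not in the version under
discussion here. It assumes Python 3 and a shortened document, and the
prefixes must be listed by hand; the mapping it sets is process-wide.

```python
import xml.etree.ElementTree as ET

# Assumption: register_namespace is available (it is in later releases).
# It updates a global prefix table used by the serializer.
ET.register_namespace("h", "http://www.w3.org/HTML/1998/html4")
ET.register_namespace("xdc", "http://www.xml.com/books")

# Shortened document (assumption: trimmed from the example above).
root = ET.fromstring(
    '<h:html xmlns:xdc="http://www.xml.com/books" '
    'xmlns:h="http://www.w3.org/HTML/1998/html4">'
    "<h:body><xdc:title>XML: A Primer</xdc:title></h:body>"
    "</h:html>"
)

# With the prefixes registered, the serializer reuses them
# instead of generating ns0, ns1, ...
restored = ET.tostring(root, encoding="unicode")
print(restored)
```

This still requires knowing the prefixes in advance, so it is a
work-around rather than a fix for the parser dropping them.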

Sincerely,
Chris Spencer

