[Python-Dev] Fixing the XML batteries

Stefan Behnel stefan_ml at behnel.de
Fri Dec 16 07:53:09 CET 2011

Stefan Behnel, 09.12.2011 09:02:
> I think Py3.3 would be a good milestone for cleaning up the stdlib support
> for XML.
> [...]

I still think it is, so let me sum up the current discussion here.

> What should change?
> a) The stdlib documentation should help users to choose the right tool
> right from the start.

It looks like there's agreement on this part.

> Instead of using the totally misleading wording that
> it uses now, it should be honest about the performance characteristics of
> MiniDOM and should actively suggest that those who don't know what to
> choose (or even *that* they can choose) should not use MiniDOM in the first
> place.

There was some disagreement on whether MiniDOM should publicly disclose its 
performance characteristics in the documentation, and whether its use 
should be discouraged, even just for new users.

However, it seemed that there was enough consensus to settle on Nick 
Coghlan's proposal for a compromise to move ElementTree up to the top of 
the list, and to add a visible note to the top of each of the XML modules 
like this:

"Note: The <whatever> module is a <yada, yada, DOM based, whatever>. If all
you are trying to do is read and write XML files, consider using the
xml.etree.ElementTree module instead"

That template could (with a bit of peeking into the getopt documentation)
be expanded into the following.

[[Note: The xml.dom.minidom module provides an implementation of the 
W3C-DOM whose API is similar to that in other programming languages. Users 
who are unfamiliar with the W3C-DOM interface or who would like to write 
less code for processing XML files should consider using the 
xml.etree.ElementTree module instead.]]

I think this should go on the xml.dom.minidom page as well as the xml.dom 
package page. Hand-wavingly, users who are new to the DOM are more likely 
to hit the package page first, whereas those who know it already will 
likely find the MiniDOM page directly.
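
To illustrate the "less code" point in the proposed note (this is not part
of the suggested documentation text), here is a minimal sketch that
extracts the same data once with xml.dom.minidom and once with
xml.etree.ElementTree. The sample document and the function names are made
up for the example.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this comparison.
XML = (
    "<library>"
    "<book id='1'><title>First</title></book>"
    "<book id='2'><title>Second</title></book>"
    "</library>"
)


def titles_minidom(text):
    """Collect all book titles using the W3C-DOM style API."""
    doc = minidom.parseString(text)
    titles = []
    for book in doc.getElementsByTagName("book"):
        title_node = book.getElementsByTagName("title")[0]
        # Text content lives in child text nodes and is joined by hand.
        titles.append("".join(
            child.data
            for child in title_node.childNodes
            if child.nodeType == child.TEXT_NODE
        ))
    return titles


def titles_etree(text):
    """Collect all book titles using ElementTree."""
    root = ET.fromstring(text)
    return [book.findtext("title") for book in root.findall("book")]
```

Both functions return the same list of titles; the ElementTree version
gets there without touching node types or text nodes.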

Note that I'd still encourage the removal of the misleading word 
"lightweight" until it makes sense to put it back in a meaningful way. I 
therefore propose the following minimalistic changes to the first paragraph 
on the minidom page:

xml.dom.minidom is a [-XXX: light-weight] implementation of the Document 
Object Model interface. It is intended to be simpler than the full DOM and 
also [+XXX: provide a] significantly smaller [+XXX: API].

@Martin: note how the original paragraph does not refer to "4DOM" or 
"PyXML". It only generically mentions "the DOM interface". It is certainly 
not true that MiniDOM is more "light-weight" and "significantly smaller" 
than (most) other DOM interface implementations outside of the Python 
world, for example. So the current wording actually makes no sense at all.

Additionally, the documentation on the xml.sax page would benefit from the 
following paragraph:

[[Note: The xml.sax package provides an implementation of the SAX interface 
whose API is similar to that in other programming languages. Users who are 
unfamiliar with the SAX interface or who would like to write less code for 
efficient stream processing of XML files should consider using the 
iterparse() function in the xml.etree.ElementTree module instead.]]
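
As an illustration of what that note is hinting at (again, not part of the
proposed wording), the following sketch pulls the text of all "error"
entries out of a small log document, first with a SAX handler and then
with iterparse(). The document layout, the class name and the function
names are assumptions made for the example, and the code is written for
Python 3.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only for this comparison.
XML = (
    b"<log>"
    b"<entry level='info'>started</entry>"
    b"<entry level='error'>disk full</entry>"
    b"</log>"
)


class ErrorHandler(xml.sax.ContentHandler):
    """SAX handler that has to track parser state by hand."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self._buffer = None  # a list while inside an "error" entry

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("level") == "error":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "entry" and self._buffer is not None:
            self.errors.append("".join(self._buffer))
            self._buffer = None


def errors_sax(data):
    handler = ErrorHandler()
    xml.sax.parseString(data, handler)
    return handler.errors


def errors_iterparse(data):
    errors = []
    # "end" events deliver each element once it has been read completely.
    for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "entry":
            if elem.get("level") == "error":
                errors.append(elem.text)
            # Drop the content of the finished entry to keep memory low.
            elem.clear()
    return errors
```

The iterparse() variant keeps the streaming behaviour but works on
complete elements, so there is no handler class and no manual state.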

If these changes are considered acceptable, I'll copy the above over to the 
documentation bug I opened at


Can these doc changes go into both 2.7 and 3.3? Given that there is no 
important difference between the implementations, I don't see why the 
documentation should differ in Py2.

> b) cElementTree should finally lose its "special" status as a separate
> library and disappear as an accelerator module behind ElementTree.

There was no opposition to this in the thread and general agreement on it,
except for the caveat that Fredrik Lundh should have a say in this. I
wrote him an e-mail but haven't received a response so far. We can wait a
little longer, I guess; there's still time before 3.3beta.
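
For context, this is a sketch of the import fallback that user code
typically needs as long as cElementTree is a separate module. Folding the
accelerator in behind ElementTree would make a plain
"import xml.etree.ElementTree" sufficient. The sample XML and the variable
names are made up for the example.

```python
# Prefer the C accelerator where it exists as its own module and fall
# back to the pure Python implementation otherwise.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Either module exposes the same API from here on.
root = ET.fromstring("<root><child name='a'/><child name='b'/></root>")
names = [child.get("name") for child in root]
```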

