[Python-Dev] Fixing the XML batteries
Stefan Behnel
stefan_ml at behnel.de
Fri Dec 16 07:53:09 CET 2011
Stefan Behnel, 09.12.2011 09:02:
> I think Py3.3 would be a good milestone for cleaning up the stdlib support
> for XML.
> [...]
I still think it is, so let me sum up the current discussion here.
> What should change?
>
> a) The stdlib documentation should help users to choose the right tool
> right from the start.
It looks like there's agreement on this part.
> Instead of using the totally misleading wording that
> it uses now, it should be honest about the performance characteristics of
> MiniDOM and should actively suggest that those who don't know what to
> choose (or even *that* they can choose) should not use MiniDOM in the first
> place.
There was some disagreement on whether MiniDOM should publicly disclose its
performance characteristics in the documentation, and whether its use
should be discouraged, even just for new users.
However, it seemed that there was enough consensus to settle on Nick
Coghlan's proposal for a compromise to move ElementTree up to the top of
the list, and to add a visible note to the top of each of the XML modules
like this:
"Note: The <whatever> module is a <yada, yada, DOM based, whatever>. If all
you are trying to do is read and write XML files, consider using the
xml.etree.ElementTree module instead"
That template could (with a bit of peeking into the getopt documentation)
be expanded into the following.
"""
[[Note: The xml.dom.minidom module provides an implementation of the
W3C-DOM whose API is similar to that in other programming languages. Users
who are unfamiliar with the W3C-DOM interface or who would like to write
less code for processing XML files should consider using the
xml.etree.ElementTree module instead.]]
"""
I think this should go on the xml.dom.minidom page as well as the xml.dom
package page. Hand-wavingly, users who are new to the DOM are more likely
to hit the package page first, whereas those who know it already will
likely find the MiniDOM page directly.
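To make the "less code" part of the proposed note concrete, here is a
minimal sketch that extracts the same data with both modules. The sample
document and the function names (titles_minidom, titles_etree) are made up
for illustration and do not come from the documentation.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this comparison.
XML = (
    "<catalog>"
    "<book id='1'><title>Dive In</title></book>"
    "<book id='2'><title>Learning XML</title></book>"
    "</catalog>"
)


def titles_minidom(text):
    """Collect book titles through the W3C-DOM style API of minidom."""
    doc = xml.dom.minidom.parseString(text)
    titles = []
    for book in doc.getElementsByTagName("book"):
        title_node = book.getElementsByTagName("title")[0]
        # In the DOM, element text lives in separate child text nodes.
        parts = [
            node.data
            for node in title_node.childNodes
            if node.nodeType == node.TEXT_NODE
        ]
        titles.append("".join(parts))
    return titles


def titles_etree(text):
    """Collect the same titles through the ElementTree API."""
    root = ET.fromstring(text)
    return [book.findtext("title") for book in root.findall("book")]


print(titles_minidom(XML))
print(titles_etree(XML))
```

Both functions return the same list. The ElementTree version simply does
not need the explicit text node handling.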
Note that I'd still encourage the removal of the misleading word
"lightweight" until it makes sense to put it back in a meaningful way. I
therefore propose the following minimalistic changes to the first paragraph
on the minidom page:
"""
xml.dom.minidom is a [-XXX: light-weight] implementation of the Document
Object Model interface. It is intended to be simpler than the full DOM and
also [+XXX: provide a] significantly smaller [+XXX: API].
"""
@Martin: note how the original paragraph does not refer to "4DOM" or
"PyXML". It only generically mentions "the DOM interface". It is certainly
not true that MiniDOM is more "light-weight" and "significantly smaller"
than (most) other DOM interface implementations outside of the Python
world, for example. So the current wording actually makes no sense at all.
Additionally, the documentation on the xml.sax page would benefit from the
following paragraph:
"""
[[Note: The xml.sax package provides an implementation of the SAX interface
whose API is similar to that in other programming languages. Users who are
unfamiliar with the SAX interface or who would like to write less code for
efficient stream processing of XML files should consider using the
iterparse() function in the xml.etree.ElementTree module instead.]]
"""
If these changes are considered acceptable, I'll copy the above over to the
documentation bug I opened at
http://bugs.python.org/issue11379
Can these doc changes go into both 2.7 and 3.3? Given that there is no
important difference between the implementations, I don't see why the
documentation should differ in Py2.
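As an illustration of the iterparse() suggestion in the xml.sax note, the
following sketch pulls the same entries out of a document once with a SAX
handler and once with iterparse(). The sample data, the ErrorHandler class
and the function names are assumptions made up for this example.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only for this comparison.
XML = (
    b"<log>"
    b"<entry level='info'>started</entry>"
    b"<entry level='error'>disk full</entry>"
    b"</log>"
)


class ErrorHandler(xml.sax.ContentHandler):
    """SAX handler that has to track its own parsing state."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self._in_error = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("level") == "error":
            self._in_error = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is collected piecewise.
        if self._in_error:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "entry" and self._in_error:
            self.errors.append("".join(self._chunks))
            self._in_error = False


def errors_sax(data):
    handler = ErrorHandler()
    xml.sax.parseString(data, handler)
    return handler.errors


def errors_iterparse(data):
    errors = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "entry":
            if elem.get("level") == "error":
                errors.append(elem.text)
            # Drop the content of handled elements to keep memory use low.
            elem.clear()
    return errors


print(errors_sax(XML))
print(errors_iterparse(XML))
```

The iterparse() version hands over complete elements at each "end" event,
so the manual state tracking of the SAX handler is not needed.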
> b) cElementTree should finally lose its "special" status as a separate
> library and disappear as an accelerator module behind ElementTree.
There was no opposition to this in the thread and general agreement on it,
apart from the caveat that Fredrik Lundh should have a say in the matter. I
wrote him an e-mail and haven't received a response so far. We can wait a
little longer, I guess; there's still time before 3.3 beta.
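For context, this is the kind of user code that the separate module tends
to lead to today. It is only a sketch of a common import idiom, not part of
any proposed patch. If the accelerator were hidden behind ElementTree, the
plain import in the except branch would be all that users need to write.

```python
# Common idiom while cElementTree is a separate module: try the
# accelerated version first and fall back to the pure Python one.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Either module offers the same API to the calling code.
root = ET.fromstring("<a><b/></a>")
print(root.tag, len(root))
```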
Stefan