[Python-Dev] Fixing the XML batteries

Xavier Morel python-dev at masklinn.net
Fri Dec 9 10:09:39 CET 2011


On 2011-12-09, at 09:41 , Martin v. Löwis wrote:
>> a) The stdlib documentation should help users to choose the right tool
>> right from the start. Instead of using the totally misleading wording
>> that it uses now, it should be honest about the performance
>> characteristics of MiniDOM and should actively suggest that those who
>> don't know what to choose (or even *that* they can choose) should not
>> use MiniDOM in the first place.
> 
> I disagree. The right approach is not to document performance problems,
> but to fix them.
Even if performance problems "should not be documented", I think Stefan's point that users should be steered away from minidom and towards ET and cET is completely valid and worthy of support. The *only* advantage minidom has over ET is that it uses an interface familiar to Java users[0]. They are about the only people using the actual W3C DOM: the DOM exists in JavaScript too, but I'd say most code out there actively tries not to touch it with anything less than a 10-foot library pole like jQuery. That interface is also, of course, absolutely dreadful.

Minidom is inferior in interface flow and pythonicity, in terseness, in speed, in memory consumption (even more so when compared against cElementTree, and that's not something which can be fixed unless minidom gets a C accelerator), etc… Even after fixing minidom (if anybody has the time and drive to commit to it), ET/cET should be preferred over it.
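
To make the terseness point concrete, here is a minimal sketch that extracts the same data through both APIs. The sample document and the function names (titles_minidom, titles_et) are made up for illustration; only the stdlib calls themselves are real.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this comparison.
XML = (
    "<catalog>"
    "<book id='1'><title>Dive In</title></book>"
    "<book id='2'><title>Second</title></book>"
    "</catalog>"
)


def titles_minidom(source):
    """Collect (id, title) pairs using the W3C-style minidom API."""
    doc = minidom.parseString(source)
    titles = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Text lives in child text nodes, so it is gathered by hand.
        text = "".join(
            node.data
            for node in title.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        titles.append((book.getAttribute("id"), text))
    return titles


def titles_et(source):
    """Collect the same (id, title) pairs using ElementTree."""
    root = ET.fromstring(source)
    return [(book.get("id"), book.findtext("title")) for book in root.iter("book")]
```

Both functions return the same list of pairs for this input; the ElementTree version just needs far less ceremony to get there. It says nothing about speed or memory, which would need a separate benchmark.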

And that's not even considering the ease of switching to lxml (if only for validators), which Stefan outlined.

[0] Not 100% true now that I think about it: handling mixed content is simpler in minidom, as there is no .text/.tail duality and text nodes are nodes like every other, but I really can't think of another reason to prefer minidom.
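
A small sketch of the mixed-content case from the footnote, assuming a made-up sample fragment and hypothetical helper names (pieces_minidom, pieces_et). It walks the same element with both APIs and lists the pieces in document order.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical mixed-content fragment: text interleaved with a child element.
MIXED = "<p>Hello <b>big</b> world</p>"


def pieces_minidom(source):
    """Walk the children in order; text nodes are ordinary children."""
    p = minidom.parseString(source).documentElement
    out = []
    for node in p.childNodes:
        if node.nodeType == node.TEXT_NODE:
            out.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            # Assumes the child holds a single text node, as in MIXED.
            out.append((node.tagName, node.firstChild.data))
    return out


def pieces_et(source):
    """Rebuild the same sequence from .text and each child's .tail."""
    p = ET.fromstring(source)
    out = []
    if p.text:
        # Text before the first child hangs off the parent.
        out.append(("text", p.text))
    for child in p:
        out.append((child.tag, child.text))
        if child.tail:
            # Text after a child hangs off that child, not the parent.
            out.append(("text", child.tail))
    return out
```

With minidom the trailing " world" is simply the third child of the element, whereas with ElementTree it is stored as the tail of the b element, which is the duality the footnote refers to.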

