xml, minidom, ElementTree

Stefan Behnel stefan_ml at behnel.de
Wed Dec 14 02:22:25 EST 2011


Terry Reedy, 14.12.2011 06:01:
> On 12/13/2011 6:21 PM, Ethan Furman wrote:
>> In the near future I will need to parse and rewrite parts of XML files
>> created by a third-party program (PrintShopMail, for the curious).
>> It contains both binary and textual data.
>>
>> There has been some strong debate about the merits of minidom vs
>> ElementTree.
>>
>> Recommendations?
>
> People's reactions to the DOM interface seem quite varied, with a majority,
> perhaps, being negative. I personally would look at both enough to
> understand the basic API model to see where *I* fit.

The API is one thing, yes, but there's also the fact that MiniDOM doesn't
scale. If your XML files are of a notable size (a couple of MB), MiniDOM
may simply not be able to handle them. I collected some numbers in a blog
post. Note that they were measured with a recent CPython 3.3 build, which
has an optimised Unicode string implementation and thus lower memory
requirements on average than Py2.x.

http://blog.behnel.de/index.php?p=197

MiniDOM's memory consumption is a factor of 5-10 higher than that of
cElementTree, which puts it at roughly two orders of magnitude above the
size of the serialised file. You may be able to stuff one such file into
memory, but you'll quickly get into trouble when you try to do parallel
processing or otherwise use more than one document at a time.
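
If you want to check this kind of ratio against your own files, here is a
rough sketch (not the benchmark from the blog post). It assumes Python 3.4
or later for tracemalloc; from 3.3 on, xml.etree.ElementTree picks up the C
accelerator by itself, so there is no separate cElementTree import. The
sample document and the helper names make_sample_xml() and peak_memory()
are made up for illustration, and tracemalloc only sees allocations made
through Python's allocators, so treat the numbers as indicative only.

```python
import tracemalloc
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET


def make_sample_xml(n_items=2000):
    """Build a synthetic document (hypothetical layout, not PrintShopMail's)."""
    items = "".join(
        '<item id="%d"><name>item %d</name><value>%d</value></item>'
        % (i, i, i * 2)
        for i in range(n_items)
    )
    return ("<root>" + items + "</root>").encode("utf-8")


def peak_memory(parse, data):
    """Run parse(data) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = parse(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# The input bytes are created before tracing starts, so only the
# parser's own allocations are counted.
data = make_sample_xml()
dom, dom_peak = peak_memory(minidom.parseString, data)
tree, et_peak = peak_memory(ET.fromstring, data)

print("serialised size:  %d bytes" % len(data))
print("minidom peak:     %d bytes" % dom_peak)
print("ElementTree peak: %d bytes" % et_peak)
```

Swap make_sample_xml() for the bytes of a real file to see how the two
parsers compare on your own data.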

Stefan



