xml, minidom, ElementTree
Stefan Behnel
stefan_ml at behnel.de
Wed Dec 14 02:22:25 EST 2011
Terry Reedy, 14.12.2011 06:01:
> On 12/13/2011 6:21 PM, Ethan Furman wrote:
>> In the near future I will need to parse and rewrite parts of XML files
>> created by a third-party program (PrintShopMail, for the curious).
>> It contains both binary and textual data.
>>
>> There has been some strong debate about the merits of minidom vs
>> ElementTree.
>>
>> Recommendations?
>
> People's reactions to the DOM interface seem quite varied, with a majority,
> perhaps, being negative. I personally would look at both enough to
> understand the basic API model to see where *I* fit.
The API is one thing, yes, but there's also the fact that MiniDOM doesn't
scale. If your XML files are of a notable size (a couple of MB), MiniDOM
may simply not be able to handle them. I collected some numbers in a blog
post. Note that they were measured on a recent CPython 3.3 build, whose
optimised Unicode string implementation yields lower memory requirements
on average than Py2.x.

http://blog.behnel.de/index.php?p=197

MiniDOM's memory consumption is 5-10 times that of cElementTree, which
makes it about two orders of magnitude larger than the serialised file.
You may be able to fit one such file into memory, but you'll quickly get
into trouble when you try to do parallel processing or otherwise use more
than one document at a time.
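
To get a feeling for the difference on your own data, a rough sketch like
the following may help. It is not the benchmark from the blog post. It
assumes a small synthetic document (make_sample_xml and traced_size are
made-up helper names) and uses tracemalloc, which needs Python 3.4 or later
and only sees allocations made through Python's own allocators. It also
uses plain xml.etree.ElementTree, which picks up the C accelerator
automatically from Python 3.3 on. Absolute numbers will vary with the
Python version and the shape of the document.

```python
import tracemalloc
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET


def make_sample_xml(n_items=2000):
    """Build a synthetic XML document (hypothetical structure) as bytes."""
    rows = "".join(
        '<item id="%d"><name>item %d</name><value>%d</value></item>' % (i, i, i * 2)
        for i in range(n_items)
    )
    return ("<root>" + rows + "</root>").encode("utf-8")


def traced_size(parse, data):
    """Return the bytes tracemalloc reports as allocated while the tree is alive."""
    tracemalloc.start()
    try:
        tree = parse(data)  # hold a reference so the tree is not freed early
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


data = make_sample_xml()
sizes = {
    "minidom": traced_size(minidom.parseString, data),
    "ElementTree": traced_size(ET.fromstring, data),
}

# Report the in-memory size relative to the serialised size of the document.
for name, size in sizes.items():
    print("%-12s %9d bytes in memory, %.1fx the serialised size"
          % (name, size, size / len(data)))
```

Replacing make_sample_xml() with the bytes of a real file should give a
more realistic picture than the synthetic input.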
Stefan