Parsing unicode (devanagari) text with xml.dom.minidom

Stefan Behnel stefan_ml at behnel.de
Sun Mar 8 05:38:40 EDT 2009


Martin v. Löwis wrote:
>> Regarding minidom, you might be happier with the xml.etree package that
>> comes with Python 2.5 and later (it's also available for older versions).
>> It's a lot easier to use, more memory friendly and also much faster.
> 
> OTOH, choice of XML library is completely irrelevant for the issue at
> hand.

For the described problem, maybe. But certainly not for the application.
The background was parsing the XML dump of an entire web site, which I
would expect to be larger than what minidom is designed to handle
gracefully. Switching to cElementTree before major code gets written is
almost certainly a good idea here.
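
A minimal sketch of what incremental parsing with ElementTree could look
like for a large dump. The element names ("site", "page", "title") and the
helper iter_page_titles are made up for illustration, since the real dump
format isn't shown in this thread. On current Python versions, plain
xml.etree.ElementTree picks up the C accelerator by itself, so the sketch
imports that rather than cElementTree.

```python
import io
import xml.etree.ElementTree as ET


def iter_page_titles(source, tag="page"):
    """Yield the <title> text of each <page>, one element at a time.

    `tag` and the "title" child are hypothetical names; adapt them to the
    actual structure of the dump.
    """
    # iterparse builds the tree incrementally instead of loading the whole
    # document first, which is the point for very large files.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.findtext("title")
            # Drop the element's children and text once we are done with it,
            # so memory use does not grow with the size of the dump.
            elem.clear()


# Tiny stand-in for the real dump, with Devanagari text encoded as UTF-8.
sample = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<site>'
    '<page><title>हिन्दी</title></page>'
    '<page><title>नमस्ते</title></page>'
    '</site>'
).encode("utf-8")

titles = list(iter_page_titles(io.BytesIO(sample)))
```

The parser decodes the bytes according to the XML declaration, so the
titles come back as ordinary unicode strings. For a real dump you would
pass the file name or an open binary file instead of the BytesIO object.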

Stefan


