Why is xml.dom.minidom so slow?
fredrik at pythonware.com
Thu Jan 2 22:50:13 CET 2003
Bjorn Pettersen wrote:
> All I'm doing boils down to:
> response = rf.nextResponse()
> dom = parseString(response)
> in a loop. Am I doing something wrong? Is there a faster way when all I
> need is a traversable tree structure as the result?
as a general rule, XML toolkits that try to implement the DOM specification
in pure Python are incredibly slow and bloated.
on random XML data, minidom can easily gobble up a kilobyte or two for
each element. in one of my benchmarks, it used about 50 bytes of object
memory for each input character:
creating all those objects takes time...
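A rough way to check the per-character overhead on your own data is sketched below. It is not the benchmark referred to above. It assumes a modern Python 3 with the standard tracemalloc module, and the helper name minidom_bytes_per_char and the sample document are made up for illustration. The numbers will vary with the Python version and the shape of the input.

```python
import tracemalloc
from xml.dom.minidom import parseString


def minidom_bytes_per_char(xml_text):
    """Return the traced bytes that minidom keeps alive per input character.

    Hypothetical helper for illustration; not part of any library.
    """
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    dom = parseString(xml_text)          # build the full DOM tree
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    dom.unlink()                         # break reference cycles in the tree
    return (after - before) / len(xml_text)


# Made-up sample input: many small elements, which is where a
# per-element object overhead shows up most clearly.
sample = "<root>" + "<item id='1'>text</item>" * 1000 + "</root>"
ratio = minidom_bytes_per_char(sample)
print(f"{ratio:.1f} bytes of traced memory per input character")
```

Running the same measurement on a real response from your own loop gives a more useful figure than a synthetic sample.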
toolkits that use a more pythonic api also tend to be more efficient; for
example, the pure python version of my elementtree module is typically
3-5 times faster than minidom, and uses less than half the memory:
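For the loop in the question, a minimal sketch of the element tree approach could look like the following. It assumes a current Python, where the descendant of the elementtree module ships in the standard library as xml.etree.ElementTree. The response string and the parse_response helper are invented stand-ins for whatever rf.nextResponse() returns.

```python
import xml.etree.ElementTree as ET


def parse_response(response):
    """Parse one XML response into a traversable element tree (hypothetical helper)."""
    return ET.fromstring(response)


# Made-up response; in the original loop this would come from rf.nextResponse().
response = "<reply><status code='200'>ok</status><item>a</item><item>b</item></reply>"

root = parse_response(response)
status = root.find("status")                        # first matching child element
items = [item.text for item in root.iter("item")]   # walk all <item> elements

print(root.tag, status.get("code"), status.text, items)
```

Elements behave like lists of their children and carry attributes in a dictionary, so most traversal code needs no DOM-style node bookkeeping.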
you may be able to reach 10x with SAX-style custom code using pyexpat
(or sgmlop) directly...
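As an illustration of the SAX-style route, here is a small sketch that drives expat directly and builds a bare-bones tree of dictionaries. It assumes a current Python, where pyexpat is exposed as xml.parsers.expat. The build_tree function and the node layout are invented for this example, and all character data inside an element is simply concatenated into one text field.

```python
import xml.parsers.expat


def build_tree(xml_text):
    """Build a nested dict tree from expat callbacks (hypothetical helper)."""
    roots = []   # will hold the single top-level node
    stack = []   # currently open elements, innermost last

    def start(name, attrs):
        node = {"tag": name, "attrs": attrs, "children": [], "text": ""}
        if stack:
            stack[-1]["children"].append(node)
        else:
            roots.append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        # expat may deliver text in several chunks, so accumulate it
        if stack:
            stack[-1]["text"] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)   # True marks this as the final chunk
    return roots[0]


tree = build_tree("<a x='1'><b>hi</b><b>yo</b></a>")
print(tree["tag"], tree["attrs"], [child["text"] for child in tree["children"]])
```

Keeping only the fields you actually need in each node is where most of the savings over a full DOM come from.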
...but to be on the safe side, I'd go for a C parser/tree builder. the following
two are about as fast as anything can be:
(unfortunately, the C version of elementtree isn't yet ready for public