Why is xml.dom.minidom so slow?
Fredrik Lundh
fredrik at pythonware.com
Thu Jan 2 16:50:13 EST 2003
Bjorn Pettersen wrote:
> All I'm doing boils down to:
>
> response = rf.nextResponse()
> dom = parseString(response)
>
> in a loop. Am I doing something wrong? Is there a faster way when all I
> need is a traversable tree structure as the result?
as a general rule, XML toolkits that try to implement the DOM specification
in pure Python are incredibly slow and bloated.
on random XML data, minidom can easily gobble up a kilobyte or two for
each element. in one of my benchmarks, it used about 50 bytes of object
memory for each input character:
http://online.effbot.org/2002_12_01_archive.htm#dom-bloat
creating all those objects takes time...
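
a rough way to check the memory claim on your own data is sketched below. this is not the benchmark from the link above; it assumes a modern Python with the tracemalloc module, and the sample document and the helper name bytes_per_input_char are made up for illustration. the number you get will vary with Python version and input.

```python
import tracemalloc
from xml.dom.minidom import parseString

def bytes_per_input_char(xml_text):
    """Return traced bytes allocated while building the DOM, per input character.

    Hypothetical helper for illustration; it measures only what tracemalloc
    can see, so treat the result as an estimate.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    dom = parseString(xml_text)          # build the full minidom tree
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    dom.unlink()                         # break cycles so the tree can be freed
    return (after - before) / len(xml_text)

# Made-up sample input: 1000 small elements under one root.
sample = "<root>" + "<item id='1'>text</item>" * 1000 + "</root>"
ratio = bytes_per_input_char(sample)
print("%.1f bytes per input character" % ratio)
```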
toolkits that use a more pythonic api also tend to be more efficient; for
example, the pure python version of my elementtree module is typically
3-5 times faster than minidom, and uses less than half the memory:
http://effbot.org/zone/element-index.htm
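
if all you need is a traversable tree, something like the following may be enough. it's a minimal sketch that assumes the xml.etree.ElementTree module bundled with current Python releases (at the time of this post, elementtree was a separate download); the sample response string and the walk helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the string returned by rf.nextResponse().
response = "<reply><status code='200'/><item>a</item><item>b</item></reply>"

root = ET.fromstring(response)   # parse the string into an Element tree

def walk(elem, depth=0):
    """Yield (depth, tag, attributes, text) for elem and all its descendants."""
    yield depth, elem.tag, dict(elem.attrib), (elem.text or "").strip()
    for child in elem:           # an Element iterates over its direct children
        yield from walk(child, depth + 1)

nodes = list(walk(root))
for depth, tag, attrib, text in nodes:
    print("  " * depth + tag, attrib, text)
```

elements behave like lists of their children and carry attributes in a plain dictionary, which is most of what the loop in the question seems to need.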
you may be able to reach 10x minidom's speed with SAX-style custom code
using pyexpat (or sgmlop) directly...
http://www.python.org/doc/current/lib/module-xml.parsers.expat.html
http://effbot.org/zone/sgmlop-index.htm
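
to give an idea of what SAX-style custom code on top of pyexpat can look like, here's a small sketch that builds a tree of plain lists instead of DOM nodes. the [tag, attrs, children] layout and the name parse_to_lists are assumptions made for this example, not part of any library; only the xml.parsers.expat calls are the real API.

```python
from xml.parsers import expat

def parse_to_lists(xml_text):
    """Parse xml_text into nested [tag, attrs, children] lists.

    Text content is stored as plain strings inside the children list.
    The layout is an illustrative choice, not a standard one.
    """
    root = [None, {}, []]        # dummy holder for the document element
    stack = [root]               # stack[-1] is the element being filled

    def start(tag, attrs):
        node = [tag, attrs, []]
        stack[-1][2].append(node)
        stack.append(node)

    def end(tag):
        stack.pop()

    def chars(data):
        stack[-1][2].append(data)

    parser = expat.ParserCreate()
    parser.buffer_text = True    # deliver adjacent character data in one call
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True) # True: this is the final chunk of input
    return root[2][0]

tree = parse_to_lists("<a x='1'><b>hi</b></a>")
print(tree)
```

the point of the design is that each element costs one list and one dictionary, rather than the many objects a DOM node drags along.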
...but to be on the safe side, I'd go for a C parser/tree builder. the following
two are about as fast as anything can be:
http://xmlsoft.org/python.html
http://www.reportlab.com/xml/pyrxp.html
(unfortunately, the C version of elementtree isn't yet ready for public
consumption...)
</F>