Why is xml.dom.minidom so slow?

Thu Jan 2 16:50:13 EST 2003

Bjorn Pettersen wrote:

> All I'm doing boils down to:
>
>   response = rf.nextResponse()
>   dom = parseString(response)
>
> in a loop. Am I doing something wrong? Is there a faster way when all I
> need is a traversable tree structure as the result?

as a general rule, XML toolkits that try to implement the DOM specification
in pure Python are incredibly slow and bloated.

on random XML data, minidom can easily gobble up a kilobyte or two for
each element.  in one of my benchmarks, it used about 50 bytes of object
memory for each input character:

    http://online.effbot.org/2002_12_01_archive.htm#dom-bloat

creating all those objects takes time...
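
if you want a rough bytes-per-character figure for your own data, something like the sketch below may do.  it is not the benchmark referred to above: it assumes a modern Python with the tracemalloc module, and the helper name and sample document are made up for illustration.  the figure will vary with Python version and input.

```python
import tracemalloc
from xml.dom.minidom import parseString

def bytes_per_input_char(xml_text):
    # hypothetical helper: memory allocated while building the
    # minidom tree, divided by the length of the input string
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    dom = parseString(xml_text)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    dom.unlink()  # break internal cycles so the tree can be freed
    return (after - before) / len(xml_text)

# made-up sample document: 1000 small elements
sample = "<root>" + "<item id='1'>spam</item>" * 1000 + "</root>"
ratio = bytes_per_input_char(sample)
print("about %.0f bytes per input character" % ratio)
```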

toolkits that use a more pythonic api also tend to be more efficient; for
example, the pure python version of my elementtree module is typically
3-5 times faster than minidom, and uses less than half the memory:

    http://effbot.org/zone/element-index.htm
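
for the loop in the question, the elementtree equivalent might look like the sketch below.  it assumes the xml.etree.ElementTree module that ships with later Python versions (the standalone package linked above uses a different import path), and the sample response is invented for illustration.

```python
import xml.etree.ElementTree as ET

def parse_response(response):
    # build an element tree from one XML string; takes the place of
    # minidom's parseString(response) in the original loop
    return ET.fromstring(response)

# made-up response standing in for rf.nextResponse()
response = "<reply><item id='1'>spam</item><item id='2'>eggs</item></reply>"
root = parse_response(response)

# elements behave like lists of children, with attributes via get()
items = [(item.get("id"), item.text) for item in root.iter("item")]
```

here `items` ends up as a list of (id, text) pairs, one per item element.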

you may be able to reach 10x with SAX-style custom code using pyexpat
(or sgmlop) directly...

    http://www.python.org/doc/current/lib/module-xml.parsers.expat.html
    http://effbot.org/zone/sgmlop-index.htm
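
as a minimal sketch of what such custom code could look like, the function below drives pyexpat directly and builds a tree of plain tuples and lists.  the (tag, attributes, children) layout and the function name are assumptions made for this example, not part of any library, and the sketch makes no claim about the actual speedup.

```python
from xml.parsers import expat

def parse_to_tree(data):
    # each node is a (tag, attrs, children) tuple; children is a list
    # holding child nodes and text strings in document order
    root = []
    stack = [root]  # stack of child lists currently being filled

    def start(name, attrs):
        node = (name, attrs, [])
        stack[-1].append(node)
        stack.append(node[2])

    def end(name):
        stack.pop()

    def chardata(text):
        stack[-1].append(text)

    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous text in one call
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chardata
    parser.Parse(data, True)
    return root[0]

tree = parse_to_tree("<a x='1'><b>hi</b><c/></a>")
```

plain tuples and lists avoid creating a class instance per node, which is where much of the DOM overhead comes from.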

...but to be on the safe side, I'd go for a C parser/tree builder.  the following
two are about as fast as anything can be:

    http://xmlsoft.org/python.html
    http://www.reportlab.com/xml/pyrxp.html

(unfortunately, the C version of elementtree isn't yet ready for public
consumption...)

</F>