"drop-in" DOM replacement for minidom?

Geoff Gerrietts geoff at gerrietts.net
Wed Aug 13 17:25:02 EDT 2003


Quoting Paul Miller (paul at fxtech.com):
> We've run into minidom's inability to handle large (20+MB) XML files, and
> need a replacement that can handle it. Unfortunately, we're pretty
> dependent on a DOM, so a pulldom or SAX replacement is likely out of the
> question for now.
> 
> Has someone done a more efficient minidom replacement module that we can
> just drop in? Preferably written in C?

I've posted on a related topic in the past, when a friend of mine was
blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
every response I got was of the general form "well what the hell are
you using DOM for? are you defective?" Some were more diplomatic than
others.

My friend also had some more challenging problems. He was running on a
DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
byte-ordering problems. PyRXP wouldn't compile for him, if I recall
correctly -- or maybe there were licensing problems? Anyway, he
ultimately settled on using pulldom; that gave him simplicity, speed,
and a small enough memory profile that it satisfied his needs.
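For anyone who hasn't used pulldom, here's a minimal sketch of the usual pattern using the standard library's xml.dom.pulldom (not the code my friend actually ran). You stream parse events and call expandNode() only on the elements you care about, so only that element's subtree is built as minidom nodes. The function name iter_elements and the <order>/<item> sample document are hypothetical, made up for illustration.

```python
import io
from xml.dom import pulldom


def iter_elements(source, tag_name):
    """Yield each <tag_name> element from `source` as a small minidom subtree.

    `source` may be a filename or a binary file object. Elements that are
    not expanded are seen only as parse events and are not attached to a
    tree, so memory use tracks the subtrees you expand (and keep around)
    rather than the size of the whole file.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag_name:
            # Pull the remaining events for this element into `node`,
            # giving an ordinary minidom Element with its children attached.
            events.expandNode(node)
            yield node


if __name__ == "__main__":
    # Hypothetical sample document; in practice you'd pass the big file's name.
    sample = io.BytesIO(
        b"<orders>"
        b"<order id='1'><item>a</item></order>"
        b"<order id='2'><item>b</item><item>c</item></order>"
        b"</orders>"
    )
    for order in iter_elements(sample, "order"):
        # Normal DOM calls work on each expanded subtree.
        items = order.getElementsByTagName("item")
        print(order.getAttribute("id"), len(items))
```

Each yielded node is a regular minidom Element, so existing DOM-walking code can run on it, but you only ever get a DOM for one record at a time. If your code needs to jump around the whole document, that's exactly where this stops helping.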

Obviously it won't help in your case.

I don't think you'll find something that precisely mimics the minidom
module's interface, so you're going to hafta do some retooling.
However, I believe that if you can get 4Suite to compile, you might
find some love in there. There's a cDomlette component (labelled at
the time of my last reading as "experimental") that builds the parse
tree in C, with minimal memory consumption.

Here's a link to something that should tell you how to make it work
(though when I personally used cDomlette, I seem to remember it being
harder than this....)

  http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes

Also, you may be interested in looking at the comparisons done by the
PyRXP folks on their page:

  http://www.reportlab.com/xml/pyrxp.html

Best of luck!

--G.

-- 
Geoff Gerrietts             "Whenever people agree with me I always 
<geoff at gerrietts net>     feel I must be wrong." --Oscar Wilde
