"drop-in" DOM replacement for minidom?

Uche Ogbuji uche at ogbuji.net
Fri Aug 15 09:05:45 EDT 2003


Geoff Gerrietts <geoff at gerrietts.net> wrote in message news:<mailman.1060810044.23356.python-list at python.org>...
> Quoting Paul Miller (paul at fxtech.com):
> > We've run into minidom's inability to handle large (20+MB) XML files, and
> > need a replacement that can handle it. Unfortunately, we're pretty
> > dependent on a DOM, so a pulldom or SAX replacement is likely out of the
> > question for now.
> > 
> > Has someone done a more efficient minidom replacement module that we can
> > just drop in? Preferably written in C?
> 
> I've posted on a related topic in the past, when a friend of mine was
> blowing thru 8GB of memory parsing a 30MB file in minidom. Pretty much
> every response I got was of the general form "well what the hell are
> you using DOM for? are you defective?" Some were more diplomatic than
> others.

My response is usually more like "why are you using XML for a single
30MB file?"

I've long maintained that when working with XML, keeping document sizes
modest is very important, regardless of what tools you're using.

That said, some documents are 30MB, and it makes sense
that they're 30MB, and that's just the way it is.


> My friend also had some more challenging problems. He was running on a
> DEC Alpha, I think under Digital Unix, and as a consequence 4Suite had
> byte-ordering problems.

4Suite used to have byte-ordering problems, originally reported under
Solaris 9, and also affecting some Mac OS X users. Those are fixed
now.


> PyRXP wouldn't compile for him, if I recall
> correctly -- or maybe there were licensing problems? Anyway, he
> ultimately settled on using pulldom; that gave him simplicity, speed,
> and a small enough memory profile that it satisfied his needs.
> 
> Obviously it won't help in your case.

pulldom is always worth considering.

http://www-106.ibm.com/developerworks/xml/library/x-tipulldom.html
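
For anyone who hasn't tried it, here is a minimal sketch of the pulldom
approach using the standard library's xml.dom.pulldom. The element
names, the sample data and the order_ids_containing helper are invented
for illustration; they are not from the original poster's files. The
idea is to stream through the document and expand only the subtrees you
care about into ordinary minidom nodes, so memory use should track the
largest expanded subtree rather than the whole file.

```python
from xml.dom import pulldom

# Hypothetical sample data; a real document would be far larger.
SAMPLE = """<orders>
  <order id="1"><item>widget</item></order>
  <order id="2"><item>gadget</item></order>
  <order id="3"><item>widget</item></order>
</orders>"""


def text_of(element):
    """Join the text-node children of an element into one string."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def order_ids_containing(xml_text, wanted):
    """Return the ids of <order> elements that hold an <item> equal to `wanted`."""
    ids = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            # Build a minidom subtree for this one element only.
            events.expandNode(node)
            items = node.getElementsByTagName("item")
            if any(text_of(item) == wanted for item in items):
                ids.append(node.getAttribute("id"))
    return ids


print(order_ids_containing(SAMPLE, "widget"))
```

To read from disk instead, pulldom.parse() accepts a file name or an
open file object in place of parseString(). Inside the loop each
expanded node is a regular minidom element, so existing minidom code
that works on one record at a time can often be reused as is.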

> I don't think you'll find something that precisely mimics the minidom
> module's interface, so you're going to hafta do some retooling.
> However, I believe that if you can get 4Suite to compile,

Which I hardly expect to be a problem.

> you might
> find some love in there. There's a cDomlette component (labelled at
> the time of my last reading as "experimental")

cDomlette hasn't been experimental for nearly a year now.  We use it
heavily in production.


> that builds the parse
> tree in C, with a minimal memory consumption.

And fast parse and mutation time.


> Here's a link to something that should tell you how to make it work
> (though when I personally used cDomlette, I seem to remember it being
> harder than this....)
> 
>   http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes

Your memories must be from long ago :-)  That API is how it's been for
a while.


> Also, you may be interested in looking at the comparisons done by the
> PyRXP folks on their page:
> 
>   http://www.reportlab.com/xml/pyrxp.html
> 
> Best of luck!

Ditto.

--Uche
http://uche.ogbuji.net



