[XML-SIG] fast dump/restore of an XML document?

Anthony Baxter <anthony@interlink.com.au>
Thu, 18 May 2000 00:35:51 +1000


>>> Greg Stein wrote
> Yup... this was my thought. I think qp_xml gives you a good basis for
> taking the pyexpat callbacks and constructing list/dict structures.
> The conversion would probably take a while on a large data set -- you'd be
> iterating over every XML node. I'd recommend gutting qp_xml.

Ok, I've done this - is this of interest at all? I've now got something
that's just a big list containing lists and dictionaries, and I'm wrapping
a class around it to give a lot of DOM-like access to it (while still
allowing the under-the-hood fast dump/restore).
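
For illustration only, here is a minimal sketch of the general idea: expat
callbacks build nested lists and dictionaries, and a thin class gives
DOM-like access on top. It is written for current Python and is not the
hacked-up qp_xml code itself; the [tag, attrs, children] layout and the
names parse_to_lists and Element are assumptions made for this example.

```python
import xml.parsers.expat


def parse_to_lists(xml_text):
    """Parse XML into nested [tag, attrs_dict, children] lists.

    Text is stored as plain strings in the children list. The layout is
    an assumption for this sketch, not qp_xml's actual structure.
    """
    root = [None, {}, []]          # synthetic holder for the document element
    stack = [root]

    def start(name, attrs):
        node = [name, dict(attrs), []]
        stack[-1][2].append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        children = stack[-1][2]
        # expat may deliver one text run in several pieces; merge them
        if children and isinstance(children[-1], str):
            children[-1] += data
        else:
            children.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return root[2][0]


class Element:
    """Thin DOM-like view over one [tag, attrs, children] list."""

    def __init__(self, node):
        self.node = node

    @property
    def tag(self):
        return self.node[0]

    def get(self, name, default=None):
        return self.node[1].get(name, default)

    def children(self, tag=None):
        return [Element(c) for c in self.node[2]
                if isinstance(c, list) and (tag is None or c[0] == tag)]

    def text(self):
        return "".join(c for c in self.node[2] if isinstance(c, str))
```

Because the underlying data is only lists, dictionaries and strings, it
can be handed to marshal or pickle as-is, and the wrapper objects are
created on demand rather than stored.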

Some timings:

qp is Greg's latest qp_xml.
qph is my hacked up version.

The test input was my standard 2.5M XML file.

qp parse done in 26.0s
qp pickle.dump done in 20.2s
qp pickle.load done in 88.1s

qph parse done in 21.0s
qph pickle.dump done in 12.6s
qph pickle.load done in 89.5s
qph marshal.dump done in 5.13s
qph marshal.load done in 6.92s
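
A comparison along these lines can be sketched as below. This is a
hypothetical harness for current Python, not the script that produced
the numbers above; time_call and compare_serializers are names invented
for the example, and it times in-memory dumps()/loads() rather than
writing to files.

```python
import marshal
import pickle
import time


def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare_serializers(data):
    """Time an in-memory dump and load of `data` with pickle and marshal."""
    timings = {}
    for name, mod in (("pickle", pickle), ("marshal", marshal)):
        blob, timings[name + ".dumps"] = time_call(mod.dumps, data)
        restored, timings[name + ".loads"] = time_call(mod.loads, blob)
        # sanity check: the round trip must give back the same structure
        assert restored == data
    return timings
```

marshal only handles built-in types such as lists, dictionaries and
strings, which is why it applies to the plain list/dict structure but
not to a tree of class instances.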

Some file sizes:

qp pickle file was 8.9M
qph pickle file was 6.6M
qph marshal file was 6.5M
qp gzipped pickle file was 990K
qph gzipped pickle file was 940K
qph gzipped marshal file was 590K (but marshal can't write directly to
a gzip.open() :( )
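
One possible workaround, sketched here for current Python, is to
marshal to a string in memory with marshal.dumps() and write that
through gzip, at the cost of holding the whole serialized blob in
memory at once. The helper names dump_gz and load_gz are made up for
this example.

```python
import gzip
import marshal


def dump_gz(obj, path):
    """Marshal obj in memory, then write the bytes through gzip."""
    with gzip.open(path, "wb") as f:
        f.write(marshal.dumps(obj))


def load_gz(path):
    """Read the whole gzipped file and unmarshal it."""
    with gzip.open(path, "rb") as f:
        return marshal.loads(f.read())
```

Note that the marshal format is specific to the Python version that
wrote it, so files saved this way are best treated as a cache that can
be rebuilt from the original XML.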

Memory consumption of my structure was also about 60% of that of qp_xml's
structure (and qp_xml's is about 40% of the size of the full xml size).

So, is it worth pursuing this as something suitable for release, or
just for my own internal use? Is there any interest in it?

Anthony