[XML-SIG] XML for scientific data storage and search

Ping Yeh ping at pingyeh.net
Wed Jan 12 20:28:18 CET 2005


Hello,

    I'm a newbie to XML and have just written a program that stores my
scientific data objects as an XML file and restores them later (like
marshaling).  However, I found it extremely slow.  I changed the
implementation from minidom to SAX, which speeds things up somewhat (30% or
so) for small files, but not enough.  If I go back to binary data, loading
is about 5 times faster or more.  Are there widely used ways to speed up
parsing?

    Another problem is memory footprint.  My XML data files can be large:
tens of megabytes with hundreds of thousands of objects.  If I use
xml.sax.parseString(), it parses the whole string into in-memory objects,
which inflates the footprint.  I only need to loop over the objects in the
XML file once.  Are there common ways to do a delayed read?  I'm looking
for something like

xml.sax.parseFile('data0.xml', myContentHandler)
objects = myContentHandler.getObjects()   # returns an iterator
for obj in objects:    # reading occurs here (delayed reading)
    # do something with obj...

But I haven't found anything like that.  I'm not sure this is possible with
the current architecture of the parsers.  Any advice is highly appreciated.
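
In case it helps to make the question concrete, here is a rough sketch of
the interface I have in mind.  It assumes the incremental feed()/close()
interface of the parser returned by xml.sax.make_parser().  The names
iter_objects and ObjectCollector, the element name "object", and the chunk
size are all made up for illustration, and my real objects carry more than
a bag of attributes.  I don't know whether this is a recommended approach.

```python
import io
import xml.sax
from collections import deque


class ObjectCollector(xml.sax.ContentHandler):
    """Collects the attributes of each matching element as a dict."""

    def __init__(self, tag="object"):  # "object" is a hypothetical element name
        super().__init__()
        self.tag = tag
        self.pending = deque()  # objects completed since the last drain

    def startElement(self, name, attrs):
        if name == self.tag:
            self.pending.append(dict(attrs.items()))


def iter_objects(stream, tag="object", chunk_size=64 * 1024):
    """Yield objects one by one while feeding the parser in small chunks."""
    handler = ObjectCollector(tag)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # parse only this chunk
        while handler.pending:
            yield handler.pending.popleft()
    parser.close()  # signal end of input and flush the parser
    while handler.pending:
        yield handler.pending.popleft()


# Usage with an in-memory sample; a file opened in binary mode works the same way.
sample = io.BytesIO(b'<data><object id="1"/><object id="2"/></data>')
for obj in iter_objects(sample):
    print(obj)
```

With this, only one chunk of the file and the objects found in it are held
in memory at a time, but I have no idea how it compares in speed.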

Thanks,
Ping


