[XML-SIG] XML for scientific data storage and search
Ping Yeh
ping at pingyeh.net
Wed Jan 12 20:28:18 CET 2005
Hello,
I'm new to XML and have just written a program that stores my scientific
data objects in an XML file and restores them later (like marshaling).
However, I found it extremely slow. I changed the implementation from
minidom to SAX, which speeds things up somewhat (30% or so) for small
files, but not enough.
If I go back to using binary data, it is about 5 times faster or more.
Are there widely used ways to speed up parsing?
Another problem is memory footprint. My XML data files can be large:
tens of megabytes with hundreds of thousands of objects. If I use
xml.sax.parseString(),
it parses the whole string into in-memory objects, which inflates memory
use. I only need to
loop over the objects in the XML file once. Are there common ways to do
a delayed read? I'm looking for something like
    xml.sax.parseFile('data0.xml', myContentHandler)
    objects = myContentHandler.getObjects()  # returns an iterator
    for obj in objects:  # reading occurs here (delayed reading)
        # do something with obj...
But I haven't found any. I'm not sure this is possible with the current
architecture of parsers. Any advice is highly appreciated.
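To make the request concrete, here is a minimal sketch of the delayed-read
pattern, written with xml.etree.ElementTree.iterparse from the Python
standard library. It is an illustration under assumptions, not a
recommendation from the list: the element name "object", the sample
document, and the helper iter_objects are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_objects(source, tag="object"):
    """Yield one completed element at a time from a file name or file object.

    `tag` is a hypothetical element name chosen for this sketch.
    """
    # iterparse reads the input incrementally; an "end" event fires
    # once an element and all of its children have been parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # After the caller has used the element, drop its text,
            # attributes and children so they can be garbage collected.
            elem.clear()


# Hypothetical sample data standing in for a large file such as data0.xml.
sample = io.BytesIO(
    b"<data><object id='1'>3.14</object><object id='2'>2.72</object></data>"
)

# Reading happens here, one object per loop iteration.
values = [(obj.get("id"), float(obj.text)) for obj in iter_objects(sample)]
```

Note that the cleared elements stay attached to the root as empty shells,
so memory use still grows slowly with the number of objects; whether that
matters for files of this size would have to be measured.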
Thanks,
Ping