[XML-SIG] Large xml databases and python
Fredrik Lundh
fredrik at pythonware.com
Mon Aug 21 14:38:24 CEST 2006
Anstey, Matthew wrote:
> Our question is this: when we finish porting our 300Mb "python" data
> into 3Gb of XML data, how can we continue to read it from disk in its
> xml format and manipulate it?
>
> We are looking at Berkeley XML with the Python API, but are concerned
> this is not the best solution. We have also dabbled with Amara and
> ElementTree, but the size of our XML is giving us problems.
if the Python version of the data fits in memory, you can use iterparse
and the "incremental decoding" approach outlined here:
http://effbot.org/zone/element-iterparse.htm
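
A minimal sketch of that idea, assuming the file is a flat list of <record> elements under a single root (the tag name, the dict layout, and the load_records helper are hypothetical, not taken from the original data): each record is converted to a plain Python object as soon as it has been parsed, and the XML subtree is then discarded so that only the Python version stays in memory.

```python
import io
import xml.etree.ElementTree as ET

def load_records(source, tag="record"):
    """Collect each <record> as a plain dict, discarding the XML as we go.

    The tag name and the dict layout are assumptions for illustration.
    """
    records = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # the record is complete here; keep only the Python version
            records.append({child.tag: child.text for child in elem})
            root.clear()  # drop processed subtrees so the tree never grows
    return records

# tiny in-memory stand-in for the real file on disk
sample = io.BytesIO(
    b"<data>"
    b"<record><id>1</id><name>a</name></record>"
    b"<record><id>2</id><name>b</name></record>"
    b"</data>"
)
records = load_records(sample)
```

For the real data, pass a filename or an open file instead of the BytesIO object.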
to save the data, you can build subtrees (e.g. one per record) and
write each tree out by itself:
f = open("out.xml", "w")
f.write("<data>")
for record in data:
    tree = make_record_tree(record)
    tree.write(f)
f.write("</data>")
f.close()
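
For reference, a self-contained variant of the same loop as it might look on Python 3 with the standard library's xml.etree.ElementTree. This is a sketch under assumptions: the make_record_tree body and the dict-shaped records are made up for illustration, and encoding="unicode" is passed because the file object is opened in text mode.

```python
import io
import xml.etree.ElementTree as ET

def make_record_tree(record):
    """Hypothetical helper: turn one dict into a <record> ElementTree."""
    elem = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(elem, key).text = str(value)
    return ET.ElementTree(elem)

def write_records(f, data):
    """Write the records one subtree at a time to a text-mode file object."""
    f.write("<data>")
    for record in data:
        # encoding="unicode" makes write() emit str, as a text-mode file needs
        make_record_tree(record).write(f, encoding="unicode")
    f.write("</data>")

# StringIO stands in for open("out.xml", "w", encoding="utf-8")
out = io.StringIO()
write_records(out, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
xml_text = out.getvalue()
```

Only one record tree exists at a time, so memory use while saving does not depend on the size of the output file.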
</F>