What means exactly "Memory error"?
Fredrik Lundh
fredrik at pythonware.com
Fri Apr 25 13:51:41 EDT 2003
Albert Hofkamp wrote:
> Do you need all data DOMmed at once?
> You may be able to have one DOM tree at a time, dropping and reloading
> everytime you switch file.
an alternative is to use an incremental tree builder, and process the
subtrees as they arrive.
here's an example, using my elementtree module:
    from elementtree import ElementTree

    class MyBuilder(ElementTree.TreeBuilder):
        def end(self, tag):
            # let the standard builder finish the element first
            elem = ElementTree.TreeBuilder.end(self, tag)
            if elem.tag == "SCENE":
                # process(elem) in some way, and write it out
                # ElementTree.ElementTree(elem).write(sys.stdout)
                elem.clear() # nuke it
            return elem # same return value as the base class

    parser = ElementTree.XMLTreeBuilder()
    parser._target = MyBuilder() # plug in a custom builder!

    tree = ElementTree.parse(filename, parser)
I've tested this with a 10 megabyte XML file created by concatenating
Jon Bosak's Hamlet XML file over and over again, and wrapping it all in a
single document element.
the resulting file contains 720 scenes (about 15k each, on average).
the above script requires about 4.5 megabytes to run to completion, and
about 2 minutes processing time (on a really slow machine).
if I comment out the elem.clear() call, the script requires about 75
megabytes and takes about 15 minutes (13 of which were spent on swapping;
I ran the test on a machine with 96 megabytes RAM and really slow disks... ;-)
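for comparison, here's a minimal sketch of the same "process each subtree as
it arrives, then clear it" idea using iterparse from the standard library's
xml.etree.ElementTree module. this is an assumption on my part, not the script
that was timed above: the stdlib module, the scene_summary helper (a stand-in
for process(elem)), and the tiny sample document are all illustrative.

```python
import io
import xml.etree.ElementTree as ET


def scene_summary(elem):
    # hypothetical stand-in for process(elem):
    # return the scene title and the number of LINE elements in it
    return elem.findtext("TITLE"), len(elem.findall(".//LINE"))


def iter_scenes(source, tag="SCENE"):
    # iterparse reports an "end" event once an element's closing tag
    # has been seen, i.e. when that subtree is complete
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield scene_summary(elem)
            elem.clear()  # drop the subtree's children, text and attributes


# tiny illustrative document, not the Hamlet test file
SAMPLE = b"""<PLAYS>
<SCENE><TITLE>one</TITLE><SPEECH><LINE>a</LINE><LINE>b</LINE></SPEECH></SCENE>
<SCENE><TITLE>two</TITLE><SPEECH><LINE>c</LINE></SPEECH></SCENE>
</PLAYS>"""

results = list(iter_scenes(io.BytesIO(SAMPLE)))
print(results)
```

as in the builder version, elem.clear() only empties the element; the empty
SCENE element itself stays in the tree, so memory use should still grow
slowly with the number of scenes.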
for more information on element trees, see:
http://effbot.org/zone/element-index.htm
http://www.xml.com/pub/a/2003/02/12/py-xml.html
</F>