Python parsing XML file problem with SAX

Christian Heimes lists at cheimes.de
Mon Aug 9 19:39:44 EDT 2010


On 10.08.2010 01:20, Aahz wrote:
> The docs say, "Parses an XML section into an element tree incrementally".
> Sure sounds like it retains the entire parsed tree in RAM.  Not good.
> Again, how do you parse an XML file larger than your available memory
> using something other than SAX?

The article at
http://www.ibm.com/developerworks/xml/library/x-hiperfparse/ explains
one way to do it.

The iterparse approach is ingenious, but it doesn't work for every XML
format. It fits documents that consist of many small, independent
records. Let's say you have a 10 GB XML file with one million <part/>
tags. An iterparser doesn't load the entire document at once. Instead it
iterates over the file and yields (for example) one million elements,
one for each <part/> tag together with its children. You get the nice
API of ElementTree with the memory efficiency of a SAX parser, provided
you follow "Listing 4" of that article.
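
A minimal sketch of the idea, using the standard library's
xml.etree.ElementTree.iterparse rather than the article's lxml code. The
helper name iter_parts, the <parts>/<part>/<name> sample, and the
assumption that every <part/> is a direct child of the root element are
mine and only serve as illustration:

```python
import io
import xml.etree.ElementTree as ET


def iter_parts(source, tag="part"):
    """Yield each <part> element once its end tag has been parsed.

    Assumes (for this sketch) that the <part> elements are direct
    children of the document root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Runs when the caller asks for the next element: drop the
            # children already handled so the tree does not keep growing.
            root.clear()


# Tiny in-memory stand-in for the 10 GB file.
sample = io.BytesIO(
    b"<parts>"
    b"<part id='1'><name>bolt</name></part>"
    b"<part id='2'><name>nut</name></part>"
    b"</parts>"
)

names = [part.findtext("name") for part in iter_parts(sample)]
print(names)
```

Each element has to be processed before the next one is requested,
because clearing the root discards it. Without the clear() call the
parser would still build the whole tree in memory, which is exactly what
Aahz was worried about.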

Christian



