[Chicago] Out of Memory: Killed Process: on CentOS

David Beazley d-beazley at sbcglobal.net
Mon Apr 27 19:09:43 CEST 2009

Here's the shell of some code that uses iterparse and saves memory:

from xml.etree.ElementTree import iterparse

p = iterparse("somedoc.xml", ('start', 'end'))

# Look for the parent node, then stop so the second loop picks up the children
for event, elem in p:
    if event == 'start' and elem.tag == 'parent':
        parent = elem
        break

# Rip through children and discard as processed
for event, elem in p:
    if event == 'end' and elem.tag == 'child':
        # Do normal element tree processing on elem
        # Throw the child away
        parent.remove(elem)

The key part is that last statement (removing children from the parse  
tree as you're done with them).
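
For what it's worth, here is a self-contained sketch of the same pattern that runs against an in-memory document instead of somedoc.xml. The function name sum_children, the sample XML, and the integer payloads are made up for illustration, and it assumes every 'child' element is a direct child of 'parent' (otherwise parent.remove() would raise ValueError).

```python
import io
from xml.etree.ElementTree import iterparse


def sum_children(source, parent_tag="parent", child_tag="child"):
    """Add up the integer text of each child, discarding children as we go.

    Hypothetical helper: assumes each child is a direct child of the parent
    and that its text is an integer.
    """
    events = iterparse(source, events=("start", "end"))

    # Advance the iterator until the parent element opens, then stop.
    parent = None
    for event, elem in events:
        if event == "start" and elem.tag == parent_tag:
            parent = elem
            break
    if parent is None:
        raise ValueError("no <%s> element found" % parent_tag)

    # Resume the same iterator: process each finished child, then drop it
    # from the parent so the tree does not keep growing.
    total = 0
    for event, elem in events:
        if event == "end" and elem.tag == child_tag:
            total += int(elem.text)
            parent.remove(elem)

    # len(parent) is the number of child elements still attached.
    return total, len(parent)


sample = b"<root><parent><child>1</child><child>2</child><child>3</child></parent></root>"
print(sum_children(io.BytesIO(sample)))
```

The second number returned is how many children are still hanging off the parent at the end. If it stays at zero, the tree is not accumulating processed elements.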


On Apr 27, 2009, at 12:04 PM, Brian Ray wrote:

> On Apr 27, 2009, at 11:51 AM, David Beazley wrote:
>>> If it's simply that cElementTree is keeping huge structures in memory
>>> and you're getting too many concurrent requests for your RAM, move to
>>> a SAX style API (state machine).
>> Also look at the ElementTree.iterparse() function.  If you use that
>> in a clever way, you get all of the benefit of ElementTree plus the
>> memory savings of SAX (basically you can iteratively rip through
>> XML data and throw away the parts you're done with as you go).
> I will take a look at that.  I wonder how best to prove or
> disprove any improvement once I try that.
> Thanks, Brian Ray
