[Chicago] Out of Memory: Killed Process: on CentOS

Mon Apr 27 19:09:43 CEST 2009

Here's the shell of code that uses iterparse and saves memory

from xml.etree.ElementTree import iterparse

p = iterparse("somedoc.xml", ('start', 'end'))

# Look for the parent node
for event, elem in p:
    if event == 'start' and elem.tag == 'parent':
        parent = elem
        break

# Rip through children and discard as processed
for event, elem in p:
    if event == 'end' and elem.tag == 'child':
        # Do normal element tree processing on elem
        ...
        # Throw the child away so its memory can be freed
        parent.remove(elem)

The key part is that last statement (removing children from the parse
tree as you're done with them).  Without it, iterparse still builds the
whole tree as it goes and you save nothing.
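
For what it's worth, here is a self-contained sketch of the same idea that
also gives a rough way to check whether it helps.  The document layout
(root/parent/child with integer text) and the helper names make_doc,
stream_sum, full_sum and peak_bytes are all made up for illustration.
tracemalloc only sees allocations made through Python's allocators, so
treat the numbers as a relative comparison, not as the process size the
kernel sees.

```python
import io
import tracemalloc
from xml.etree.ElementTree import iterparse, parse


def make_doc(n):
    """Build a made-up test document with n <child> elements under <parent>."""
    body = "".join("<child>%d</child>" % i for i in range(n))
    return ("<root><parent>%s</parent></root>" % body).encode("ascii")


def stream_sum(source):
    """Sum child values, removing each child from the tree once processed."""
    parent = None
    total = 0
    for event, elem in iterparse(source, events=("start", "end")):
        if parent is None:
            # Still looking for the parent node
            if event == "start" and elem.tag == "parent":
                parent = elem
        elif event == "end" and elem.tag == "child":
            total += int(elem.text)
            parent.remove(elem)  # the step that stops the tree from growing
    return total


def full_sum(source):
    """Same result, but with the whole tree held in memory at once."""
    return sum(int(c.text) for c in parse(source).getroot().iter("child"))


def peak_bytes(func, data):
    """Return (result, peak bytes traced by tracemalloc) for func(data)."""
    tracemalloc.start()
    try:
        result = func(io.BytesIO(data))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    data = make_doc(50000)
    for func in (full_sum, stream_sum):
        result, peak = peak_bytes(func, data)
        print(func.__name__, result, peak)
```

Run both variants against the same input and compare the peak figures.
For the real server you would also want to watch the process's resident
size under load, since that is what the OOM killer acts on.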

Cheers,
Dave

On Apr 27, 2009, at 12:04 PM, Brian Ray wrote:

>
> On Apr 27, 2009, at 11:51 AM, David Beazley wrote:
>
>>>
>>> If it's simply that cElementTree is keeping huge structures in memory
>>> and you're getting too many concurrent requests for your RAM, move to
>>> a SAX style API (state machine).
>>>
>>
>> Also look at the ElementTree.iterparse() function.  If you use that  
>> in a clever way, you get all of the benefit of ElementTree plus the  
>> memory savings of SAX (basically you can iteratively rip through  
>> XML data and throw away the parts you're done with as you go).
>
>
> I will take a look at that.  I wonder what the best way would be to
> prove or disprove any improvement once I try that.
>
> Thanks, Brian Ray
>
>
>