[Chicago] Out of Memory: Killed Process: on CentOS
David Beazley
d-beazley at sbcglobal.net
Mon Apr 27 19:09:43 CEST 2009
Here's the shell of code that uses iterparse and saves memory:
from xml.etree.ElementTree import iterparse

p = iterparse("somedoc.xml", events=('start', 'end'))

# Look for the parent node
for event, elem in p:
    if event == 'start' and elem.tag == 'parent':
        parent = elem
        break

# Rip through children and discard as processed
for event, elem in p:
    if event == 'end' and elem.tag == 'child':
        # Do normal element tree processing on elem
        ...
        # Throw the child away
        parent.remove(elem)
The key part is that last statement (removing children from the parse
tree as you're done with them).
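
If you want to try the pattern without a real document, here is a minimal self-contained sketch. The names rip_through, handle, and SAMPLE are made up for illustration, and it assumes every 'child' element is a direct child of 'parent' (Element.remove() raises ValueError otherwise). Treat it as a starting point, not a finished implementation.

```python
import io
from xml.etree.ElementTree import iterparse

# Tiny in-memory stand-in for a large XML file (hypothetical data).
SAMPLE = (
    b"<root><parent>"
    b"<child>a</child><child>b</child><child>c</child>"
    b"</parent></root>"
)


def rip_through(source, handle, parent_tag="parent", child_tag="child"):
    """Call handle(elem) on each child, then drop it from the tree.

    Returns the number of element children still attached to the
    parent afterwards (0 if every child was processed and removed),
    or None if no parent element was found.
    """
    events = iterparse(source, events=("start", "end"))

    # Look for the parent node.
    parent = None
    for event, elem in events:
        if event == "start" and elem.tag == parent_tag:
            parent = elem
            break
    if parent is None:
        return None

    # Continue on the same iterator: process each finished child,
    # then remove it so the tree does not keep growing.
    for event, elem in events:
        if event == "end" and elem.tag == child_tag:
            handle(elem)
            parent.remove(elem)

    return len(parent)


if __name__ == "__main__":
    seen = []
    leftover = rip_through(io.BytesIO(SAMPLE), lambda e: seen.append(e.text))
    print(seen, leftover)
```

The function returns how many children are still hanging off the parent, which gives a quick check that nothing is being retained.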
Cheers,
Dave
On Apr 27, 2009, at 12:04 PM, Brian Ray wrote:
>
> On Apr 27, 2009, at 11:51 AM, David Beazley wrote:
>
>>>
>>> If it's simply that cElementTree is keeping huge structures in memory
>>> and you're getting too many concurrent requests for your RAM, move to
>>> a SAX style API (state machine).
>>>
>>
>> Also look at the ElementTree.iterparse() function. If you use that
>> in a clever way, you get all of the benefit of ElementTree plus the
>> memory savings of SAX (basically you can iteratively rip through
>> XML data and throw away the parts you're done with as you go).
>
>
> I will take a look at that. I wonder how best to prove or
> disprove any improvement once I try that.
>
> Thanks, Brian Ray
>
>
>