cElementTree clear semantics
Igor V. Rafienko
igorr at ifi.uio.no
Sun Sep 25 13:54:01 EDT 2005
Hi,
I am trying to understand how cElementTree's clear() works: I have a
(relatively) large XML file that I do not wish to load into memory.
So, naturally, I tried something like this:
    from cElementTree import iterparse
    count = 0
    for event, elem in iterparse("data.xml"):
        if elem.tag == "schnappi":
            count += 1
            # clear only the elements I am interested in
            elem.clear()
... which resulted in caching of all elements in memory except for
those named <schnappi> (i.e. the process's memory footprint grew more
and more). Then I thought about clear()'ing all elements that I did not
really need:
    from cElementTree import iterparse
    count = 0
    for event, elem in iterparse("data.xml"):
        if elem.tag == "schnappi":
            count += 1
        # clear every element, whatever its tag
        elem.clear()
... which gave a suitably small memory footprint, *BUT* since
<schnappi> has a number of subelements, and I subscribe to
'end'-events, the <schnappi> element is returned after all of its
subelements have been read and clear()'ed. So I do see a
<schnappi> element, but calling its getiterator() gives me completely
empty subelements, which is not what I wanted :(
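A minimal, self-contained sketch of this effect, assuming the
standard-library xml.etree.ElementTree as a stand-in for cElementTree,
iter() in place of getiterator(), and a made-up in-memory document (the
<name> and <age> subelements and the helper name are hypothetical, not
taken from my real data):

```python
import io
import xml.etree.ElementTree as ET  # assumption: stand-in for cElementTree

# Hypothetical sample document; the real data.xml is not shown here.
XML = b"<root><schnappi><name>croc</name><age>3</age></schnappi></root>"


def subtree_after_clearing(data):
    """Clear every element on its 'end' event and report what the
    <schnappi> element still contains when its own 'end' event arrives."""
    seen = []
    # With no events argument, iterparse reports 'end' events only.
    for event, elem in ET.iterparse(io.BytesIO(data)):
        if elem.tag == "schnappi":
            # The subelements have already had their 'end' events,
            # so they were cleared before we get to look at them.
            seen = [(sub.tag, sub.text) for sub in elem.iter()]
        elem.clear()
    return seen


print(subtree_after_clearing(XML))
```

The subelements are still attached to <schnappi>, but their text is
gone by the time the parent is delivered.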
Finally, I thought about keeping track of when to clear and when not
to by subscribing to both 'start' and 'end' events (so that I would
collect the entire <schnappi>-subtree in memory and only then release it):
    from cElementTree import iterparse
    clear_flag = True
    for event, elem in iterparse("data.xml", ("start", "end")):
        if event == "start" and elem.tag == "schnappi":
            # start collecting elements
            clear_flag = False
        if event == "end" and elem.tag == "schnappi":
            clear_flag = True
            # do something with elem
        # unless we are collecting elements, clear()
        if clear_flag:
            elem.clear()
This gave me the desired behaviour, but:
* It looks *very* ugly
* It's twice as slow as the version which sees 'end'-events only.
Now, there *has* to be a better way. What am I missing?
Thanks in advance,
ivr
--
"...but it's HDTV -- it's got a better resolution than the real world."
-- Fry, "When aliens attack"