cElementTree clear semantics

Igor V. Rafienko igorr at ifi.uio.no
Sun Sep 25 19:54:01 CEST 2005


Hi,


I am trying to understand how cElementTree's clear() works: I have a
(relatively) large XML file that I do not wish to load into memory in
its entirety. So, naturally, I tried something like this:

from cElementTree import iterparse
count = 0
for event, elem in iterparse("data.xml"):
    if elem.tag == "schnappi":
        count += 1
        # release only the <schnappi> elements
        elem.clear()

... which resulted in caching of all elements in memory except for
those named <schnappi> (i.e. the process' memory footprint grew more
and more). Then I thought about clear()'ing all elements that I did not
really need:

from cElementTree import iterparse
count = 0
for event, elem in iterparse("data.xml"):
    if elem.tag == "schnappi":
        count += 1
    # release every element, whatever its tag
    elem.clear()

... which gave a suitably small memory footprint, *BUT* since
<schnappi> has a number of subelements, and I subscribe to
'end'-events, the <schnappi> element is returned after all of its
subelements have been read and clear()'ed. So I do indeed see a
<schnappi> element, but calling its getiterator() gives me completely
empty subelements, which is not what I wanted :(
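
To illustrate the effect, here is a small self-contained sketch. It
uses xml.etree.ElementTree from the standard library in place of
cElementTree (an assumption; the iterparse interface should behave the
same way), and the in-memory document with its <a>, <b> and <other>
tags is made up purely for the demonstration:

```python
import io
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for cElementTree

# Hypothetical sample document; only the <schnappi> tag comes from my real data.
XML = b"<root><schnappi><a>1</a><b>2</b></schnappi><other>x</other></root>"

def end_order_and_texts(data):
    """Clear every element on its 'end' event, as in the second version."""
    order = []   # tags in the order their 'end' events arrive
    texts = []   # text of <schnappi>'s children at the time <schnappi> ends
    for event, elem in ET.iterparse(io.BytesIO(data)):  # 'end' events only
        order.append(elem.tag)
        if elem.tag == "schnappi":
            texts = [child.text for child in elem]
        elem.clear()
    return order, texts

order, texts = end_order_and_texts(XML)
print(order)  # children are reported before their parent
print(texts)  # the children have already been cleared
```

The children's 'end' events arrive before the parent's, so by the time
<schnappi> itself is seen, its subelements are still attached but have
already lost their text and attributes.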

Finally, I thought about keeping track of when to clear and when not
to by subscribing to both 'start' and 'end' events (so that I would
collect the entire <schnappi>-subtree in memory and only then release it):

from cElementTree import iterparse
clear_flag = True
for event, elem in iterparse("data.xml", ("start", "end")):
    if event == "start" and elem.tag == "schnappi":
        # start collecting elements
        clear_flag = False
    if event == "end" and elem.tag == "schnappi":
        clear_flag = True
        # do something with elem
    # unless we are collecting elements, clear()
    if clear_flag:
        elem.clear()

This gave me the desired behaviour, but:

* It looks *very* ugly.
* It's twice as slow as the version that sees only 'end'-events.

Now, there *has* to be a better way. What am I missing?

Thanks in advance,





ivr
-- 
"...but it's HDTV -- it's got a better resolution than the real world."
		                           -- Fry, "When aliens attack"


