How come the child's tail affects the parent? Does the tail attribute reach up to the parent?
If so, would this be better?

for event, element in iterparse(f, tag="bla"):
    yield element
    for child in element:
        # This might reach up to its parent, which is bla. That is OK
        # because we are already at bla's 'end' event.
        child.clear()

I can probably also clear the children of previously parsed siblings with different tags (nephews).
As you can see, I'm looking for ways to process a file that doesn't fit into memory.
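
For illustration only, here is a minimal runnable sketch of that streaming pattern. The function name iter_tagged and the sample document are made up for this example. It filters on element.tag by hand instead of using the tag= keyword, so that it can fall back to the standard library parser when lxml is not installed. It also rests on the assumption that children are completely parsed by the time their parent's 'end' event arrives.

```python
import io

try:
    from lxml.etree import iterparse  # the library discussed in this thread
except ImportError:
    # Assumption: the standard library parser is close enough for this demo.
    from xml.etree.ElementTree import iterparse


def iter_tagged(source, tag):
    """Yield each completed element named `tag`, then empty its children."""
    for event, element in iterparse(source, events=("end",)):
        if element.tag != tag:
            continue
        yield element
        # This part runs once the consumer asks for the next item.
        # The children ended before their parent did, so clearing them
        # should not touch anything the parser has yet to read.
        for child in element:
            # Drops the child's own children, attributes, text and tail.
            child.clear()


xml = b"<root><bla><x>1</x><y>2</y></bla><other/><bla><x>3</x></bla></root>"
texts = [[child.text for child in element]
         for element in iter_tagged(io.BytesIO(xml), "bla")]
print(texts)  # [['1', '2'], ['3']]
```

Note that the children are only emptied, not removed, so the yielded element still holds one small empty node per child.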


Thanks for the help,
Alon Horev

On Wed, Jun 13, 2012 at 11:20 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
[fixed top-posting and code formatting]

Note that it's better to send plain text messages when posting to public
mailing lists than HTML formatted messages.

Alon Horev, 13.06.2012 20:49:
> On Wed, Jun 13, 2012 at 9:30 PM, Stefan Behnel wrote:
>> Alon Horev, 13.06.2012 20:16:
>>> from lxml.etree import iterparse
>>>
>>> def safe_iterparse(*args, **kwargs):
>>>     for event, element in iterparse(*args, **kwargs):
>>>         try:
>>>             yield (event, element)
>>>         finally:
>>>             element.clear()
>>
>> This is a known limitation of the current implementation:
>>
>> http://lxml.de/parsing.html#modifying-the-tree
>
> the doc does warn: 'You should also avoid moving or discarding the element
> itself.'
> but the example does exactly what I do, which is to clear the element after
> the 'end' event. isn't the example contradicting the warning?
>
> >>> for event, element in etree.iterparse(StringIO(xml)):
> ...     # ... do something with the element
> ...     element.clear()                 # clean up children
> ...     while element.getprevious() is not None:
> ...         del element.getparent()[0]  # clean up preceding siblings

Ah, yes, right. Thanks for catching that. Calling .clear() not only cleans
up the children but also deletes the text content and the *tail* text. That
is the actual problem with this code, because it touches tree state after
the current (or latest) element.

I think it would be helpful to add a "with_tail" option to clear() for now.
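
To make the tail behaviour concrete, here is a small sketch. The helper clear_keep_tail is hypothetical (it is not part of lxml), and it simply saves the tail, calls clear(), and puts the tail back. It only demonstrates the effect on a fully parsed tree; whether doing this inside an iterparse loop is safe is not something the sketch establishes. It falls back to the standard library's ElementTree when lxml is not installed, on the assumption that clear() resets the tail there as well.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library's clear() also resets the tail.
    import xml.etree.ElementTree as etree


def clear_keep_tail(element):
    """Hypothetical helper: clear `element` but put its tail text back."""
    tail = element.tail
    element.clear()      # removes children, attributes, text and tail
    element.tail = tail  # restore the text that follows the end tag


root = etree.fromstring("<root><a>text<c/></a>tail<b/></root>")
a = root[0]
clear_keep_tail(a)
print(len(a), a.text, repr(a.tail))  # 0 None 'tail'

# For comparison, a plain clear() drops the tail as well.
plain = etree.fromstring("<root><a>text</a>tail<b/></root>")
plain[0].clear()
print(repr(plain[0].tail))  # None
```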

Stefan
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml