How come the child's tail affects the parent? Does the tail attribute reach up to the parent? If so, would this be better?

    for event, element in iterparse(f, tag="bla"):
        yield element
        for child in element:
            # this might reach up to its parent, which is bla,
            # which is OK because it's an 'end' event
            child.clear()

I can probably also clear the children of previously parsed siblings with different tags (nephews).

As you can see, I'm looking for ways to process a file that doesn't fit into memory.

Thanks for the help,
Alon Horev

On Wed, Jun 13, 2012 at 11:20 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
[fixed top-posting and code formatting]
Note that it's better to send plain text messages when posting to public mailing lists than HTML formatted messages.
Alon Horev, 13.06.2012 20:49:
On Wed, Jun 13, 2012 at 9:30 PM, Stefan Behnel wrote:
Alon Horev, 13.06.2012 20:16:
from lxml.etree import iterparse
def safe_iterparse(*args, **kwargs):
    for event, element in iterparse(*args, **kwargs):
        try:
            yield (event, element)
        finally:
            element.clear()
This is a known limitation of the current implementation:
The docs do warn: 'You should also avoid moving or discarding the element itself.' But the example does exactly what I do, which is to clear the element after the 'end' event. Isn't the example contradicting the warning?
for event, element in etree.iterparse(StringIO(xml)):
    # ... do something with the element
    element.clear()                 # clean up children
    while element.getprevious() is not None:
        del element.getparent()[0]  # clean up preceding siblings
Ah, yes, right. Thanks for catching that. Calling .clear() not only cleans up the children but also deletes the text content and the *tail* text. That is the actual problem with this code, because it touches tree state after the current (or latest) element.
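To make the tail behaviour concrete, here is a small sketch on an already parsed tree (not during iterparse). The sample document and variable names are made up for illustration. The import falls back to the standard library's ElementTree, whose clear() also resets the tail, so the snippet runs even without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for illustration only; stdlib clear() also resets the tail.
    import xml.etree.ElementTree as etree

# Made-up sample: "after-a" is the tail of <a>, i.e. text that follows
# the element inside its parent <root>.
root = etree.fromstring("<root><a k='v'>inside<b/></a>after-a</root>")
a = root[0]

tail_before = a.tail   # the text that follows </a>
a.clear()              # drops children, attributes, text AND the tail
tail_after = a.tail

print(repr(tail_before), "->", repr(tail_after))
```

The tail is stored on the element but belongs to the parent's content, which is why clearing a child changes what the parent contains.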
I think it would be helpful to add a "with_tail" option to clear() for now.
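Until such an option exists, one possible workaround is to empty the element by hand and leave the tail alone. The following is only a sketch under assumptions: clear_without_tail and iter_elements are hypothetical helper names, not part of lxml, and the sample document is made up. The import falls back to the standard library's ElementTree so the snippet also runs without lxml.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Fallback for illustration only.
    import xml.etree.ElementTree as etree


def clear_without_tail(element):
    """Hypothetical helper: empty the element but never touch element.tail."""
    del element[:]          # drop all children
    element.text = None     # drop the leading text
    element.attrib.clear()  # drop the attributes


def iter_elements(source, tag):
    """Hypothetical helper: yield each finished `tag` element, then empty it."""
    for event, element in etree.iterparse(source, events=("end",)):
        if element.tag != tag:
            continue
        try:
            yield element
        finally:
            # Runs when the consumer asks for the next element (or stops).
            clear_without_tail(element)


# Made-up sample document.
xml = b"<root><bla><x>1</x></bla>tail<bla><x>2</x></bla></root>"
values = [el[0].text for el in iter_elements(io.BytesIO(xml), "bla")]
print(values)
```

Note that the emptied elements stay attached to the root, so a very large file still accumulates one empty node per record unless preceding siblings are deleted as well. As far as I know, later lxml releases accept clear(keep_tail=True), which would make the first helper unnecessary there.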
Stefan