[fixed top-posting and code formatting]

Note that it's better to send plain text messages when posting to public mailing lists than HTML formatted messages.

Alon Horev, 13.06.2012 20:49:
On Wed, Jun 13, 2012 at 9:30 PM, Stefan Behnel wrote:
Alon Horev, 13.06.2012 20:16:
    from lxml.etree import iterparse

    def safe_iterparse(*args, **kwargs):
        for event, element in iterparse(*args, **kwargs):
            try:
                yield (event, element)
            finally:
                element.clear()
This is a known limitation of the current implementation:
The doc does warn: 'You should also avoid moving or discarding the element itself.' But the example does exactly what I do, which is to clear the element after the 'end' event. Isn't the example contradicting the warning?
    >>> for event, element in etree.iterparse(StringIO(xml)):
    ...     # ... do something with the element
    ...     element.clear()                 # clean up children
    ...     while element.getprevious() is not None:
    ...         del element.getparent()[0]  # clean up preceding siblings
Ah, yes, right. Thanks for catching that. Calling .clear() not only cleans up the children but also deletes the text content and the *tail* text. That is the actual problem with this code, because it touches tree state after the current (or latest) element.

I think it would be helpful to add a "with_tail" option to clear() for now.

Stefan
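To make the point about clear() and the tail text concrete, here is a minimal sketch on an already parsed tree. The helper name clear_keep_tail is hypothetical and not part of lxml; it only illustrates what a tail-preserving clear would do, and it is not meant as a fix for the iterparse case, where the tail may not have been parsed yet. The sketch assumes the stdlib ElementTree, used as a fallback when lxml is not installed, resets text and tail in clear() the same way.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree clear() also resets text and tail.
    import xml.etree.ElementTree as etree


def clear_keep_tail(element):
    """Hypothetical helper: clear an element but put its tail text back."""
    tail = element.tail   # text that follows the element's closing tag
    element.clear()       # drops children, attributes, text and tail
    element.tail = tail   # restore the text that belongs after the element


XML = "<root><a x='1'>text<c/></a>tail of a<b/></root>"

# Plain clear(): the tail text after </a> is lost along with everything else.
plain_root = etree.fromstring(XML)
plain_root[0].clear()
tail_after_plain_clear = plain_root[0].tail

# Helper: children, attributes and text are gone, but the tail survives.
root = etree.fromstring(XML)
a = root[0]
clear_keep_tail(a)
tail_after_helper = a.tail
```

After running this, tail_after_plain_clear is None while tail_after_helper still holds the original tail text, which is the difference the proposed option would expose.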