
Marius Gedminas wrote on 27.02.2015 at 09:34:
The risk of modifying a data structure while you're iterating over it is that you may skip elements, or process some elements more than once, or get an exception. [...] Now I don't know if this danger applies to lxml. I tried to look it up in the documentation and failed to find anything relevant. I then tried a small experiment and couldn't get my code to misbehave, so perhaps lxml's iteration can safely cope with modifications. (Or perhaps my code example was just too simple to trigger a possible error condition,
Most likely so. lxml currently looks one match ahead. This has the advantage that tree modifications during iteration work in many cases. It has the disadvantage that in a large document where an element only appears once, the whole document is searched despite already having found the only match. Given that this is very fast in lxml, it usually doesn't matter that much, but it's certainly visible in some extreme cases.

Don't rely on this, though. It might change at some point to, say, only look one element ahead, instead of one element that actually matches the current search. That would reduce the search overhead in the "one element only" case.

Generally speaking, it's safe to modify parts of the tree that no longer need to be touched by the traversal (such as siblings that were already traversed, or attributes of the current element), but the behaviour when modifying tree content that lies ahead of the current element or above it (its ancestors) is undefined.

If unsure, follow one of the examples that you (Marius) gave in your email. Structural tree modifications are best done outside of the iteration loop.

Stefan
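
To illustrate the advice about doing structural modifications outside of the iteration loop, here is a minimal sketch. The sample document, the tag name "obsolete" and the helper remove_all() are made up for illustration and are not taken from Marius's email. The fallback to the standard library's ElementTree is only there so that the sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API, which is compatible
    # for the few calls used below.
    import xml.etree.ElementTree as etree


def remove_all(root, tag):
    """Remove every element named `tag` below `root` (hypothetical helper)."""
    # Step 1: finish the traversal first and only record what to remove.
    # The list is fully built before the tree is touched.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == tag
    ]
    # Step 2: modify the tree, with no iterator active any more.
    # Note that remove() also drops the tail text of the removed element.
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)


root = etree.fromstring(
    "<root><keep/><obsolete/><group><obsolete/><keep/></group></root>"
)
removed = remove_all(root, "obsolete")
remaining = [el.tag for el in root.iter()]
```

With lxml specifically, the same effect can be had by collecting list(root.iter(tag)) first and then calling el.getparent().remove(el) on each entry. The point in both variants is that the traversal has completed before anything is removed.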