Hi Martijn,

Martijn Faassen wrote:
> Cool. Impressive for cET.findall(), as it's using a Python implementation of the search algorithm - the same one as used by ElementTree, last I checked.
Actually, that's not really surprising. cET has the Python objects readily available and only needs to access them, whereas we have to generate them on the fly. And ElementPath is simple enough to be fast. That's like comparing a spoon with a Swiss Army knife.
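To illustrate how little work a simple path search does, here is a rough sketch of an ElementPath-style findall() over the standard library's xml.etree.ElementTree. It is a hypothetical, heavily reduced helper (simple_findall is my own name, not an API of either library). It only handles plain tag names, '*' and a single leading './/', and it is not the actual ElementPath code.

```python
import xml.etree.ElementTree as ET


def simple_findall(elem, path):
    """Hypothetical, heavily simplified ElementPath-style search.

    Handles only '/'-separated plain tag names and '*', plus a single
    './/tag' descendant search. The real ElementPath supports far more.
    """
    if path.startswith(".//"):
        tag = path[3:]
        # Descendant search: walk the subtree (skipping elem itself)
        # and keep every element whose tag matches.
        return [e for e in elem.iter()
                if e is not elem and (tag == "*" or e.tag == tag)]
    # Child steps: narrow the current result set one step at a time.
    current = [elem]
    for step in path.split("/"):
        current = [child for parent in current for child in parent
                   if step == "*" or child.tag == step]
    return current
```

For example, on `<root><a><b/><b/></a><a><b/></a><c/></root>`, `simple_findall(root, "a/b")` returns the same three `b` elements as `root.findall("a/b")`. Each step just walks child lists and compares tags, which is cheap as long as the element objects already exist.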
> I'm pleasantly surprised lxml_findall() now appears to have reached parity with cET in this test; it used to be that cET was quite a bit faster in my old measurements. Can you identify which tuning effort had this effect, or is this due to a slightly different benchmark?
I figured out what it was. _elementpath.py calls element.getiterator(). lxml originally emulated that by collecting all descendants in a list; now it has a real iterator implementation. It could be even faster, since _elementpath.py nicely asks for the tag it is looking for. Currently, that filtering happens /behind/ the iterator, so all elements are still generated (and we're still close to parity!). Maybe I should check what difference it makes if we filter in plain C...

Stefan
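A small pure-Python model may make the difference clearer. This is only a sketch under stated assumptions: Node stands in for the C-level tree, Proxy for the Python element object, and every name here is hypothetical rather than lxml's real internals. It counts how many proxies get built for a list-based versus a lazy getiterator(), and for filtering behind the iterator versus at the node level.

```python
class Node:
    """Hypothetical stand-in for a C-level tree node."""
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)


class Proxy:
    """Hypothetical stand-in for the Python element object built on demand."""
    created = 0  # counts how many proxy objects have been built

    def __init__(self, node):
        Proxy.created += 1
        self.node = node
        self.tag = node.tag


def _walk(node):
    # Depth-first, document-order walk over the nodes; builds no proxies.
    yield node
    for child in node.children:
        yield from _walk(child)


def getiterator_list(root):
    # List emulation: a proxy for every node is built up front.
    return [Proxy(n) for n in _walk(root)]


def getiterator_lazy(root):
    # Real iterator: each proxy is built only when the caller asks for it.
    for n in _walk(root):
        yield Proxy(n)


def filter_behind(root, tag):
    # Tag filter applied after the iterator: every node still gets a proxy.
    return [p for p in getiterator_lazy(root) if p.tag == tag]


def filter_inside(root, tag):
    # Tag filter applied at the node level: proxies only for matches.
    return [Proxy(n) for n in _walk(root) if n.tag == tag]
```

On a five-node tree such as `Node("root", [Node("a", [Node("b"), Node("b")]), Node("c")])`, filtering for `"b"` behind the iterator still builds five proxies, while filtering at the node level builds only the two that match.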