
Nov. 24, 2014
8:51 a.m.
On 11/21/2014 08:31 PM, Charlie Clark wrote:
As noted elsewhere, you can pass in a list of tags to do this. However, when running benchmarks in openpyxl we discovered that for pure parsing xml.etree.cElementTree can be *significantly* faster than lxml: 2 to 3 times in our experience. I discussed this with Stefan and he said it's largely down to the different C libraries: you pay a penalty for the richer interface of libxml2.
Is cET still faster when every single tag is yielded by the iterator? The cET iterparse implementation does not appear to support tag filtering, so I would have to do the filtering myself with Python if statements. I can imagine that this pretty much defeats the performance advantage...
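
For reference, the manual filtering in question would look roughly like the sketch below. It uses only the standard library (xml.etree.ElementTree, which on Python 3.3+ picks up the C accelerator that used to be imported as cElementTree). The helper name iter_tags and the sample document are made up for illustration. It is not a benchmark and makes no claim about which approach is faster.

```python
import io
import xml.etree.ElementTree as ET  # on Python 2: xml.etree.cElementTree


def iter_tags(source, wanted):
    """Yield only elements whose tag is in `wanted` (hypothetical helper).

    Every element still passes through the Python-level `if`, which is
    the per-tag overhead the question is about.
    """
    wanted = frozenset(wanted)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in wanted:
            yield elem
            # Runs once the caller asks for the next element:
            # drop text and children to keep memory usage flat.
            elem.clear()


# Illustrative input only; not taken from openpyxl.
sample = io.BytesIO(
    b"<root><row><c>1</c><c>2</c></row><skip>x</skip><row><c>3</c></row></root>"
)
values = [[c.text for c in row] for row in iter_tags(sample, {"row"})]
print(values)
```

Timing this loop against lxml's built-in tag filtering on the same file (for example with timeit) would be the direct way to answer the question for a given document.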