Re: [lxml] Efficient incremental parsing using etree.iterparse

Nov. 24, 2014


On 11/21/2014 08:31 PM, Charlie Clark wrote:
> ...
> As noted elsewhere, you can pass in a list of tags to do this. However,
> when running benchmarks in openpyxl we discovered that for pure parsing
> xml.etree.cElementTree can be *significantly* faster than lxml: 2 to 3
> times in our experience. I discussed this with Stefan and he said it's
> largely down to the different C libraries – you pay a penalty for the
> richer interface of libxml2.
Is cET still faster when every single tag is yielded by the iterator?
The cET iterparse implementation does not appear to support tag
filtering, so I would have to do the filtering myself with Python if
statements. I suspect this largely cancels out the performance
advantage...
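A minimal sketch of the two filtering strategies discussed in this thread. It assumes Python 3, where xml.etree.cElementTree is deprecated (removed in 3.9) and xml.etree.ElementTree uses the same C accelerator automatically. The helper names stdlib_matching and lxml_matching and the sample document are hypothetical. The lxml variant relies on iterparse accepting a sequence of tags in its tag argument, as mentioned in the quoted text. In the standard-library variant, every element reaches Python and is filtered with an if statement.

```python
import io
import xml.etree.ElementTree as ET  # C-accelerated on Python 3 (successor to cElementTree)

try:
    from lxml import etree as lxml_etree  # optional third-party dependency
except ImportError:
    lxml_etree = None


def stdlib_matching(source, tags):
    """Yield (tag, text) for elements whose tag is in `tags`.

    ElementTree's iterparse has no tag argument, so every element produces an
    "end" event and the membership test below runs in Python for each one.
    """
    wanted = frozenset(tags)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in wanted:
            yield elem.tag, elem.text
        # Clear every finished element to bound memory. Children end before
        # their parent, so a matched parent's children are already cleared
        # by the time the parent is yielded.
        elem.clear()


def lxml_matching(source, tags):
    """Same output, but lxml filters by tag itself, so non-matching
    elements never produce an event in Python."""
    if lxml_etree is None:
        raise RuntimeError("lxml is not installed")
    for _event, elem in lxml_etree.iterparse(
        source, events=("end",), tag=tuple(tags)
    ):
        yield elem.tag, elem.text
        elem.clear()  # only matched elements are cleared; others stay in the tree


if __name__ == "__main__":
    data = b"<sheet><row><c>1</c><c>2</c></row><row><c>3</c></row></sheet>"
    print(list(stdlib_matching(io.BytesIO(data), ["c"])))
    if lxml_etree is not None:
        print(list(lxml_matching(io.BytesIO(data), ["c"])))
```

To answer the question for a particular workload, both generators can be timed on a representative file, for example with the timeit module. The outcome is likely to depend on how many elements the document contains relative to the number that actually match.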

D.H.J. Takken