Hi Steve,
Steve Howe wrote:
> Wednesday, March 22, 2006, 1:38:50 PM, you wrote:
> 2) String formatting in Python was the other problem. The major bottleneck in
> tree setup in bench.py was the Python function that builds the element names
>> based on loop variables (PyString_Format). Meaning, the bottleneck was
>> /outside/ the tested code this time.
>
> I wonder if running the same tests on cElementTree would show similar
> results with regard to the Python function calls.
Go ahead and try it; using KCachegrind is pure fun! :)
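For a quick check that doesn't need KCachegrind, a minimal sketch like the one below times tree setup twice with the standard library's timeit: once formatting the element names inside the loop, and once with the names built in advance. It uses xml.etree.ElementTree as a stand-in (current Python versions use the C accelerator that used to be cElementTree automatically). The function names and N are made up for illustration; this is not the actual bench.py, and it gives wall-clock times rather than a callgrind profile.

```python
import timeit
import xml.etree.ElementTree as ET

N = 10000  # hypothetical number of child elements


def build_with_formatting(n=N):
    # Element names are formatted inside the loop, so the cost of the
    # string formatting is measured together with the tree setup.
    root = ET.Element("root")
    for i in range(n):
        ET.SubElement(root, "child%d" % i)
    return root


# Names built once, outside the code that is being timed.
NAMES = ["child%d" % i for i in range(N)]


def build_with_precomputed(names=NAMES):
    # Only the tree setup itself is left inside the loop.
    root = ET.Element("root")
    for name in names:
        ET.SubElement(root, name)
    return root


if __name__ == "__main__":
    for func in (build_with_formatting, build_with_precomputed):
        best = min(timeit.repeat(func, number=10, repeat=3))
        print(f"{func.__name__}: {best:.4f}s for 10 runs")
```

If the first time is clearly larger, the difference is being spent in the benchmark harness rather than in the code under test.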
> Do you have any results (or impressions) on this?
I didn't check, but I don't think it suffers as much from Python-level overhead.
As Fredrik said, cElementTree builds Python objects on the way in, so all you
should see when /accessing/ data is Python's call overhead rather than any
substantial calculations.
I think that's totally the right optimization for cElementTree, but it is
difficult to do something similar in lxml, since we get entire libxml2 trees
from the parser. It wouldn't be a good idea to traverse them just to build
Python objects; we don't even know if they would be used. All we could do is
cache Python objects once they have been built. The Proxy mechanism would be
the right place to keep references to text and tag objects. Also, you would
have to change the way Python element proxies are currently deallocated, so
that they stay alive as long as any of them is actually in use. But that's
non-trivial.
Anyway, for me to implement that, I would really have to be convinced that
it's worth it, and I don't see anywhere near enough of a speed-up from these
optimizations to justify such a huge effort. The text and tag properties in
particular are bound by call overhead, not by object creation time.
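To see whether a property is bound by call overhead or by object creation, one rough check is to time three variants side by side. This is a pure-Python sketch with made-up class names; lxml's real properties are implemented in Cython (Pyrex back then), so absolute numbers will differ, but the comparison shows how to split the two costs.

```python
import timeit


class PlainAttribute:
    # Baseline: an instance attribute, no function call at all.
    def __init__(self, tag):
        self.tag = tag


class CachedProperty:
    # The str already exists; each access pays only the call overhead.
    def __init__(self, tag):
        self._tag = tag

    @property
    def tag(self):
        return self._tag


class CreatingProperty:
    # Each access pays the call overhead plus building a new str.
    def __init__(self, raw):
        self._raw = raw

    @property
    def tag(self):
        return self._raw.decode("utf-8")


def measure(number=200000):
    """Return the best-of-three time for `number` tag accesses per variant."""
    objs = {
        "plain attribute": PlainAttribute("item"),
        "cached property": CachedProperty("item"),
        "creating property": CreatingProperty(b"item"),
    }
    results = {}
    for label, obj in objs.items():
        # The lambda adds the same constant overhead to every variant.
        results[label] = min(
            timeit.repeat(lambda obj=obj: obj.tag, number=number, repeat=3)
        )
    return results


if __name__ == "__main__":
    for label, seconds in measure().items():
        print(f"{label:18s} {seconds:.4f}s")
```

The gap between the first two rows approximates the call overhead; the gap between the last two approximates the cost of creating the object.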
Stefan