Hi Steve,

Steve Howe wrote:
> Wednesday, March 22, 2006, 1:38:50 PM, you wrote:
>> 2) string formatting in Python was the other problem. The major bottleneck in tree setup in bench.py was the Python function that builds the element names based on loop variables (PyString_Format). Meaning, the bottleneck was /outside/ the tested code this time.
>
> I wonder if running the same tests on cElementTree would show similar results as far as the Python function calls are concerned.

Go ahead and try it, using KCachegrind is pure fun! :)

> Do you have any results (or impressions) on this?
I didn't check, but I don't think it suffers so much from Python performance. As Fredrik said, cElementTree builds Python objects on the way in, so all you should see when /accessing/ data is Python's call overhead rather than any substantial calculations.

I think that's totally the right optimization, but it is difficult to do something similar in lxml, since we also get entire trees from the parser. It wouldn't be a good idea to traverse them to build Python objects - we don't even know if they would be used.

All we could do is cache Python objects once they were built. The Proxy mechanism would be the right place to keep references to text and tag objects. Also, you could change the current way Python element proxies are deallocated to keep them alive as long as any of them is really used. But that's non-trivial.

Anyway, to make me implement that, I would really have to be convinced that it's worth it - and I absolutely don't see enough of a speed-up behind these optimizations to justify such a huge effort. Especially the text and tag properties are bound by call overhead, not by object creation time.

Stefan
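
P.S. To make the caching idea a bit more concrete, here is a minimal sketch in plain Python. It is not lxml code: FakeNode, ElementProxy and the conversions counter are hypothetical names made up for illustration, and the byte strings only stand in for whatever the C-level tree stores. The point is just that the Python object is built on first access and reused afterwards.

```python
class FakeNode:
    """Hypothetical stand-in for a C-level tree node holding raw bytes."""

    def __init__(self, tag, text):
        self.tag = tag    # bytes, as a parser might store them
        self.text = text  # bytes or None


class ElementProxy:
    """Hypothetical proxy that caches Python objects once they were built."""

    def __init__(self, node):
        self._node = node
        self._tag = None       # cached str, created on first access
        self._text = None      # cached str, created on first access
        self.conversions = 0   # counts how often an object was really built

    @property
    def tag(self):
        if self._tag is None:
            self._tag = self._node.tag.decode("utf-8")
            self.conversions += 1
        return self._tag

    @property
    def text(self):
        if self._text is None and self._node.text is not None:
            self._text = self._node.text.decode("utf-8")
            self.conversions += 1
        return self._text


proxy = ElementProxy(FakeNode(b"root", b"hello"))
first = proxy.tag   # builds the str object
second = proxy.tag  # returns the cached object, no new conversion
```

Note that the cache only helps while the proxy itself stays alive, which is why the deallocation of proxies would have to change as well. And every access still goes through a Python-level property call, so the call overhead stays.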
participants (1)
-
Stefan Behnel