Stefan Behnel wrote:
Martijn Faassen wrote:
Stefan Behnel wrote:
I just ran a slight variant (doesn't print, builds list) of Uche's OT benchmark on
And another difference is that you don't actually measure the overhead of the Python interpreter startup, correct? :)
Uh, well, was that supposed to be in there, too? :)
It was in Uche's published benchmark if I recall correctly. Fredrik and I slightly disagreed. :)
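The question of whether interpreter startup belongs in the number comes down to where the timer sits. The following is a minimal sketch, not Uche's or Stefan's actual benchmark: it uses the standard library's `xml.etree.ElementTree` and a synthetic document whose element names (`tstmt`, `bookcoll`, `v`) and size are hypothetical stand-ins for the real input file. Timing with `timeit` inside an already-running interpreter excludes startup entirely, and building a list instead of printing (as in the variant above) keeps terminal I/O out of the measurement.

```python
import timeit
import xml.etree.ElementTree as ET


def make_document(n_verses=10000):
    # Hypothetical stand-in document; the real benchmark input is not reproduced here.
    root = ET.Element("tstmt")
    coll = ET.SubElement(root, "bookcoll")
    for i in range(n_verses):
        verse = ET.SubElement(coll, "v")
        verse.text = f"verse {i}"
    return root


def collect_verses(root):
    # Build a list instead of printing, so I/O does not dominate the timing.
    return [v.text for v in root.findall(".//v")]


def time_findall(root, repeat=5, number=10):
    # timeit runs in the current interpreter, so Python startup cost is excluded;
    # taking the minimum over several repeats reduces noise from other processes.
    timer = timeit.Timer(lambda: collect_verses(root))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    doc = make_document()
    print(f"{time_findall(doc) * 1000:.2f} ms per findall pass")
```

Measuring from the shell instead (for example, timing a whole `python script.py` run) would fold interpreter startup and module import time into the result, which is exactly the difference being debated here.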
I'm pleasantly surprised lxml_findall() now appears to have reached parity with cET in this test; it used to be that cET was quite a bit faster in my old measurements. Can you identify which tuning effort had this effect, or is this due to a slightly different benchmark? The last time I did a similar check, we were still at half the speed:
I'm not quite sure. I changed a lot of bits everywhere: in the XPath code, the proxy code and the Element creation code. I guess it was a mixture of all of them. There are quite a few fast paths in there now that make a difference when you ask for a lot of elements.
.findall() does ask for lots of elements, so that might be helping then. Pretty good - I didn't expect there was such a gain to be made now, and we've got cElementTree parity for this performance measurement. Perhaps we should collect all this benchmarking, check it, and then write an article for the lxml website... It'd be a bit of work to make that a solid piece of text, of course. People do pick up on benchmark figures in a rather lazy way sometimes. Last year, as I was developing lxml, I honestly said when it was slower than cElementTree in my limited measurements, and I later saw that referred to as "According to Martijn Faassen, libxml might not be that fast with python anyway". For future google searchers: I didn't actually say that! lxml (and libxml2) is plenty fast with Python! So, with a benchmark page on the lxml site, we might get "Stefan Behnel says lxml is faster than anything all the time!" in people's heads instead. Note for future google searchers: Stefan never actually said that! I just made it up! But, lxml is plenty fast with Python! :)
BTW, note that there is even an element class lookup for ns/name involved in each element creation. Thus, the tests would yield similar timings with custom per-tag Python classes for elements. That's another thing ElementTree can't give you.
http://faassen.n--tree.net/blog/view/weblog/2005/01/17/0
Heh, the Uche quote on that page has been proven wrong, right? :)
Totally. What does that guy know anything about, anyway? :] (uh, he's not listening, is he?) :)
:) I like lots of what Uche's done, it's just we had a silly debate about benchmarks. Benchmarks unfortunately tend to invite such discussions.
I'd actually like to see something about lxml on "xml.com". lxml has been in a pretty usable state for quite a while now and is even nearing feature-completeness. Maybe we should just get 1.0 out and send Uche a friendly mail. *wink*
Yes. Unfortunately, xml.com has changed its focus slightly since last year, which seems to mean fewer Python-related articles. Still, it's worth a shot contacting him. Regards, Martijn