
Hi,

here's a little status update regarding lxml on PyPy. I got the basics of lxml.etree working so far, mostly by patching up Cython and tracking down bugs in PyPy's cpyext (CPython C-API compatibility) layer. I'm still seeing crashes during error reporting, and I haven't looked into XPath or XSLT yet. But given that those do not involve much Python interaction per se, I don't expect major surprises on that front.

The results are very encouraging, given that PyPy lacks support for many of the tweaks and hacks that are possible in CPython. Here's a little parser benchmark:

    $ python2.7 -m timeit -s 'import lxml.etree as et' 'et.parse("hamlet.xml")'
    100 loops, best of 3: 4.61 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et' 'et.parse("hamlet.xml")'
    100 loops, best of 3: 5.74 msec per loop

Pretty acceptable. That makes lxml the fastest XML parser currently available for PyPy.

And here's a worst-case benchmark for element proxy instantiation and iteration, likely the most heavily tuned parts of lxml when running in CPython:

    $ python2.7 -m timeit -s 'import lxml.etree as et; \
      t=et.parse("hamlet.xml")' 'list(t.iter())'
    100 loops, best of 3: 2.71 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et; \
      t=et.parse("hamlet.xml")' 'list(t.iter())'
    10 loops, best of 3: 28.2 msec per loop

That's about a factor of 10. It sounds huge, but it's actually not bad, considering the amount of extra work that has to be done for PyPy here. It certainly doesn't render lxml unusable; we are still talking milliseconds, after all. And no tuning has gone into this so far, so it's not the final word. I'm pretty optimistic.

BTW, if you're interested in improvements on this front, you can help get this done faster by using the "donate" button on lxml's project home page. Any donation will help free up some of my time for this.

Stefan
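P.S. For readers who want to reproduce the pattern being timed above: this is a minimal sketch of the parse-and-iterate operations, using a small in-memory document instead of hamlet.xml (which is not included here). It falls back to the stdlib ElementTree module, which shares lxml's API, when lxml itself is not installed.

```python
import io

# Prefer lxml.etree; fall back to the API-compatible stdlib module.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# A tiny stand-in document; the benchmarks above use hamlet.xml instead.
xml = b"<play><act><scene>To be...</scene></act><act/></play>"

# et.parse() accepts file-like objects as well as file names.
tree = etree.parse(io.BytesIO(xml))

# list(t.iter()) touches every node in the tree and is what makes the
# second benchmark a worst case: each visited node needs an element
# proxy object on the Python side.
elements = list(tree.iter())
tags = [el.tag for el in elements]
print(tags)  # ['play', 'act', 'scene', 'act']
```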