Stefan Behnel, 18.02.2012 11:20:
Amaury Forgeot d'Arc, 18.02.2012 10:08:
I made some modifications to pypy, cython and lxml, and now I can compile and install cython, lxml, and they seem to work!
For example::

    html = etree.Element("html")
    body = etree.SubElement(html, "body")
    body.text = "TEXT"
    br = etree.SubElement(body, "br")
    br.tail = "TAIL"
    html.xpath("//text()")
Here are the changes I made, some parts are really hacks and should be polished:

lxml: http://paste.pocoo.org/show/552903/
The weakref changes are really unfortunate as they appear in one of the most performance critical spots of lxml's API: on-the-fly proxy creation.
To give an idea of how much overhead there is, here's a micro-benchmark. First, parsing:

    $ python2.7 -m timeit -s 'import lxml.etree as et' \
          'et.parse("input3.xml")'
    10 loops, best of 3: 136 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et' \
          'et.parse("input3.xml")'
    10 loops, best of 3: 127 msec per loop

I have no idea why pypy is faster here - there really isn't any interaction with the core during XML parsing, certainly nothing that would account for some 7% of the runtime. Maybe some kind of building, benchmarking or whatever fault on my side. Anyway, parsing is clearly in the same ballpark for both.

However, when it comes to element proxy instantiation (collecting all elements in the XML tree here as a worst-case example), there is a clear disadvantage for PyPy:

    $ python2.7 -m timeit -s 'import lxml.etree as et; \
          el=et.parse("input3.xml").getroot()' 'list(el.iter())'
    10 loops, best of 3: 84 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et; \
          el=et.parse("input3.xml").getroot()' 'list(el.iter())'
    10 loops, best of 3: 1.29 sec per loop

That's about the same factor of 15 that you got.

This may or may not matter to applications, though, because there are many tools in lxml that allow users to be very selective about which proxies they want to see instantiated, and to otherwise let a lot of functionality execute in C. So applications may get away with a performance hit below that factor in practice.

What certainly matters for applications is to get the feature set of lxml within PyPy.

Stefan
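
For illustration only, here is a rough sketch of the "be selective about proxies" point, written against the public `iter()` API. It is not part of the benchmark above. The synthetic document and the `build_xml` helper are assumptions standing in for `input3.xml`, whose contents are not shown in this thread, and the fallback import is only there so the sketch runs where lxml is not installed.

```python
import timeit

try:
    import lxml.etree as et  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib module, which offers the same
    # fromstring()/iter() calls, so the sketch still runs without lxml.
    import xml.etree.ElementTree as et


def build_xml(n_items=1000):
    """Return a synthetic XML string (hypothetical stand-in for input3.xml)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append("<item><name>item %d</name><value>%d</value></item>" % (i, i))
    parts.append("</root>")
    return "".join(parts)


# Parse from a string so that only the root element is referenced from Python.
root = et.fromstring(build_xml())

# Worst case, as in the benchmark: one Python object per element in the tree.
all_elements = list(root.iter())

# Selective: only <value> elements are handed back to Python code.
values = list(root.iter("value"))

print("all elements: %d, value elements: %d" % (len(all_elements), len(values)))

# Time both variants; the absolute numbers depend on machine and interpreter.
t_all = min(timeit.repeat(lambda: list(root.iter()), number=10, repeat=3))
t_sel = min(timeit.repeat(lambda: list(root.iter("value")), number=10, repeat=3))
print("all: %.4f s  selective: %.4f s" % (t_all, t_sel))
```

Running the same script under both interpreters should give a rough idea of how much of the proxy overhead an application can avoid by filtering on the tag name, though the ratio will depend on the document and on how many elements match.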