
Would lxml run faster with pypy or is it irrelevant because most of the work is done by libxml?

Hi Martin,

I think it is irrelevant. lxml is mainly written in Cython (http://cython.org) and, as you mention, it relies on libxml2.

Thomas

2018-06-23 19:53 GMT+02:00 Martin Mueller <martinmueller@northwestern.edu>:

On 23 June 2018 at 19:53:44 CEST, Martin Mueller wrote:
Would lxml run faster with pypy or is it irrelevant because most of the work is done by libxml?
It currently runs much slower in PyPy, because the interfacing with PyPy is much slower than with CPython. Below the interface, it's exactly the same speed, because everything runs in C, either in code from libxml2 or in dedicated code in lxml. In fact, some of the fastest parts of lxml's API bypass libxml2's own implementations, specifically the tree search and iteration. But definitely not the parser and serialiser.

What would run faster in PyPy, though, is your own Python code. If it spends a substantially larger part of its time in Python operations than in lxml operations, there might still be a gain with PyPy.

But be aware that lxml-on-PyPy is a second-class citizen. Or maybe third-class. It definitely does not run as well there as on CPython, simply because their interface emulation layer still has many bugs.

Stefan
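To judge which side of that trade-off a given program falls on, one rough approach is to time the lxml calls and the pure-Python processing separately, then run the same script under both interpreters. The following is only a minimal sketch under stated assumptions: `build_document` and `measure` are hypothetical helper names, the XML is made-up test data, and the fallback to the standard library's `xml.etree.ElementTree` is there only so the script still runs where lxml is not installed.

```python
import time

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch runs without lxml.
    import xml.etree.ElementTree as etree


def build_document(n_items):
    """Return a small synthetic XML document as bytes (made-up test data)."""
    items = "".join(
        "<item id='%d'>value %d</item>" % (i, i) for i in range(n_items)
    )
    return ("<root>%s</root>" % items).encode("utf-8")


def measure(xml_bytes):
    """Time the library part and the pure-Python part separately."""
    t0 = time.perf_counter()
    # Library side: parsing plus element iteration. Every element handed
    # back to Python crosses the C/Python interface.
    root = etree.fromstring(xml_bytes)
    texts = [el.text for el in root.iter("item")]
    t1 = time.perf_counter()

    # Pure-Python side: work on plain strings, no library calls involved.
    total = 0
    for text in texts:
        total += sum(int(tok) for tok in text.split() if tok.isdigit())
    t2 = time.perf_counter()

    return {
        "library_s": t1 - t0,
        "python_s": t2 - t1,
        "items": len(texts),
        "total": total,
    }


if __name__ == "__main__":
    print(measure(build_document(100000)))
```

Running the script once with CPython and once with PyPy and comparing `library_s` against `python_s` gives a first impression of where the time goes. The numbers depend heavily on the document and the workload, so treat them as a hint rather than a benchmark.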

participants (3): Martin Mueller, Stefan Behnel, Thomas Burg