[pypy-dev] cpyext performance

Antonio Cuni anto.cuni at gmail.com
Wed Jul 19 06:15:51 EDT 2017


Hello,
recently I have been playing a bit with cpyext, to see if there is any
low-hanging fruit to be picked to improve its performance.

I didn't get any conclusive results, but I think my findings are worth
sharing.
The benchmark I'm using is here:
https://github.com/antocuni/cpyext-benchmarks

It contains a simple C extension defining three methods, one for each of
the METH_NOARGS, METH_O and METH_VARARGS flags.
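
For readers who do not want to open the repository, a benchmark of this
kind can be sketched roughly as follows. This is not the actual bench.py:
the run() helper, the iteration count, the arguments passed to varargs()
and the pure-Python stand-ins for the three C functions (used so that the
snippet runs without compiling the extension) are all assumptions.

```python
import time

# Hypothetical pure-Python stand-ins for the three C functions, so that the
# sketch runs without compiling the extension module.
def noargs():
    return None

def onearg(arg):
    return None

def varargs(*args):
    return None

# One explicit loop per calling convention, so that each call site is fixed.
def bench_noargs(n):
    for _ in range(n):
        noargs()

def bench_onearg(n):
    for _ in range(n):
        onearg(None)

def bench_varargs(n):
    for _ in range(n):
        varargs(1, 2)  # the argument values are an arbitrary choice

def run(label, loop, n=100000):
    """Time one loop and print the result as 'label: N.NN secs'."""
    start = time.time()
    loop(n)
    elapsed = time.time() - start
    print('%-7s: %.2f secs' % (label, elapsed))
    return elapsed

run('noargs', bench_noargs)
run('onearg', bench_onearg)
run('varargs', bench_varargs)
```

With the real extension, the stand-ins would be replaced by the functions
exported by the compiled module.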

So first, the results with CPython and PyPy 5.8:

$ python bench.py
noargs : 0.78 secs
onearg : 0.89 secs
varargs: 1.05 secs

$ pypy bench.py
noargs : 1.67 secs
onearg : 2.13 secs
varargs: 4.89 secs

Then, I tried my cpyext-jit branch; this branch does two things:
1) it makes cpyext visible to the JIT, and adds enough @jit.dont_look_inside
decorators so that it actually compiles
2) it merges part of the cpyext-callopt branch, up to rev 9cbc8bd76297 (more
on this later): this adds fast paths for METH_NOARGS and METH_O to avoid
going through the slow __args__.unpack():

$ pypy-cpyext-jit bench.py
noargs : 0.30 secs
onearg : 0.31 secs
varargs: 4.90 secs
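
The idea behind the METH_NOARGS and METH_O fast paths can be pictured with
a toy dispatcher. This is only a conceptual model, not the code in the
branch: call_cfunction() and generic_call() are hypothetical names, and
generic_call() merely stands in for the slower generic argument handling.
The flag values are the ones from CPython's methodobject.h.

```python
# Flag values as defined in CPython's Include/methodobject.h.
METH_VARARGS = 0x0001
METH_NOARGS = 0x0004
METH_O = 0x0008

def generic_call(cfunc, self, args):
    """Stand-in for the slow path: always build and pass an argument tuple."""
    return cfunc(self, tuple(args))

def call_cfunction(flags, cfunc, self, args):
    """Dispatch a call according to the calling-convention flag."""
    if flags & METH_NOARGS:
        # Fast path: no argument tuple is needed at all.
        if len(args) != 0:
            raise TypeError('function takes no arguments')
        return cfunc(self, None)  # the C function receives NULL here
    if flags & METH_O:
        # Fast path: the single argument is passed directly.
        if len(args) != 1:
            raise TypeError('function takes exactly one argument')
        return cfunc(self, args[0])
    # Everything else (e.g. METH_VARARGS) goes through the generic path.
    return generic_call(cfunc, self, args)
```

In this model only the last branch has to build a tuple, which is
consistent with varargs being the one benchmark that does not change.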

So, apparently this is enough to greatly speed up the noargs and onearg
calls, and even to be faster than CPython; varargs is unchanged. Note that
"onearg" calls "simple.onearg(None)".
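
To put numbers on "greatly", the ratios can be computed directly from the
three tables above. The snippet only restates the timings already reported
in this mail; the dictionary and function names are made up for the
example.

```python
# Timings in seconds, copied from the three runs reported above.
cpython = {'noargs': 0.78, 'onearg': 0.89, 'varargs': 1.05}
pypy58 = {'noargs': 1.67, 'onearg': 2.13, 'varargs': 4.89}
cpyext_jit = {'noargs': 0.30, 'onearg': 0.31, 'varargs': 4.90}

def speedup(baseline, candidate, name):
    """How many times faster `candidate` is than `baseline` for `name`."""
    return baseline[name] / candidate[name]

for name in ('noargs', 'onearg', 'varargs'):
    print('%-7s: %.1fx vs PyPy 5.8, %.1fx vs CPython' % (
        name,
        speedup(pypy58, cpyext_jit, name),
        speedup(cpython, cpyext_jit, name)))
```

A ratio below 1.0 means that the branch is slower than the baseline, which
is the case for varargs when compared to CPython.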

However, things become more complicated as soon as I start passing various
kinds of objects to onearg():

$ pypy bench_oneargs.py   # pypy 5.8
onearg(None): 2.09 secs
onearg(1)   : 2.07 secs
onearg(i)   : 4.98 secs
onearg(i%2) : 4.92 secs
onearg(X)   : 2.13 secs
onearg((1,)): 2.30 secs
onearg((i,)): 9.80 secs


$ pypy-cpyext-jit bench_oneargs.py
onearg(None): 0.30 secs
onearg(1)   : 0.30 secs
onearg(i)   : 2.52 secs
onearg(i%2) : 2.56 secs
onearg(X)   : 0.30 secs
onearg((1,)): 0.30 secs
onearg((i,)): 7.45 secs


So, the call optimization still helps, but as soon as we need to convert
an object from PyPy to CPython we are horribly slow. However, it is
interesting to note that:
1) if we pass a constant object, we are fast: None, 1, (1,)
2) if we pass X (which is a global X=100), we are still fast
3) any other object which is created on the fly is slow
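
The cases in the tables above can be written out as one loop each, which
makes the distinction between constant, global and freshly created
arguments explicit. This is a guess at the shape of bench_oneargs.py, not
a copy of it: the loop functions, run_all(), the iteration count and the
pure-Python onearg() stand-in are assumptions.

```python
import time

X = 100  # module-level global, as in case 2

def onearg(arg):
    """Hypothetical stand-in for simple.onearg, so the sketch is runnable."""
    return None

# Case 1: constant arguments.
def loop_none(n):
    for i in range(n):
        onearg(None)

def loop_const_int(n):
    for i in range(n):
        onearg(1)

def loop_const_tuple(n):
    for i in range(n):
        onearg((1,))

# Case 2: a global.
def loop_global(n):
    for i in range(n):
        onearg(X)

# Case 3: arguments computed or created inside the loop.
def loop_int(n):
    for i in range(n):
        onearg(i)

def loop_mod(n):
    for i in range(n):
        onearg(i % 2)

def loop_tuple(n):
    for i in range(n):
        onearg((i,))

CASES = [
    ('onearg(None)', loop_none),
    ('onearg(1)', loop_const_int),
    ('onearg(i)', loop_int),
    ('onearg(i%2)', loop_mod),
    ('onearg(X)', loop_global),
    ('onearg((1,))', loop_const_tuple),
    ('onearg((i,))', loop_tuple),
]

def run_all(n=100000):
    """Time every case and return a {label: seconds} dictionary."""
    results = {}
    for label, loop in CASES:
        start = time.time()
        loop(n)
        results[label] = time.time() - start
        print('%-12s: %.2f secs' % (label, results[label]))
    return results

run_all()
```

The sketch only reproduces the measurement setup; it does not say why the
third group is slower.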

Looking at the traces, they look more or less the same in all three cases,
so I don't really understand what the difference is.

Finally, about the cpyext-callopt branch, which was started in Leysin by
Richard, Armin and me: I am not sure I fully understand the purpose of
dbba78b270fd: the optimization done in 9cbc8bd76297 seems to work well,
so what am I missing?

ciao,
Anto