[pypy-dev] cpyext performance
Antonio Cuni
anto.cuni at gmail.com
Wed Jul 19 06:15:51 EDT 2017
Hello,
recently I have been playing a bit with cpyext, to see if there is any
low-hanging fruit to be picked to improve its performance.
I didn't get any real result but I think it's interesting to share my
findings.
The benchmark I'm using is here:
https://github.com/antocuni/cpyext-benchmarks
It contains a simple C extension defining three methods, one for each of
the METH_NOARGS, METH_O and METH_VARARGS flags.
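To give an idea of the shape of such a benchmark, here is a minimal sketch. It is not the actual bench.py from the repository: the three functions are pure-Python stand-ins for the C ones so that it runs without building the extension, and the iteration count and the arguments passed to varargs() are assumptions.

```python
import time

# Hypothetical pure-Python stand-ins for the three C functions of the
# extension, so that this sketch runs without compiling anything.
def noargs():
    return None

def onearg(arg):
    return arg

def varargs(*args):
    return args

N = 100000  # assumed iteration count, not taken from the repository

def bench_noargs():
    for i in range(N):
        noargs()

def bench_onearg():
    for i in range(N):
        onearg(None)

def bench_varargs():
    for i in range(N):
        varargs(1, 2)  # assumed arguments

def run(label, loop):
    # Time one full loop and print it in the same style as the results.
    start = time.time()
    loop()
    elapsed = time.time() - start
    print("%-7s: %.2f secs" % (label, elapsed))
    return elapsed

timings = {}
for label, loop in [("noargs", bench_noargs),
                    ("onearg", bench_onearg),
                    ("varargs", bench_varargs)]:
    timings[label] = run(label, loop)
```

With the real extension, the stand-ins would be replaced by the functions of the compiled module; the numbers printed by this sketch say nothing about cpyext.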
So first, the results with CPython and PyPy 5.8:
$ python bench.py
noargs : 0.78 secs
onearg : 0.89 secs
varargs: 1.05 secs
$ pypy bench.py
noargs : 1.67 secs
onearg : 2.13 secs
varargs: 4.89 secs
Then I tried my cpyext-jit branch. This branch does two things:
1) it makes cpyext visible to the JIT, and adds enough @jit.dont_look_inside
so that it actually compiles
2) it merges part of the cpyext-callopt branch, up to rev 9cbc8bd76297 (more
on this later): this adds fast paths for METH_NOARGS and METH_O to avoid
going through the slow __args__.unpack():
$ pypy-cpyext-jit bench.py
noargs : 0.30 secs
onearg : 0.31 secs
varargs: 4.90 secs
So, apparently this is enough to greatly speed up the calls, and even make
them faster than CPython. Note that "onearg" calls "simple.onearg(None)".
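The idea behind the fast paths mentioned above can be sketched as follows. This is only an illustration in plain Python, not the cpyext code: the function call_c_function and the stand-in C functions are hypothetical, and only the flag values are the ones from CPython's methodobject.h.

```python
# Flag values as defined in CPython's methodobject.h.
METH_VARARGS = 0x0001
METH_NOARGS = 0x0004
METH_O = 0x0008

def call_c_function(flags, cfunc, self_obj, args):
    """Hypothetical dispatcher: choose the cheapest path for each flag."""
    if flags & METH_NOARGS:
        # Fast path: nothing to unpack, the C function gets no argument.
        if args:
            raise TypeError("function takes no arguments")
        return cfunc(self_obj, None)
    if flags & METH_O:
        # Fast path: exactly one argument, passed directly.
        if len(args) != 1:
            raise TypeError("function takes exactly one argument")
        return cfunc(self_obj, args[0])
    # Generic path: build an argument tuple, as METH_VARARGS requires.
    return cfunc(self_obj, tuple(args))

# Stand-ins for C functions with the (self, arg) signature.
def c_noargs(self_obj, ignored):
    return "noargs"

def c_onearg(self_obj, arg):
    return arg

def c_varargs(self_obj, args_tuple):
    return len(args_tuple)

r_noargs = call_c_function(METH_NOARGS, c_noargs, None, [])
r_onearg = call_c_function(METH_O, c_onearg, None, [42])
r_varargs = call_c_function(METH_VARARGS, c_varargs, None, [1, 2, 3])
```

In this sketch only the METH_VARARGS case has to allocate an argument tuple, which is consistent with varargs being the one benchmark that did not get faster.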
However, things become more complicated as soon as I start passing various
kinds of objects to onearg():
$ pypy bench_oneargs.py # pypy 5.8
onearg(None): 2.09 secs
onearg(1) : 2.07 secs
onearg(i) : 4.98 secs
onearg(i%2) : 4.92 secs
onearg(X) : 2.13 secs
onearg((1,)): 2.30 secs
onearg((i,)): 9.80 secs
$ pypy-cpyext-jit bench_oneargs.py
onearg(None): 0.30 secs
onearg(1) : 0.30 secs
onearg(i) : 2.52 secs
onearg(i%2) : 2.56 secs
onearg(X) : 0.30 secs
onearg((1,)): 0.30 secs
onearg((i,)): 7.45 secs
So the call optimization still helps, but as soon as we need to convert
an object from PyPy to CPython we are horribly slow. However, it is
interesting to note that:
1) if we pass a constant object, we are fast: None, 1, (1,)
2) if we pass X (which is a global X=100), we are still fast
3) any other object which is created on the fly is slow
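For reference, the seven cases can be written as the loops below, grouped according to the three observations above. This is a sketch and not the actual bench_oneargs.py: onearg is a pure-Python stand-in for simple.onearg, and the iteration count is an assumption.

```python
import time

X = 100  # module-level global, as in the benchmark

def onearg(arg):  # hypothetical stand-in for simple.onearg
    return arg

def loop_none(n):
    for i in range(n):
        onearg(None)

def loop_const_int(n):
    for i in range(n):
        onearg(1)

def loop_i(n):
    for i in range(n):
        onearg(i)

def loop_i_mod_2(n):
    for i in range(n):
        onearg(i % 2)

def loop_global(n):
    for i in range(n):
        onearg(X)

def loop_const_tuple(n):
    for i in range(n):
        onearg((1,))

def loop_fresh_tuple(n):
    for i in range(n):
        onearg((i,))

# (label, kind of argument, loop)
CASES = [
    ("onearg(None)", "constant", loop_none),
    ("onearg(1)", "constant", loop_const_int),
    ("onearg(i)", "created on the fly", loop_i),
    ("onearg(i%2)", "created on the fly", loop_i_mod_2),
    ("onearg(X)", "global", loop_global),
    ("onearg((1,))", "constant", loop_const_tuple),
    ("onearg((i,))", "created on the fly", loop_fresh_tuple),
]

N = 10000  # assumed iteration count
timings = {}
for label, kind, loop in CASES:
    start = time.time()
    loop(N)
    timings[label] = time.time() - start
    print("%-12s: %.2f secs  [%s]" % (label, timings[label], kind))
```

The only difference between the loops is the expression used to build the argument, which is what makes the gap between the constant and the on-the-fly cases surprising.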
The traces look more or less the same in the three cases, so I don't
really understand where the difference comes from.
Finally, about the cpyext-callopt branch, which was started in Leysin by
Richard, Armin and me: I am not sure I fully understand the purpose of
dbba78b270fd. The optimization done in 9cbc8bd76297 seems to work well,
so what am I missing?
ciao,
Anto