[pypy-dev] Cython and cpyext performance (was: Re: libgmp)

Fri Feb 24 11:32:07 CET 2012

Hi Amaury,

Amaury Forgeot d'Arc, 24.02.2012 10:27:
> 2012/2/24 Stefan Behnel
>> So, as a reply to the OP: seeing the current advances in getting Cython
>> code to run in PyPy (interfacing at the C-API level), another option would
>> be to do exactly as you would in CPython and implement the performance
>> critical parts of the code that heavily rely on gmp interaction in Cython
>> (to get plain C code out of it), and then import and use that from PyPy,
>> but at a much higher level than with direct individual calls to the gmp
>> functions. However much faster ctypes can be in PyPy than in CPython, you
>> just can't beat the performance of a straight block of C code when the goal
>> is to talk to C code. Cython will also allow you to run code in parallel
>> with OpenMP, in case you need that.
> 
> Cython won't be fast with PyPy. The C code it generates is too much
> specialized for CPython.

Well, you have to distinguish between the C code it generates and the C-API
calls that it generates to emulate Python semantics. The C code itself
isn't impacted by PyPy. There are quite a lot of people who use Cython
because it gives them plain C with a friendly syntax and Python features
when they need them (e.g. for exceptions).

> For example, I've seen huge slowdowns in the Cython program itself (while
> compiling lxml, for example)
> when the various Cython extension modules (Nodes.c for example) started to
> compile and became available for pypy.

Right, Cython shouldn't compile itself when running in PyPy, that makes no
sense at all. Note that there's a --no-cython-compile option that you can
pass to setup.py. It's Python, compilation is completely optional. I'll
disable it when installing in PyPy.
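
As a rough sketch of that decision (not Cython's actual setup.py; the helper name should_compile_cython is made up for illustration, only the --no-cython-compile flag comes from the text above), the check could look like this:

```python
import platform
import sys


def should_compile_cython(argv):
    """Decide whether to build the optional compiled compiler modules.

    Hypothetical helper: the name and structure are assumptions,
    only the --no-cython-compile flag is taken from the discussion.
    """
    # An explicit opt-out on the command line always wins.
    if "--no-cython-compile" in argv:
        return False
    # On PyPy the compiled modules would run through cpyext, so skip them.
    if platform.python_implementation() == "PyPy":
        return False
    return True


if __name__ == "__main__":
    print(should_compile_cython(sys.argv[1:]))
```

The pure Python modules keep working either way, so skipping the build only changes how fast the compiler itself runs.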

BTW, from what I've seen so far, the compile times tend to be faster in
CPython than in PyPy because even compiling something as large as lxml
doesn't seem to run long enough to take much advantage of the JIT. But
compiling down to cpyext and going through that is just wrong in every way.

> I was about to write the list of operations that cpyext performs for
> PyTuple_GET_ITEM, but this macro is too large to fit in the margin :-)

Interesting. Is there a better way of doing this? CPython doesn't seem to
provide a way to ask for owned references, even though Cython will quite
often (but not always) call Py_INCREF() on the result anyway. Would the
tuple parsing functions help, for example? There may well be cases where we
could use those. Wouldn't be the first time I rewrite the function argument
handling code, for example. ;-)

> And PyDict_Next is a nightmare:
> https://bitbucket.org/pypy/pypy/src/9f0e8a37712b/pypy/module/cpyext/dictobject.py#cl-177

Right, very good example. Looks like O(n^2), which is clearly horrible for
this.
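
To illustrate where the quadratic cost would come from, here is a toy model in plain Python. It is not the cpyext code; it only assumes that no iterator state survives between calls, so every call has to restart the iteration and skip "pos" entries before it can return the next one. The names dict_next and walk are made up.

```python
from itertools import islice


def dict_next(d, pos, counter):
    """Return (item, new_pos) for position pos, or None at the end.

    Toy model: each call restarts the iteration and skips `pos` entries.
    """
    # Count the next() steps this call performs on the underlying
    # iterator: pos skips plus one fetch attempt.
    counter[0] += pos + 1
    item = next(islice(iter(d.items()), pos, None), None)
    if item is None:
        return None
    return item, pos + 1


def walk(d):
    """Loop over d the way a PyDict_Next() caller would, counting steps."""
    counter = [0]
    pos = 0
    items = []
    while True:
        result = dict_next(d, pos, counter)
        if result is None:
            break
        item, pos = result
        items.append(item)
    return items, counter[0]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        _, steps = walk({i: i for i in range(n)})
        print(n, steps)
```

In this model a dict with n entries costs (n + 1)(n + 2) / 2 iterator steps for one full pass, instead of the n steps a stateful iterator needs.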

Actually, even in CPython PyDict_Next() is only about 30% faster than going
through the "lookup and call .iteritems() for dict, loop over the iterator"
dance. I found that rather disappointing.

I had always wanted to make this optimisation optional in the C code so
that we can apply it optimistically at runtime to any unknown object that
someone calls an ".iteritems()" method on. When I get around to finishing
that up (AFAIR, I once had a half-baked patch for it), we can just as well
disable it for PyPy at C compile time and always take the normal iterator
path there. Looks like that would help a lot.
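
A minimal sketch of that dispatch, written in Python rather than in the generated C, and only meant to show the shape of the idea: the function name iter_items and the fast_path_enabled switch are hypothetical, and dict.items() stands in for the C-level PyDict_Next() loop.

```python
def iter_items(obj, fast_path_enabled=True):
    """Iterate over the (key, value) pairs of obj."""
    # Optimistic runtime check: only exact dicts take the fast path.
    if fast_path_enabled and type(obj) is dict:
        # Stand-in for the C-level PyDict_Next() loop.
        return iter(dict.items(obj))
    # Generic path: look up the method, call it, loop over the iterator.
    # Falls back to .items() where .iteritems() does not exist.
    method = getattr(obj, "iteritems", None) or obj.items
    return iter(method())


class Pairs:
    """Hypothetical non-dict object that happens to offer .iteritems()."""

    def iteritems(self):
        return iter([("x", 1), ("y", 2)])


if __name__ == "__main__":
    print(list(iter_items({"a": 1})))
    print(list(iter_items(Pairs())))
    # With the switch off, even a dict takes the generic iterator path.
    print(list(iter_items({"a": 1}, fast_path_enabled=False)))
```

Turning the switch off corresponds to disabling the optimisation at C compile time, so that PyPy always sees the normal iterator protocol.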

> It's one thing to be able to run lxml and other Cython-based libraries with
> pypy.
> it's another thing to pretend that Cython is the way to write fast modules
> on pypy.

It's *one* way. I'm very well aware that it depends on the ratio of C-level
code and Python-level code - we see that all the time, also in CPython.

> You may win on the generated C code, but will lose when interfacing with
> the rest
> of the Python interpreter (which is also the reason why ctypes is so slow).

Absolutely. I think that's where most of the heated misunderstandings on
both sides come from. On the Cython side, the goal is to drop your code
into C to make it predictably fast. On the PyPy side, the goal is to move
it into Python to let the JIT do its work (and look for ways to improve the
JIT if this doesn't work out).

In practice, I think that either side is right for the right kind of
problem. And, no, I really don't care who is "more" right.

> This said, I've added all the functions you mentioned in a previous email.

Very cool, thanks!! I already implemented it for Cython but hadn't pushed it
up yet.

Just in case: did you see my patch in the CPython tracker? I made
PyErr_GetExcInfo() return new references and PyErr_SetExcInfo() steal the
references - that behaviour fits best with all important use cases.

> You may try your tests again with the nightly build.

I will.

Stefan