What can Cython do for PyPy?
Hi,

there has recently been a move towards a .NET/IronPython port of Cython, mostly driven by the need for a fast NumPy port. During the related discussion, the question came up of how much it would take to let Cython also target other runtimes, including PyPy.

Given that PyPy already has a CPython C-API compatibility layer, I doubt that it would be hard to enable that. With my limited knowledge about the internals of that layer, I guess the question thus becomes: is there anything Cython could do to the C code it generates that would make Cython-generated extension modules run faster/better/safer on PyPy than they currently would? I have never tried to make a Cython module actually run on PyPy (simply because I don't use PyPy), but I have my doubts that they'd run perfectly out of the box. While generally portable, I'm pretty sure the C code relies on some specific internals of CPython that PyPy can't easily (or efficiently) provide.

Stefan
Hi Stefan.

The CPython extension compatibility layer is in alpha at best. I heavily doubt that anything would run out of the box. However, this is a CPython compatibility layer anyway; it's not meant to be used as a long-term solution. First of all it's inefficient (and it's unclear if it ever will be efficient), but it's also not JITable. This means that to the JIT, a CPython extension is a black box which should not be touched. Also, several concepts, like refcounting, are completely alien to PyPy and have to be emulated.

For example, for NumPy, I think a rewrite is necessary to make it fast (and as experiments have shown, it's possible to make it really fast), so I would not worry about using Cython for speeding things up. In theory you should not need it, and the boundary layer between Cython-compiled code and JITted code would make you suffer anyway.

There is another use case: using Cython to provide access to C libraries. This is a harder question and I don't have a good answer for it, but maybe the CPython compatibility layer would be good enough in this case? I can't see how Cython could produce "native" C code instead of CPython C code without some major effort.

Cheers,
fijal

On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
_______________________________________________ pypy-dev@codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev
Maciej Fijalkowski, 12.08.2010 10:05:
The CPython extension compatibility layer is in alpha at best. I heavily doubt that anything would run out of the box. However, this is a CPython compatibility layer anyway; it's not meant to be used as a long-term solution. First of all it's inefficient (and it's unclear if it ever will be)
If you only use it to call into non-trivial Cython code (e.g. some heavy calculations on NumPy tables), the call overhead should be mostly negligible, maybe even close to that in CPython. You could even provide some kind of fast path to 'cpdef' functions (i.e. functions that are callable from both C and Python) and 'api' functions (which are currently exported at the module API level using the PyCapsule mechanism). That would reduce the call overhead to that of a plain C call.

Then, a lot of Cython code doesn't do much ref-counting and the like, but simply runs in plain C. So, often enough, there won't be that much overhead involved in the code itself either, especially in tight loops where users prune away all CPython interaction anyway.
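As a rough illustration of such a fast path, consider the C sketch below. All names here (square_c, BoxedLong, call_fast) are invented for illustration; real Cython uses __pyx_-prefixed symbols and exports 'api' functions through PyCapsule-based tables, not this layout.

```c
#include <assert.h>

/* The fast C-level entry point of a hypothetical 'cpdef' function:
 * a plain C call, no argument boxing. */
static long square_c(long x) {
    return x * x;
}

/* A stand-in for a boxed argument, as a Python runtime would pass it. */
typedef struct { long value; } BoxedLong;

/* The slow Python-level entry point: unbox, call the C version, rebox. */
static BoxedLong square_py(BoxedLong arg) {
    BoxedLong result;
    result.value = square_c(arg.value);  /* shared implementation */
    return result;
}

/* A runtime that resolves the C entry point directly (e.g. via a
 * capsule-style export table) skips the boxing entirely. */
static long call_fast(long (*fn)(long), long x) {
    return fn(x);
}
```

The point is only that both entry points share one implementation, so a runtime aware of the C signature pays nothing beyond a C call.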
but it's also not JITable. This means that to the JIT, a CPython extension is a black box which should not be touched.
Well, unless both sides learn about each other, that is. It won't necessarily impact the JIT, but then again, a JIT usually won't have a noticeable impact on the performance of Cython code anyway.
Also, several concepts, like refcounting, are completely alien to PyPy and have to be emulated.
Sure. That's why I asked if there is anything Cython can do to help here. For example, the code it generates for INCREF/DECREF operations is configurable, and not only at the C preprocessor level.
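To make the preprocessor-level hook concrete, here is a minimal sketch. The macro names are loosely modeled on Cython's __Pyx_INCREF/__Pyx_DECREF, but the CYTHON_NO_REFCOUNTING switch and the FakeObject layout are invented for illustration.

```c
#include <assert.h>

/* A toy object with a CPython-style refcount field. */
typedef struct { long ob_refcnt; } FakeObject;

#ifdef CYTHON_NO_REFCOUNTING
  /* A garbage-collected runtime like PyPy could compile the
   * refcount traffic away entirely. */
  #define __Pyx_INCREF(op) ((void)0)
  #define __Pyx_DECREF(op) ((void)0)
#else
  /* CPython-style counting (real code also frees at zero). */
  #define __Pyx_INCREF(op) ((op)->ob_refcnt++)
  #define __Pyx_DECREF(op) ((op)->ob_refcnt--)
#endif

/* Generated code only ever goes through the macros, so retargeting
 * it means redefining two lines, not editing every call site. */
static long use_object(FakeObject *obj) {
    __Pyx_INCREF(obj);               /* take ownership */
    long count = obj->ob_refcnt;     /* ... use the object ... */
    __Pyx_DECREF(obj);               /* release it */
    return count;
}
```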
For example, for NumPy, I think a rewrite is necessary to make it fast (and as experiments have shown, it's possible to make it really fast), so I would not worry about using Cython for speeding things up.
This isn't only about making things fast by rewriting them. It is also about accessing and reusing existing code in a new environment. Cython is becoming increasingly popular in the numerics community, and a lot of Cython code is being written as we speak, not only in the SciPy/NumPy environment. People even find it attractive enough to start rewriting their CPython extension modules (most often library wrappers) from C to Cython, both for performance and TCO reasons.
There is another use case: using Cython to provide access to C libraries. This is a harder question and I don't have a good answer for it, but maybe the CPython compatibility layer would be good enough in this case? I can't see how Cython could produce "native" C code instead of CPython C code without some major effort.
Native (standalone) C code isn't the goal, just something that adapts well to what PyPy can provide as a CPython compatibility layer. If Cython modules worked across independent Python implementations, that would be by far the simplest way to make lots of them available cross-platform, thus making it a lot easier to switch between different implementations.

Stefan
I agree with the motivations given by Stefan. Two interesting possibilities would be to:

a) first, test the compatibility layer with Cython-generated code;
b) possibly, allow users to use the Python API while replacing refcounting with another, more meaningful, PyPy-specific API* for a garbage-collected heap.

However, such an API would be radically different, and I'm also not sure how well it would mesh with the CPython API. If Cython could support such an API, that would be great. But I'm unsure whether this is worth it, for Cython, and more generally for other modules (one could easily and elegantly support both CPython and PyPy with preprocessor tricks). See further below about why call overhead is not the biggest performance problem when you can't inline.

* I thought the Java Native Interface (JNI) design of local and global references (http://download.oracle.com/javase/6/docs/technotes/guides/jni/spec/design.ht...) would work here, with some adaptation. However, if your moving GCs support pinning of objects, as I expect to be necessary to interact with CPython code, I would make an important change to that API: instead of having object references be pointers to (movable-by-the-GC) pointers to objects, as in the JNI API, PyPy should use plain pinned pointers. The pinning would not be apparent in the type, but that should be fine, I guess. Problems arise when PyPy-aware code calls code which still uses the refcounting API. It is mostly safe to ignore the refcounting (even the decreases) for local references, but I'm unsure about persistent references, even if it's probably still the best solution, so that the PyPy-aware code handles the lifecycle by itself.

On Thu, Aug 12, 2010 at 11:25, Stefan Behnel <stefan_ml@behnel.de> wrote:
If you only use it to call into non-trivial Cython code (e.g. some heavy calculations on NumPy tables), the call overhead should be mostly negligible, maybe even close to that in CPython. You could even provide some kind of fast-path to 'cpdef' functions (i.e. functions that are callable from both C and Python) and 'api' functions (which are currently exported at the module API level using the PyCapsule mechanism). That would reduce the call overhead to that of a C call.
but it's also not JITable. This means that to the JIT, a CPython extension is a black box which should not be touched.
Well, unless both sides learn about each other, that is. It won't necessarily impact the JIT, but then again, a JIT usually won't have a noticeable impact on the performance of Cython code anyway.
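Paolo's footnote above proposes a JNI-like local/global reference API with plain pinned pointers. A minimal sketch of the bookkeeping might look like the following; every name is invented, and this models only the pinning discipline, not a real garbage collector.

```c
#include <assert.h>

/* A toy object; the GC consults 'pinned' before moving or freeing it. */
typedef struct { int pinned; } Obj;

#define MAX_REFS 64

/* Local references die when the native call returns; in JNI they are
 * double-indirected so the GC can move objects, but with pinning the
 * caller gets a plain pointer. */
static Obj *local_refs[MAX_REFS];
static int n_local;

static Obj *new_local_ref(Obj *obj) {
    obj->pinned = 1;                /* GC must not move or free it now */
    local_refs[n_local++] = obj;
    return obj;                     /* plain pointer, no extra hop */
}

/* Called by the runtime when the native frame exits: every local
 * reference is unpinned in one sweep, so native code never has to
 * emulate DECREF for them. */
static void pop_local_frame(void) {
    while (n_local > 0)
        local_refs[--n_local]->pinned = 0;
}
```

A global-reference variant would pin until an explicit release, which is where the lifecycle questions Paolo raises about persistent references come in.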
Call overhead is not the biggest problem, I guess (well, if it's bigger than in C, it might be); IMHO it's the minor problem compared to not being able to inline. Inlining is important because it allows more optimizations on the combined code. This might or might not apply to your typical use cases (present and future); you should just keep this issue in mind, too. Whenever you say "If you only use it to call into non-trivial Cython code", you imply that functional abstraction, the kind where you write short functions such as accessors, is not efficiently supported.

For instance, if you call two functions, each containing a parallel for loop, fusing the loops requires inlining the functions to expose the loops. Inlining accessors (getters and setters) allows the compiler to recognize that they often don't need to be called over and over again, i.e., common subexpression elimination, which you can't do on a normal (impure) function.

To make a particularly dramatic example (since it comes from C) of a quadratic-to-linear optimization, a loop like

    for (i = 0; i < strlen(s); i++) {
        /* do something with s without modifying it */
    }

takes quadratic time, because strlen takes linear time and is called on each iteration. Can the optimizer fix this? The simplest way is to inline everything; then it could notice that calculating strlen only once is safe. In C with GCC extensions, one could annotate strlen as pure and use functions which take s as a const parameter (but I'm unsure whether that actually works). In Python (and even in Java), such an optimization should work without annotations. Of course, one can't rely on this quadratic-to-linear optimization unless it's guaranteed to work (like tail call elimination), so I wouldn't do it in this case; this point relates to the wider issue of unreliable optimizations and "sufficiently smart compilers", better discussed at http://prog21.dadgum.com/40.html (not mine).

--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
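The quadratic strlen loop above can be instrumented to make the difference visible. The wrapper below counts calls; for a string of length n, the naive loop evaluates strlen n+1 times (once per iteration plus the final failing check), while the hoisted form an inlining optimizer could derive calls it once.

```c
#include <assert.h>
#include <string.h>

static int strlen_calls;  /* how many times strlen was consulted */

static size_t counted_strlen(const char *s) {
    strlen_calls++;
    return strlen(s);
}

/* The pattern from the text: strlen in the loop condition,
 * so O(n) calls and O(n^2) total work. */
static size_t scan_quadratic(const char *s) {
    size_t i, hits = 0;
    for (i = 0; i < counted_strlen(s); i++)
        if (s[i] == 'a') hits++;
    return hits;
}

/* The hoisted version: one strlen call, linear total work.
 * Legal only because the loop body does not modify s. */
static size_t scan_linear(const char *s) {
    size_t i, n = counted_strlen(s), hits = 0;
    for (i = 0; i < n; i++)
        if (s[i] == 'a') hits++;
    return hits;
}
```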
[crossposting to jython-dev]

Because of some conversations I had with Maciej (mostly at Folsom Coffee in Boulder :) ), we are considering adding support for the CPython C extension API in Jython, modeling what has already been done in PyPy and IronPython. Although I think it may make a lot of sense to port NumPy to Java, and have argued for it in the past, being pragmatic suggests it's better to work with the tide of NumPy/Cython than against it. Also, this can bring a large swath of existing libraries to Jython, including those coded against SWIG, at the cost that they will not run under most security manager policies. I think that's a reasonable tradeoff.

Similar concerns to the ones Maciej raises apply to Jython. No Java JIT will inline such native code, marshaling from the Java domain to the native one will be expensive, etc. But this is (mostly) true of Jython today, going from Python code to Java (although invokedynamic will at least reduce some of those costs). Users can still take advantage of Java to achieve much better performance from Jython, if they are careful about structuring the execution of their code. At the end of the day, Jython calling into C code, including that produced by Cython, should see a performance profile similar to CPython calling into C code, as long as the C code doesn't hammer the INCREF/DECREF *functions*. (JRuby is implementing something similar, and we can probably borrow their "refcounting" support.) But of course that's exactly what one needs to avoid to write performant extension code in CPython anyway, at least if it's to be multithreaded.

One interesting part of this discussion is whether we can support lock eliding. This is one part of JIT inlining that you don't want to give up for multithreaded performance. Rather than having C code call back into Java to release the GIL (which is only global for such C code!), it would be better to have a marker on the C code that allows for immediate release, or perhaps some other inlinable Java stub.
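One way to read the "marker" idea is a flag on the method table entry saying the function never touches runtime state, so the host VM can elide the lock round-trip around the call. Everything below is invented for illustration (METH_NOGIL does not exist; gil_roundtrips merely models acquire/release), not a real Jython or CPython mechanism.

```c
#include <assert.h>

#define METH_DEFAULT 0
#define METH_NOGIL   1  /* safe to run without the extension-API lock */

typedef struct {
    const char *name;
    long (*fn)(long);
    int flags;
} MethodDef;

static int gil_roundtrips;  /* counts modeled lock acquire/release ops */

/* A pure function: no runtime state touched, so eligible for NOGIL. */
static long add_one(long x) { return x + 1; }

static const MethodDef locked_def = {"add_one", add_one, METH_DEFAULT};
static const MethodDef nogil_def  = {"add_one", add_one, METH_NOGIL};

/* The host VM's call site: with the marker, the call is direct (and
 * thus inlinable); without it, the lock must be taken and dropped. */
static long call_extension(const MethodDef *def, long arg) {
    long r;
    if (def->flags & METH_NOGIL)
        return def->fn(arg);
    gil_roundtrips++;           /* acquire (modeled) */
    r = def->fn(arg);
    gil_roundtrips++;           /* release (modeled) */
    return r;
}
```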
I could imagine this could be readily supported by Cython (and perhaps already is).

Lastly, I want to emphasize again that if/when Jython adds support for the C extension API, the "GIL" and "refcounting" support will only apply to such C code! We like our concurrency support and we are not giving it up :)

- Jim

On Thu, Aug 12, 2010 at 3:25 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
On Aug 12, 2010, at 3:49 AM, Stefan Behnel wrote:
A possible solution, I think, would be to write an OO backend for Cython. That could be made to generate C# or RPython code. The problem remains that PyPy still doesn't have separate compilation, so you cannot make an external module for the PyPy interpreter after it is translated. So it is hard, maybe harder than anyone on the Cython side would like, but I still think it is a good solution. (Unless I'm mistaken in any of my assumptions, and then it is a terrible solution :)

--
Leonardo Santagada
santagada at gmail.com
participants (5): Jim Baker, Leonardo Santagada, Maciej Fijalkowski, Paolo Giarrusso, Stefan Behnel