Question about extension support

Hi all, I've been trying to learn how PyPy supports (unmodified) CPython extensions, and one thing I've heard is that it's much slower than CPython and/or uses more memory. The only documentation I could find about why is this page from 2010: https://bitbucket.org/pypy/compatibility/wiki/c-api Sorry if this should be obvious, but is there more up-to-date information about this?

Secondly, assuming the information I linked to is still valid, is there a reason you settled on this method of bridging the refcount/tracing divide, as opposed to other possibilities (e.g. pinning the objects in the GC)? I'm curious because I've heard a number of people mention that extension modules are the primary reason PyPy is slower than CPython for their code; that's definitely an improvement over "PyPy doesn't run my code at all", but it made me wonder whether, and why, it has to be that way.

kmod

On Tue, 2014-03-25 at 16:19 -0700, Kevin Modzelewski wrote:
In my opinion, it all depends on how you use CPyExt and what your extension modules are for. There are two scenarios (or combinations thereof) that I think cover most of the use cases:

1) You use C extensions to make your code faster.
2) You use C extensions to steer external processes.

Ideally, with PyPy you should be able to drop (1) altogether and write nice Python code that the JIT can optimize, sometimes even better than hand-written C code, so here the answer would be "don't use extensions". Now, if as part of (2) you are doing some lengthy processing entirely outside PyPy, this might still be just as fast as CPython with CPyExt; but if the calls to your foreign functions are short and/or you are transferring a lot of data between C and PyPy, then there you go...

Personally, I've been using CPyExt and I'm very happy with it, because my function calls take a long time and whatever happens outside doesn't have much to do with objects in PyPy land. However, if my requirements were different, I would rather have rewritten everything using cffi: from what I understand, it can deliver comparable performance, and it also works on both PyPy and CPython, not just PyPy...

-- Sincerely yours, Yury V. Zaytsev
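
As an illustration of the cffi route Yury describes, here is a minimal sketch (assuming a Linux-like system, where ffi.dlopen(None) exposes the standard C library and strlen can be declared directly):

    # Minimal cffi sketch: call a C function without writing extension code.
    from cffi import FFI

    ffi = FFI()
    ffi.cdef("size_t strlen(const char *s);")  # declare the C signature verbatim
    libc = ffi.dlopen(None)                    # None loads the standard C library
    print(libc.strlen(b"hello"))               # -> 5

Because calls like libc.strlen() are plain declarations rather than opaque C API code, PyPy's JIT can compile them directly, which is why cffi sidesteps the CPyExt overhead discussed below.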

Hi all, thanks for the responses, but I guess I should have been more explicit -- I'm curious about *why* PyPy is slow on existing extension modules and why people are being steered away from them. I completely support the push to move away from CPython extension modules, but I'm not sure it's reasonable to expect that programmers will rewrite all the extension modules they use.

Put another way, I understand how having a JIT-understandable cffi module will be faster on PyPy than an extension module, but what I don't quite understand is why CPython extension modules have to be slower on PyPy than they are on CPython. I'm not saying that extension modules should be sped up by PyPy, but I'm curious why they have a reputation for being slower.

On Wed, Mar 26, 2014 at 1:31 AM, Yury V. Zaytsev <yury@shurup.com> wrote:

On Wed, Mar 26, 2014, at 13:47, Kevin Modzelewski wrote:
There are several reasons. Two of the most important are:

1) PyPy's internal representation of objects is different from CPython's, so a conversion cost must be paid every time objects pass between pure Python and C. Unlike with CPython, extensions can't poke around directly in PyPy's data structures, so macros like PyList_SET_ITEM have to become function calls.

2) Bridging the gap between PyPy's GC and CPython's reference counting requires a lot of bookkeeping.
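
To make the bookkeeping in (1) and (2) concrete, here is a deliberately simplified toy model (the names CShim and Boundary are invented for illustration; this is not PyPy's actual code). Every object crossing into C needs a stable, manually refcounted proxy, and a table that keeps the proxies alive and unique:

    # Toy model of a cpyext-style boundary (invented names, not PyPy internals).

    class CShim(object):
        """Stands in for the PyObject* structure handed to C code."""
        def __init__(self, obj):
            self.obj = obj        # the real interpreter-level object
            self.refcount = 1     # the C side manages this manually

    class Boundary(object):
        def __init__(self):
            self.shims = {}       # id(obj) -> CShim; keeps shims alive and unique

        def wrap(self, obj):
            # Paid on every Python -> C crossing: look up or build the shim.
            shim = self.shims.get(id(obj))
            if shim is None:
                shim = self.shims[id(obj)] = CShim(obj)
            else:
                shim.refcount += 1
            return shim

        def decref(self, shim):
            # Mirrors Py_DECREF: only when C drops its last reference may
            # the garbage collector reclaim the underlying object.
            shim.refcount -= 1
            if shim.refcount == 0:
                del self.shims[id(shim.obj)]

    boundary = Boundary()
    shim = boundary.wrap([1, 2, 3])  # every argument and return value pays this
    boundary.decref(shim)

Even this toy version does a dictionary lookup and refcount traffic per crossing; the real thing additionally has to convert object contents between the two representations.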

On Wed, Mar 26, 2014 at 1:52 PM, Benjamin Peterson <benjamin@python.org> wrote:
Hmm, interesting... I'm not sure I follow, though, why a call to PyList_SET_ITEM on a PyPy list can't know about the PyPy object representation. Again, I understand that it's not necessarily going to be as fast as pure-Python code, but I don't understand why PyList_SET_ITEM on PyPy needs to be slower than on CPython. Is it because PyPy uses more complicated internal representations, expecting the overhead to be elided by the JIT? Also, I'm assuming that CPyExt gets to recompile the extension module; I could definitely understand how there could be significant overhead if this were being done as an ABI compatibility layer.

> 2) Bridging the gap between PyPy's GC and CPython's ref counting requires a lot of bookkeeping.
kmod

On Wed, Mar 26, 2014, at 21:17, Kevin Modzelewski wrote:
Let's continue with the list example. PyPy lists use an array as the underlying data structure, like CPython's, but the similarity stops there. You can't just have random C code putting things into PyPy lists. The internal representation of the list might be unwrapped integers, not pointers to int objects as in CPython lists. There also need to be GC write barriers. The larger picture is that building a robust CPython compatibility layer is difficult and error-prone compared to the solution of rewriting C extensions in Python (possibly with cffi).
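
The "unwrapped integers" point can be observed from Python. A small demo, assuming a PyPy interpreter (the __pypy__ module is PyPy-only, and the exact strategy names vary between versions):

    # PyPy only: list_strategy reveals how a list is stored internally.
    from __pypy__ import list_strategy

    ints = [1, 2, 3]
    print(list_strategy(ints))   # e.g. "int": stored as unboxed machine words
    objs = [1, "two", 3.0]
    print(list_strategy(objs))   # e.g. "object": boxed, CPython-like layout

For a list using the integer strategy there is simply no array of object pointers into which PyList_SET_ITEM could store a PyObject*.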
Yes -- CPyExt is a source-level compatibility layer; extensions are recompiled against PyPy's headers rather than loaded through a binary ABI.
Conservative GCs are evil and slow. :) I don't know what you mean by the "extension module's allocator". That's a fairly global thing.

On Wed, Mar 26, 2014 at 9:32 PM, Benjamin Peterson <benjamin@python.org> wrote:
Using that logic, I would counter that building a JIT for a dynamic language is difficult and error-prone compared to rewriting your dynamic language programs in a faster language :) The benefit to supporting it in your runtime is 1) you only do the work once, and 2) you get to support existing code out there.

I'm writing not from the standpoint of saying "I have an extension module and I want it to run quickly", but rather "what do you guys think about the (presumed) situation of extension modules being a key blocker of PyPy adoption". While I'd love the world to migrate to a better solution overnight, I don't think that's realistic -- just look at the state of Python 3, which has a much larger constituency pushing much harder for it, and presumably has lower switching costs than rewriting C extensions in Python.
I'm assuming that you can hook out malloc and mmap to be calls to the GC allocator; I've seen other projects do this, though I don't know how robust it is.

On Wed, Mar 26, 2014, at 21:51, Kevin Modzelewski wrote:
I don't want to argue that an amazingly fast CPython API compatibility layer isn't possible, but current experience suggests that creating it will be painful. It's hard to get excited about building compatibility layers when there are shiny JITs to be made.
Yes, but you get to use PyPy and get super fast Python code, whereas your code gets no faster by porting to Python 3. Plus you get rid of C! The incentives are a bit better.
That's the easy part. The hard part is keeping your precise GC informed of native C doing arbitrary things.

Hi all, I'd like to point Kevin to the thread "cpyext performance" from July-August 2012, in which we explained what is slow about cpyext and what could potentially be improved. As others have mentioned here again, we can't reasonably hope to get extensions to the same speed as on CPython, but "someone" could at least work on lowering the difference. (Nobody has so far.) https://mail.python.org/pipermail/pypy-dev/2012-July/010263.html

A small note about PyList_SET_ITEM(): it is impossible to keep this as a macro, or even to write it as plain C code. There are practical reasons, as mentioned here, but the most fundamental reason, imho, is that doing so would mean throwing the flexibility of PyPy out of the window. I'm talking about adding new implementations of list objects (we already have several, e.g. for lists-of-integers or for range()), about changing the GC, and so on. In other words: of course it is possible (if hard) to write the complete logic of PyList_SET_ITEM as C code, or even as a C macro. The point is that if we did that, we'd give up the possibility of ever changing any of these other aspects, or at least require painful adaptation every time we want to change them. (And yes, we do change them from time to time. For example, the STM branch we're working on has a different GC, and we had to look inside cpyext exactly zero times to make it work.)

A bientôt, Armin.
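
A quick way to see why freezing PyList_SET_ITEM into a macro would hurt: the representation of one and the same list can change at runtime. Again PyPy-only, with version-dependent strategy names:

    from __pypy__ import list_strategy  # PyPy only

    l = [1, 2, 3]
    print(list_strategy(l))   # an integer strategy: no PyObject* array exists yet
    l.append("not an int")
    print(list_strategy(l))   # silently migrated to a generic object strategy

A macro compiled into a third-party .so file could not follow such migrations, let alone future list strategies or a different GC.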

participants (4):
- Armin Rigo
- Benjamin Peterson
- Kevin Modzelewski
- Yury V. Zaytsev