Let's change to C API!

Hi, I just sent an email to the capi-sig mailing list. Since that mailing list had been idle for months, I am copying my email here to reach a wider audience. But if possible, I would prefer that you join me on capi-sig to reply ;-)

--

Hi,

Last year, I gave a talk at the Language Summit (during PyCon) to explain that CPython should become 2x faster to remain competitive. IMHO all attempts to optimize Python (CPython forks) have failed because they were blocked by the C API, which imposes strict constraints.

I started to write a proposal to change the C API to hide implementation details, to prepare CPython for future changes. It allows experimenting with optimization ideas without losing support for C extensions. C extensions are a large part of Python's success. They are also the reason why PyPy hasn't replaced CPython yet: PyPy's cpyext remains slower than CPython because PyPy has to mimic CPython, which adds significant overhead (even if PyPy developers are working *hard* to optimize it).

I created a new project to discuss how to introduce backward incompatible changes in the C API without breaking all C extensions: http://pythoncapi.readthedocs.io/ The source can be found at: https://github.com/vstinner/pythoncapi/

I would like to create a team of people who want to work on this project: people from CPython, PyPy, Cython and anyone who depends on the C API. Contact me in private if you want to be added to the GitHub project. I propose to discuss this on the capi-sig mailing list, since I would like to involve people from various projects and I don't want to bother you with the high traffic of python-dev.

Victor

PS: I added some people as BCC ;-)

Hi Victor, On Sun, 29 Jul 2018 21:47:51 +0200 Victor Stinner <vstinner@redhat.com> wrote:
Well, that's your opinion, but did you prove it? Which CPython forks did you examine that failed because of the C API? The one area I know of where the C API is a strong barrier to improvement is removing the GIL, and I'd rather let Larry speak about it. Regards Antoine.

I have discussed this with many Python developers who agree with me that the C API has blocked CPython forks. For example, I heard that Pyston was very fast and very promising before it started to support the C API. The C API requires that your implementation make almost all the same design choices that CPython made 25 years ago (C structures, memory allocators, reference counting, specific GC implementation, GIL, etc.). More efficient technologies have appeared in the meantime. Multiple PyPy developers told me that cpyext remains a blocker issue to use PyPy. I am not sure how I am supposed to "prove" these facts.

Oh, by the way, I will not promise anything about any potential performance gain. When I write "2x faster", I mean that our current approach to optimization has failed to make Python 2x faster over the last 10 years. Python 3 is more or less as fast as, or a little bit faster than, Python 2. But Python 2 must not be used as a reference for performance. People hesitate between Go, JavaScript and Python, and Python is not the winner in terms of performance.
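To make that point concrete, here is a minimal sketch (not code from the thread; the function name sum_list is hypothetical) of how ordinary C extension code bakes CPython's object layout and reference-counting scheme into its compiled binary, which is why an alternative runtime that wants to run it unchanged must reproduce the very same structures:

    #include <Python.h>

    /* Minimal sketch: sum a list of ints the way many C extensions do.
     * Macros such as PyList_GET_SIZE and PyList_GET_ITEM expand to direct
     * reads of struct fields (ob_size, ob_item), so the compiled machine
     * code hard-wires CPython's memory layout.  Error handling omitted. */
    static PyObject *
    sum_list(PyObject *self, PyObject *list)
    {
        long total = 0;
        Py_ssize_t n = PyList_GET_SIZE(list);          /* reads ob_size directly */
        for (Py_ssize_t i = 0; i < n; i++) {
            PyObject *item = PyList_GET_ITEM(list, i); /* reads ob_item[i]; borrowed reference */
            total += PyLong_AsLong(item);
        }
        return PyLong_FromLong(total);
    }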
IMHO Gilectomy is going to fail without our help to change the C API. Again, I don't want to promise anything here. Removing reference counting inside CPython is a giant project. But at least, I know that borrowed references are very complex to support if CPython doesn't use reference counting. I heard that PyPy has issues implementing borrowed references. If we succeed in removing them, PyPy should benefit directly from that work. Note: PyPy will still have to support borrowed references for C extensions using the old C API. But I expect that PyPy will be more reliable, maybe even faster, when running C extensions that use the new C API without reference counting.

I have to confess that helping Larry is part of my overall plan. But I dislike making promises that I cannot keep, and I dislike working on giant multi-year Python projects. My hope is to have a working new C API next year which will hide some implementation details, but not all of them. I want to work incrementally, using popular C extensions in the feedback loop. Building a new C API is useless if nobody can use it. But it will take time to adjust the backward compatibility cursor.

Victor

On Tue, 31 Jul 2018 02:29:42 +0200 Victor Stinner <vstinner@redhat.com> wrote:
What exactly in the C API made it slow or non-promising?
Yes, but those choices are not necessarily bad.
Multiple PyPy developers told me that cpyext remains a blocker issue to use PyPy.
Probably, but we're talking about speeding up CPython here, right? If we're talking about making more C extensions PyPy-compatible, that's a different discussion, and one where I think Stefan is right that we should push people towards Cython and alternatives, rather than direct use of the C API (which people often fail to use correctly, in my experience). But the C API is still useful for specialized uses, *including* for development tools such as Cython.
I agree about the overall diagnosis. I just disagree that changing the C API will open up easy optimization opportunities. Actually I'd like to see a list of optimizations that you think are held up by the C API.
I have to confess that helping Larry is part of my overall plan.
Which is why I'd like to see Larry chime in here. Regards Antoine.

2018-07-31 8:58 GMT+02:00 Antoine Pitrou <solipsis@pitrou.net>:
I understood that PyPy succeeded in becoming at least 2x faster than CPython by no longer using reference counting internally.
My project has different goals. I would prefer not to make any promise about speed, so speed is not my first motivation, or at least not the only one :-) I also want to make the debug build usable. I also want to allow OS vendors to provide multiple Python versions per OS release: *reduce* the maintenance burden (obviously it will still mean more work). It's a tradeoff depending on the lifetime of your OS and the pressure from customers to get the newest Python :-) FYI Red Hat already provides recent development tools on top of RHEL (and CentOS and Fedora) because customers are asking for that. We don't work for free :-) I also want to see more alternative implementations of Python! I would like to see RustPython succeed! See the latest version of https://pythoncapi.readthedocs.io/ for the full rationale.
If we're talking about making more C extensions PyPy-compatible, that's a different discussion,
For practical reasons, IMHO it makes sense to put everything in the same "new C API" bag. Obviously, I propose to make many changes, and some of them will be more difficult to implement than others. My proposal contains many open questions and is made of multiple milestones, with a strong requirement on backward compatibility.
Don't get me wrong: my intent is not to replace Cython. Even if PyPy is pushing cffi hard, many C extensions still use the C API. Maybe if the C API becomes more annoying and requires developers to adapt their old code base to the "new C API", some of them will consider switching to Cython, cffi or something else :-D But backward compatibility is a big part of my plan, and in fact I expect that porting most C extensions to the new C API will be somewhere between "free" and "cheap". Obviously, it depends on how many changes we put in the "new C API" :-) I would like to work incrementally.
But the C API is still useful for specialized uses, *including* for development tools such as Cython.
It seems like http://pythoncapi.readthedocs.io/ didn't explain my intent well. I updated my doc to make it very clear that the "old C API" remains available *on purpose*. The main question is whether you will be able to use Cython with the "old C API" on a new "experimental runtime", or whether Cython will be stuck on the "regular runtime". https://pythoncapi.readthedocs.io/runtimes.html It's just that in the long term (the end of my roadmap), you will have to opt in to the old C API.
I agree about the overall diagnosis. I just disagree that changing the C API will open up easy optimization opportunities.
Ok, please help me rephrase the documentation so that it doesn't make any promise :-) Currently, I wrote: """ Optimization ideas. Once the new C API succeeds in hiding implementation details, it becomes possible to experiment with radical changes in CPython to implement new optimizations. See Experimental runtime. """ https://pythoncapi.readthedocs.io/optimization_ideas.html In my early plan, I wrote "faster runtime". I replaced it with "experimental runtime" :-) Do you think that it's wrong to promise that a smaller C API without implementation details will make it easier to *experiment* with optimizations?
Actually I'd like to see a list of optimizations that you think are held up by the C API.
Hum, let me use the "Tagged pointers" example. Most C functions use "PyObject*" as an opaque C type. Good. But technically, since we give access to the fields of C structures, like PyObject.ob_refcnt or PyListObject.ob_item, C extensions currently dereference pointers directly. I'm not convinced that tagged pointers will make CPython much faster. I'm just saying that the C API prevents you from even experimenting with such a change to measure its impact on performance. https://pythoncapi.readthedocs.io/optimization_ideas.html#tagged-pointers-do... For the "Copy-on-Write" idea, the issue is that many macros directly access fields of C structures, so at the machine code level the ABI reads data at fixed memory offsets, whereas my plan is to allow each runtime to use a different memory layout, like putting Py_GC elsewhere (or even removing it!!!) and/or putting ob_refcnt elsewhere. https://pythoncapi.readthedocs.io/optimization_ideas.html#copy-on-write-cow-...
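As a hedged illustration of the tagged-pointer point (my sketch; incref_macro_style and Py_IncRef_Opaque are invented names, not proposed API): today Py_INCREF is a macro that touches ob_refcnt at a fixed offset inside every extension binary, while an opaque function call would let a runtime recognize tagged values or move the field without recompiling extensions.

    #include <Python.h>
    #include <stdint.h>

    /* Roughly what the current Py_INCREF macro expands to: a direct field
     * access, so the offset of ob_refcnt is baked into every extension. */
    static inline void
    incref_macro_style(PyObject *op)
    {
        op->ob_refcnt++;
    }

    /* Hypothetical opaque alternative: the extension only sees a function
     * call, so the runtime is free to change the object layout or to use
     * tagged pointers without breaking the extension ABI. */
    void
    Py_IncRef_Opaque(PyObject *op)
    {
        if (((uintptr_t)op & 1) != 0) {
            return;            /* tagged immediate value: not a heap object */
        }
        op->ob_refcnt++;       /* layout known only inside the runtime */
    }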
I have to confess that helping Larry is part of my overall plan.
Which is why I'd like to see Larry chime in here.
I already talked a little bit with Larry about my plan, but he wasn't sure that my plan is enough to be able to stop reference counting internally and move to a different garbage collector. I'm only sure that it's possible to keep using reference counting for the C API, since there are solutions for that (ex: maintain a hash table mapping PyObject* to a reference count). Honestly, right now, I'm only convinced of two things:

* Larry's implementation is very complex, so I doubt that he is going to succeed. I'm talking about his solutions to maintain optimized reference counting in multithreaded applications, like his idea of "logs" of reference counters.

* We have to change the C API: it causes trouble for *everybody*. Nobody spoke up because changing the C API is a giant project and it breaks backward compatibility. But I'm not sure that all victims of the C API are aware that their issues are caused by the design of the current C API.

Victor
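A minimal sketch of the hash-table idea mentioned above (my own illustration, assuming a toy open-addressing table; a real runtime would need resizing, deletion and thread safety): a runtime without internal reference counting could still honor the Py_INCREF/Py_DECREF contract by keeping counts in a side table keyed by the object pointer. This is exactly the indirection whose cost Antoine questions below.

    #include <stddef.h>
    #include <stdint.h>

    /* Toy side table mapping object pointers to external reference counts;
     * fixed size, linear probing, no locking: only the extra indirection
     * paid by every increment/decrement matters for this sketch. */
    #define TABLE_SIZE 65536   /* power of two */

    typedef struct { void *obj; size_t refcnt; } Slot;
    static Slot table[TABLE_SIZE];

    static Slot *find_slot(void *obj)
    {
        size_t i = ((uintptr_t)obj >> 4) & (TABLE_SIZE - 1);
        while (table[i].obj != NULL && table[i].obj != obj) {
            i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing */
        }
        return &table[i];
    }

    void external_incref(void *obj)
    {
        Slot *s = find_slot(obj);
        s->obj = obj;
        s->refcnt++;
    }

    void external_decref(void *obj)
    {
        Slot *s = find_slot(obj);
        if (s->obj == obj && --s->refcnt == 0) {
            /* Count dropped to zero: the C API no longer holds this object;
             * hand it back to the runtime's GC (details omitted). */
            s->obj = NULL;
        }
    }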

On Tue, 31 Jul 2018 12:51:23 +0200 Victor Stinner <vstinner@redhat.com> wrote:
"I understood that"... where did you get it from? :-)
I also want to make the debug build usable.
So I think that we should ask what the ABI differences between debug and non-debug builds are. AFAIK, the two main ones are Py_TRACE_REFS and Py_REF_DEBUG. Are there any others? Honestly, I don't think Py_TRACE_REFS is useful. I don't remember any bug being discovered thanks to it. Py_REF_DEBUG is much more useful. The main ABI issue with Py_REF_DEBUG is not object structure (it doesn't change object structure), it's when a non-debug extension steals a reference (or calls a reference-stealing C API function), because then increments and decrements are unbalanced.
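To make the accounting problem concrete, here is a simplified, from-memory view (not the literal CPython headers) of how the increment/decrement macros differ between a Py_REF_DEBUG build and a regular build; the names INCREF/DECREF are used here to avoid clashing with the real macros:

    #include <Python.h>

    #ifdef Py_REF_DEBUG
    /* Debug build: every increment/decrement also updates the global
     * _Py_RefTotal counter. */
    #  define INCREF(op) (_Py_RefTotal++, ((PyObject *)(op))->ob_refcnt++)
    #  define DECREF(op) (_Py_RefTotal--, \
          (--((PyObject *)(op))->ob_refcnt == 0 ? _Py_Dealloc((PyObject *)(op)) : (void)0))
    #else
    /* Regular build: only ob_refcnt is touched; _Py_RefTotal never changes. */
    #  define INCREF(op) (((PyObject *)(op))->ob_refcnt++)
    #  define DECREF(op) \
          (--((PyObject *)(op))->ob_refcnt == 0 ? _Py_Dealloc((PyObject *)(op)) : (void)0)
    #endif

    /* A non-debug extension that increments a reference and then hands it to
     * a debug-built interpreter (for example through a reference-stealing
     * call such as PyList_SetItem) adds nothing to _Py_RefTotal on the way
     * in, while the interpreter subtracts one on the way out: the total drifts. */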
OS vendors seem to be doing a fine job AFAICT. And if I want a recent Python I just download Miniconda/Anaconda.
I also want to see more alternative implementations of Python! I would like to see RustPython succeed!
As long as RustPython gets 10 commits a year, it has no chance of being a functional Python implementation, let alone a successful one. AFAICS it's just a toy project.
cffi is a ctypes replacement. It's nice when you want to bind with foreign C code, not if you want tight interaction with CPython objects.
I think you don't realize that the C API is *already* annoying. People started with it mostly because there wasn't a better alternative at the time. You don't need to make it more annoying than it already is ;-) Replacing existing C extensions with something else is entirely a developer time/effort problem, not an attractiveness problem. And I'm not sure that porting a C extension to a new C API is more reasonable than porting it to Cython entirely.
I don't think it's wrong. Though as long as CPython itself uses the internal C API, you'll still have a *lot* of code to change before you can even launch a functional interpreter and standard library... It's just that I disagree that removing the C API will make CPython 2x faster. Actually, important modern optimizations for dynamic languages (such as inlining, type specialization, inline caches, object unboxing) don't seem to depend on the C API at all.
Theoretically possible, but the cost of reference counting will go through the roof if you start using a hash table.
Well, you know, *any* solution is going to be very complex. Switching to a full GC for a runtime (CPython) which can allocate hundreds of thousands of objects per second will require a lot of optimization work as well.
I fully agree that the C API is not very nice to play with. The diversity of calling / error return conventions is one annoyance. Borrowed references and reference stealing is another. Getting reference counting right on all code paths is often delicate. So I'm all for sanitizing the C API, and slowly deprecating old patterns. And I think we should push people towards Cython for most current uses of the C API. Regards Antoine.

Antoine: would you mind subscribing to the capi-sig mailing list? As expected, there are many interesting points discussed here, but I would like to move all C API discussions to capi-sig. I am only continuing on python-dev since you started here (and ignored my request to start discussing my idea on capi-sig :-)). 2018-07-31 13:55 GMT+02:00 Antoine Pitrou <solipsis@pitrou.net>:
I'm quite sure that PyPy developers told me that, but I don't recall who or when. I don't think that PyPy became 5x faster just because of a single change. But I understand that to be able to implement some optimizations, you first have to remove the constraints caused by a design choice like reference counting. For example, PyPy uses different memory allocators depending on the scope and lifetime of an object. I'm not sure that you can implement such an optimization if you are stuck with reference counting.
So I think that we should ask what the ABI differences between debug and non-debug builds are.
Debug build is one use case. Another use case for OS vendors is to compile a C extension once (ex: on Python 3.6) and use it on multiple Python versions (3.7, 3.8, etc.).
AFAIK, the two main ones are Py_TRACE_REFS and Py_REF_DEBUG. Are there any others?
No idea.
About Py_REF_DEBUG: the _Py_RefTotal counter is updated at each INCREF/DECREF. _Py_RefTotal is a popular feature of debug builds, and I'm not sure how we could keep updating it without replacing the Py_INCREF/Py_DECREF macros with function calls. I'm ok with removing/deprecating the Py_TRACE_REFS feature if nobody uses it.
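A tiny sketch of the alternative hinted at here (my assumption of the mechanism; opaque_incref, opaque_decref and total_refs are invented names): if the increment/decrement were exported functions instead of macros, the bookkeeping would be compiled into the interpreter itself, so it could not get out of sync with how an extension was built.

    #include <Python.h>

    /* Invented names for this sketch; total_refs stands in for _Py_RefTotal. */
    static Py_ssize_t total_refs = 0;

    void
    opaque_incref(PyObject *op)
    {
        total_refs++;          /* always accounted, whatever the caller's build flags */
        op->ob_refcnt++;
    }

    void
    opaque_decref(PyObject *op)
    {
        total_refs--;
        if (--op->ob_refcnt == 0) {
            _Py_Dealloc(op);   /* CPython helper that invokes tp_dealloc */
        }
    }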
OS vendors seem to be doing a fine job AFAICT. And if I want a recent Python I just download Miniconda/Anaconda.
Is it used in production to deploy services? Or is it more used by developers? I never used Anaconda.
cffi is a ctypes replacement. It's nice when you want to bind with foreign C code, not if you want tight interaction with CPython objects.
I have been told that cffi is a different way to do the same thing: instead of writing C code with C API glue, you write only plain C code and then write a cffi binding for it. But I have never used Cython or cffi, so I'm not sure which one is the most appropriate for which use case.
Do you think that it's doable to port numpy to Cython? It's made of 255K lines of C code. A major "rewrite" of such a large code base is very difficult since people want to push new things in parallel. Or is it maybe possible to do it incrementally?
It's just that I disagree that removing the C API will make CPython 2x faster.
How can we make CPython 2x faster? Why has everybody except PyPy failed to do that? Victor

On Tue, 31 Jul 2018 15:34:05 +0200 Victor Stinner <vstinner@redhat.com> wrote:
Well, I responded to your e-mail discussion thread. I see more messages in this thread here than on capi-sig. ;-)
But what does reference counting have to do with memory allocators exactly?
I don't know, but there's no hard reason why you couldn't use it to deploy services (though some people may prefer Docker or other technologies).
Numpy is a bit special as it exposes its own C API, so porting it entirely to Cython would be difficult (how do you expose a C macro in Cython?). Also, internally it has a lot of macro-generated code for specialized loop implementations (metaprogramming in C :-)). I suppose some bits could be (re)written in Cython. Actually, the numpy.random module is already a Cython module.
Because PyPy spent years working full time on a JIT compiler. It's also written in (a dialect of) Python, which helps a lot with experimenting and building abstractions, compared to C or even C++. Regards Antoine.

Hi, On 31 July 2018 at 13:55, Antoine Pitrou <solipsis@pitrou.net> wrote:
These are optimizations typically talked about in papers about dynamic languages in general. In my opinion, in the specific case of CPython, they are all secondary to the following: (1) JIT, (2) GC, (3) object model, (4) multithreading. Currently, the C API only allows Psyco-style JITting (much slower than PyPy). All three other points might not be possible at all without a seriously modified C API. Why? I have no proof, only circumstantial evidence. Each of (2), (3), (4) has been done in at least one other implementation: PyPy, Jython and IronPython. Each of these implementations has also had its share of troubles emulating the CPython C API. You can continue to think that the C API has got nothing to do with it. I tend to think the opposite. The continued absence of major performance improvements for either CPython itself or for any alternative Python implementation that *does* support the C API natively is probably proof enough; I think that enough time has passed, by now, to make this argument. A bientôt, Armin.

Hi Armin, On Fri, 10 Aug 2018 19:15:11 +0200 Armin Rigo <armin.rigo@gmail.com> wrote:
Jython and IronPython never got significant manpower AFAIK, so even without being hindered by the C API, chances are they would never have gotten very far. Both do not even seem to have stable releases implementing the Python 3 language... That leaves us with CPython and PyPy, which are only two data points. And there are enough differences, AFAIK, between those two that picking up "supports the C API natively" as the primary factor leading to a performance difference sounds arbitrary. (the major difference being IMHO that PyPy is written in RPython, which opens up possibilities that are not realistic with a C implementation, such as the JIT being automatically able to inspect implementations of core / stdlib primitives; in a CPython-based JIT such as Numba, you have to reimplement all those primitives in a form that's friendly to the JIT compiler) Regards Antoine.

Antoine Pitrou schrieb am 11.08.2018 um 15:19:
IMHO, while it's not clear to what extent the C-API hinders performance improvements or jittability of code in CPython, I think it's fair to assume that it's easier to improve internals when they are internal and not part of a public API. Whether it's worth the effort to design a new C-API, or at least make major changes to it, I cannot say, lacking an actual comparable implementation of such a design that specifically targets better performance. As it stands, extensions can actually make good use of the fact that the C-API treats them (mostly, see e.g. PEPs 575/580) as first class citizens in the CPython ecosystem. So, the status quo is at least a tradeoff. Stefan

Hi Antoine, On 11 August 2018 at 15:19, Antoine Pitrou <solipsis@pitrou.net> wrote:
I included IronPython and Jython because they are also rather complete implementations of (some version of) Python that are actively used in some contexts. During the past 20 years, these two and PyPy are the only generally-useful rather-complete alternate Python implementations, and they each improve on some of the pain points (2) (3) (4) hitting CPython. Neither of them supports the C API efficiently. Whatever you argue, my opinion is that they got where they are by first completely ignoring the C API. Even Pyston did that. About its C API, CPython can continue to prefer the status quo. I tend to think that it's exactly what will occur, so I'm staying away from capi-sig. A bientôt, Armin.

Participants (5): Antoine Pitrou, Armin Rigo, Brett Cannon, Stefan Behnel, Victor Stinner