On Tue, 31 Jul 2018 12:51:23 +0200 Victor Stinner firstname.lastname@example.org wrote:
> 2018-07-31 8:58 GMT+02:00 Antoine Pitrou email@example.com:
>> What exactly in the C API made it slow or non-promising?
> The C API requires that your implementations make almost all the same design choices that CPython made 25 years ago (C structures, memory allocators, reference counting, specific GC implementation, GIL, etc.).
Yes, but those choices are not necessarily bad.
> I understood that PyPy succeeded in becoming at least 2x faster than CPython by no longer using reference counting internally.
"I understood that"... where did you get it from? :-)
> I also want to make the debug build usable.
So I think that we should ask what the ABI differences between debug and non-debug builds are.
AFAIK, the two main ones are Py_TRACE_REFS and Py_REF_DEBUG. Are there any others?
Honestly, I don't think Py_TRACE_REFS is useful. I don't remember any bug being discovered thanks to it. Py_REF_DEBUG is much more useful. The main ABI issue with Py_REF_DEBUG is not object structure (it doesn't change object structure), it's when a non-debug extension steals a reference (or calls a reference-stealing C API function), because then increments and decrements are unbalanced.
> I also want to allow OS vendors to provide multiple Python versions per OS release: *reduce* the maintenance burden; obviously it will still mean more work. It's a tradeoff depending on the lifetime of your OS and the pressure from customers to get the newest Python :-) FYI Red Hat already provides recent development tools on top of RHEL (and CentOS and Fedora) because customers are asking for that. We don't work for free :-)
OS vendors seem to be doing a fine job AFAICT. And if I want a recent Python I just download Miniconda/Anaconda.
> I also want to see more alternative implementations of Python! I would like to see RustPython succeed!
As long as RustPython gets 10 commits a year, it has no chance of being a functional Python implementation, let alone a successful one. AFAICS it's just a toy project.
>> and one where I think Stefan is right that we should push people towards Cython and alternatives, rather than direct use of the C API (which people often fail to use correctly, in my experience).
> Don't get me wrong: my intent is not to replace Cython. Even if PyPy is pushing cffi hard, many C extensions still use the C API.
cffi is a ctypes replacement. It's nice when you want to bind with foreign C code, not if you want tight interaction with CPython objects.
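For comparison, here is the kind of job ctypes (and hence cffi) is good at: binding a foreign C function by describing its signature. A minimal sketch, assuming a Unix-like system where the C math library can be located:

```python
import ctypes
import ctypes.util

# Locate the C math library; the name is platform-dependent (assumption:
# a Unix-like system). If find_library fails, CDLL(None) searches the
# symbols already loaded into the main process instead.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path)

# Describe the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

assert libm.cos(0.0) == 1.0
```

Tight interaction with CPython objects (building lists, raising exceptions, defining types) is exactly what this style of foreign binding does not give you.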
> Maybe if the C API becomes more annoying and requires developers to adapt their old code bases to the "new C API", some of them will reconsider and use Cython, cffi or something else :-D
I think you don't realize that the C API is *already* annoying. People started with it mostly because there wasn't a better alternative at the time. You don't need to make it more annoying than it already is ;-)
Replacing existing C extensions with something else is entirely a developer time/effort problem, not an attractiveness problem. And I'm not sure that porting a C extension to a new C API is more reasonable than porting it to Cython entirely.
> Do you think that it's wrong to promise that a smaller C API without implementation details will make it easier to *experiment* with optimizations?
I don't think it's wrong. Though as long as CPython itself uses the internal C API, you'll still have a *lot* of code to change before you can even launch a functional interpreter and standard library...
It's just that I disagree that removing the C API will make CPython 2x faster.
Actually, important modern optimizations for dynamic languages (such as inlining, type specialization, inline caches, object unboxing) don't seem to depend on the C API at all.
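To illustrate one of those techniques: an inline cache memoizes a method lookup per receiver class, so repeated calls with the same class skip the full lookup. A toy Python sketch (the class and counter names here are invented purely for illustration; a real implementation lives inside the interpreter core, independent of the C API):

```python
class InlineCache:
    """Caches the result of a method lookup, keyed by the receiver's class."""

    def __init__(self, name):
        self.name = name
        self.cached_class = None
        self.cached_attr = None
        self.hits = 0
        self.misses = 0

    def lookup(self, obj):
        cls = type(obj)
        if cls is self.cached_class:      # monomorphic fast path
            self.hits += 1
            return self.cached_attr
        self.misses += 1                  # slow path: full lookup, then cache
        attr = getattr(cls, self.name)
        self.cached_class = cls
        self.cached_attr = attr
        return attr


class Point:
    def __init__(self, x):
        self.x = x

    def double(self):
        return 2 * self.x


cache = InlineCache("double")
points = [Point(i) for i in range(5)]
results = [cache.lookup(p)(p) for p in points]
assert results == [0, 2, 4, 6, 8]
assert cache.misses == 1 and cache.hits == 4   # one full lookup, four cache hits
```

None of this touches object headers, reference counts, or any other ABI detail an extension could see.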
> I have to confess that helping Larry is part of my overall plan.
Which is why I'd like to see Larry chime in here.
> I already talked a little bit with Larry about my plan, but he wasn't sure that my plan is enough to be able to stop reference counting internally and move to a different garbage collector. I'm only sure that it's possible to keep using reference counting for the C API, since there are solutions for that (ex: maintain a hash table PyObject* => reference count).
Theoretically possible, but the cost of reference counting will go through the roof if you start using a hash table.
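A minimal model of the side-table idea, written in Python purely for illustration (the real proposal concerns C-level PyObject* pointers): every incref and decref becomes a hash-table operation instead of an integer bump in the object header, which is exactly where the extra cost comes from.

```python
# Side table mapping object identity -> externally held reference count.
# In the real proposal the key would be a PyObject* address.
side_refcounts = {}

def incref(obj):
    # One hash lookup + insert per increment (vs. one integer add today).
    side_refcounts[id(obj)] = side_refcounts.get(id(obj), 0) + 1

def decref(obj):
    n = side_refcounts[id(obj)] - 1
    if n:
        side_refcounts[id(obj)] = n
    else:
        del side_refcounts[id(obj)]   # last external reference dropped

o = object()
incref(o)
incref(o)
decref(o)
assert side_refcounts[id(o)] == 1
decref(o)
assert id(o) not in side_refcounts
```

Each Py_INCREF/Py_DECREF pair turns into two hash-table operations, so hot paths that today cost a couple of machine instructions would pay for hashing, probing and possible resizing instead.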
> Honestly, right now, I'm only convinced of two things:
> - Larry's implementation is very complex, so I doubt that he is going to succeed. I'm talking about solutions to keep reference counting efficient in multithreaded applications, like his idea of "logs" of reference counters.
Well, you know, *any* solution is going to be very complex. Switching to a full GC for a runtime (CPython) which can allocate hundreds of thousands of objects per second will require a lot of optimization work as well.
> - We have to change the C API: it causes trouble for *everybody*. Nobody spoke up because changing the C API is a giant project and it breaks backward compatibility. But I'm not sure that all victims of the C API are aware that their issues are caused by the design of the current C API.
I fully agree that the C API is not very nice to play with. The diversity of calling / error return conventions is one annoyance. Borrowed references and reference stealing is another. Getting reference counting right on all code paths is often delicate.
So I'm all for sanitizing the C API, and slowly deprecating old patterns. And I think we should push people towards Cython for most current uses of the C API.