Re: [Python-Dev] Let's change to C API!

31 Jul 2018

On Tue, 31 Jul 2018 12:51:23 +0200
Victor Stinner wrote:
...
2018-07-31 8:58 GMT+02:00 Antoine Pitrou:
...
What exactly in the C API made it slow or non-promising?
...
The C API requires that your implementations make almost all the same
design choices that CPython made 25 years ago (C structures, memory
allocators, reference counting, specific GC implementation, GIL, etc.).
Yes, but those choices are not necessarily bad.
I understood that PyPy succeeded in becoming at least 2x faster than
CPython by no longer using reference counting internally.
"I understood that"... where did you get it from? :-)
...
I also want to make the debug build usable.
So I think that we should ask what the ABI differences between debug
and non-debug builds are.

AFAIK, the two main ones are Py_TRACE_REFS and Py_REF_DEBUG.  Are there
any others?

Honestly, I don't think Py_TRACE_REFS is useful.  I don't remember
any bug being discovered thanks to it.  Py_REF_DEBUG is much more
useful.  The main ABI issue with Py_REF_DEBUG is not object structure
(it doesn't change object structure), it's when a non-debug extension
steals a reference (or calls a reference-stealing C API function),
because then increments and decrements are unbalanced.
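
A toy Python model of that imbalance, purely as a sketch and not CPython code. It assumes the debug build keeps a global total of all reference counts (here the hypothetical attribute ref_total) that only its own incref/decref update, while code compiled without the debug flag touches only the per-object count. All class and function names are made up for illustration.

```python
class Obj:
    def __init__(self):
        self.refcnt = 1


class Runtime:
    """Stands in for a debug-build interpreter core."""

    def __init__(self):
        self.ref_total = 0  # global sum of reference counts (debug only)

    def new_object(self):
        obj = Obj()
        self.ref_total += 1  # the new object's first reference is counted
        return obj

    def debug_incref(self, obj):
        obj.refcnt += 1
        self.ref_total += 1

    def debug_decref(self, obj):
        obj.refcnt -= 1
        self.ref_total -= 1


def nondebug_decref(obj):
    # Compiled without the debug flag: only the per-object count changes.
    obj.refcnt -= 1


def run(stolen_by_nondebug_extension):
    rt = Runtime()
    obj = rt.new_object()  # created (and counted) by the debug core
    if stolen_by_nondebug_extension:
        # The extension took ownership of the reference and releases it
        # with its own decref, which the global total never sees.
        nondebug_decref(obj)
    else:
        rt.debug_decref(obj)
    return obj.refcnt, rt.ref_total
```

In this model run(False) leaves both the object's count and the global total at zero, while run(True) leaves the object's count at zero but the global total at one: the increment was recorded by the debug core and the matching decrement was not.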
...
I also want to allow OS vendors to provide multiple Python versions
per OS release: *reduce* the maintenance burden, obviously it will
still mean more work. It's a tradeoff depending on the lifetime of
your OS and the pressure of customers to get the newest Python :-) FYI
Red Hat already provides recent development tools on top of RHEL (and
CentOS and Fedora) because customers are asking for that. We don't
work for free :-)
OS vendors seem to be doing a fine job AFAICT.  And if I want a recent
Python I just download Miniconda/Anaconda.
...
I also want to see more alternative implementations of Python! I
would like to see RustPython succeed!
As long as RustPython gets 10 commits a year, it has no chance of being
a functional Python implementation, let alone a successful one.  AFAICS
it's just a toy project.
...
and one where I think Stefan is right that we
should push people towards Cython and alternatives, rather than direct
use of the C API (which people often fail to use correctly, in my
experience).
Don't get me wrong: my intent is not to replace Cython. Even if PyPy
is pushing cffi hard, many C extensions still use the C API.
cffi is a ctypes replacement.  It's nice when you want to bind with
foreign C code, not if you want tight interaction with CPython objects.
...
Maybe if the C API becomes more annoying and requires developers to
adapt their old code base for the "new C API", some of them will
reconsider and use Cython, cffi or something else :-D
I think you don't realize that the C API is *already* annoying.  People
started with it mostly because there wasn't a better alternative at the
time.  You don't need to make it more annoying than it already is ;-)

Replacing existing C extensions with something else is entirely a
developer time/effort problem, not an attractiveness problem.  And I'm
not sure that porting a C extension to a new C API is more reasonable
than porting to Cython entirely.
...
Do you think that it's wrong to promise that a smaller C API without
implementation details will make it easier to *experiment* with
optimizations?
I don't think it's wrong.  Though as long as CPython itself uses the
internal C API, you'll still have a *lot* of code to change before you
can even launch a functional interpreter and standard library...

It's just that I disagree that removing the C API will make CPython 2x
faster.

Actually, important modern optimizations for dynamic languages (such as
inlining, type specialization, inline caches, object unboxing) don't
seem to depend on the C API at all.
...
I have to confess that helping Larry is part of my overall plan.
Which is why I'd like to see Larry chime in here.
I already talked a little bit with Larry about my plan, but he wasn't
sure that my plan is enough to be able to stop reference counting
internally and move to a different garbage collector.  I'm only sure
that it's possible to keep using reference counting for the C API,
since there are solutions for that (ex: maintain a hash table
PyObject* => reference count).
Theoretically possible, but the cost of reference counting will go
through the roof if you start using a hash table.
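
To make the cost argument concrete, here is a minimal Python sketch of such a side table. It is only an illustration of the idea, not a proposal or anyone's actual design: the class name SideTableRefcounts and its methods are hypothetical, and id(obj) stands in for the PyObject* key.

```python
class SideTableRefcounts:
    """Reference counts kept outside the objects, keyed by identity."""

    def __init__(self):
        self._counts = {}   # id(obj) -> count, i.e. PyObject* => refcount
        self._objects = {}  # keeps each object alive while its count > 0,
                            # so its id() cannot be reused in the meantime

    def incref(self, obj):
        key = id(obj)
        self._counts[key] = self._counts.get(key, 0) + 1
        self._objects[key] = obj

    def decref(self, obj):
        key = id(obj)
        count = self._counts[key] - 1
        if count == 0:
            # Last reference gone: forget the object entirely.
            del self._counts[key]
            del self._objects[key]
        else:
            self._counts[key] = count

    def refcount(self, obj):
        return self._counts.get(id(obj), 0)
```

Every incref and decref here is a hash lookup plus a table update, where the current scheme is a single increment or decrement of a field stored in the object itself.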
...
Honestly, right now, I'm only convinced of two things:
* Larry's implementation is very complex and so I doubt that he is going
to succeed. I'm talking about solutions to maintain or optimize reference
counting in multithreaded applications. Like his idea of "logs" of
reference counters.
Well, you know, *any* solution is going to be very complex.  Switching
to a full GC for a runtime (CPython) which can allocate hundreds of
thousands of objects per second will require a lot of optimization work
as well.
...
* We have to change the C API: it causes trouble for *everybody*.
Nobody spoke up because changing the C API is a giant project and it
breaks backward compatibility. But I'm not sure that all victims
of the C API are aware that their issues are caused by the design of
the current C API.
I fully agree that the C API is not very nice to play with.  The
diversity of calling / error return conventions is one annoyance.
Borrowed references and reference stealing are another.  Getting
reference counting right on all code paths is often delicate.
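
As an illustration of why borrowed references are delicate, here is a toy Python model (not CPython code; Obj, ToyList and the helper functions are hypothetical names for this sketch). A container hands out an item without incrementing its count, and the item is then replaced in the container while the caller still holds it.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refcnt = 1
        self.freed = False


def incref(obj):
    obj.refcnt += 1


def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True  # in C, the memory would be released here


class ToyList:
    """Owns exactly one reference to each item it stores."""

    def __init__(self, item):
        self.items = [item]  # takes over the caller's reference

    def get_item_borrowed(self, index):
        return self.items[index]  # no incref: the caller owns nothing

    def set_item(self, index, new_item):
        old = self.items[index]
        self.items[index] = new_item
        decref(old)  # the list drops its reference to the old item


def borrowed_then_replaced(protect):
    lst = ToyList(Obj("a"))
    item = lst.get_item_borrowed(0)
    if protect:
        incref(item)  # turn the borrowed reference into an owned one
    lst.set_item(0, Obj("b"))
    dangling = item.freed  # True means "item" now points at freed memory
    if protect:
        decref(item)
    return dangling
```

Without the extra incref, borrowed_then_replaced(False) reports that the caller is left holding a freed object; with it, borrowed_then_replaced(True) does not. Forgetting that one incref/decref pair on a single code path is exactly the kind of mistake that is easy to make.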

So I'm all for sanitizing the C API, and slowly deprecating old
patterns.  And I think we should push people towards Cython for most
current uses of the C API.

Regards

Antoine.