[Python-Dev] Let's change to C API!

Antoine Pitrou solipsis at pitrou.net
Tue Jul 31 07:55:45 EDT 2018


On Tue, 31 Jul 2018 12:51:23 +0200
Victor Stinner <vstinner at redhat.com> wrote:
> 2018-07-31 8:58 GMT+02:00 Antoine Pitrou <solipsis at pitrou.net>:
> > What exactly in the C API made it slow or non-promising?
> >  
> >> The C API requires that your implementations make almost all the same
> >> design choices that CPython made 25 years ago (C structures, memory
> >> allocators, reference counting, specific GC implementation, GIL, etc.).
> >
> > Yes, but those choices are not necessarily bad.  
> 
> I understood that PyPy succeeded in becoming at least 2x faster than
> CPython by no longer using reference counting internally.

"I understood that"... where did you get it from? :-)

> I also want to make the debug build usable.

So I think that we should ask what the ABI differences between debug
and non-debug builds are.

AFAIK, the two main ones are Py_TRACE_REFS and Py_REF_DEBUG.  Are there
any others?

Honestly, I don't think Py_TRACE_REFS is useful.  I don't remember
any bug being discovered thanks to it.  Py_REF_DEBUG is much more
useful.  The main ABI issue with Py_REF_DEBUG is not object structure
(it doesn't change object structure); it's when a non-debug extension
steals a reference (or calls a reference-stealing C API function),
because then increments and decrements are unbalanced.
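
For readers who haven't run into it: a reference-stealing call takes
ownership of the reference you pass in, so the caller must not release it
afterwards.  A minimal sketch of the difference, assuming an embedded
interpreter built against Python.h (just an illustration, not CPython code):

    #include <Python.h>

    /* PyList_SetItem() steals a reference to the item it is given;
       PyList_Append() does not (it takes its own reference). */
    int main(void)
    {
        Py_Initialize();

        PyObject *list = PyList_New(1);       /* new reference */
        PyObject *one  = PyLong_FromLong(1);  /* new reference */

        /* Stealing call: the list now owns 'one'; do NOT Py_DECREF(one). */
        PyList_SetItem(list, 0, one);

        PyObject *two = PyLong_FromLong(2);   /* new reference */
        PyList_Append(list, two);             /* non-stealing call */
        Py_DECREF(two);                       /* we still own 'two' here */

        Py_DECREF(list);
        Py_FinalizeEx();
        return 0;
    }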

> I also want to allow OS vendors to provide multiple Python versions
> per OS release: to *reduce* the maintenance burden; obviously it will
> still mean more work. It's a tradeoff depending on the lifetime of
> your OS and the pressure from customers to get the newest Python :-) FYI
> Red Hat already provides recent development tools on top of RHEL (and
> CentOS and Fedora) because customers are asking for that. We don't
> work for free :-)

OS vendors seem to be doing a fine job AFAICT.  And if I want a recent
Python I just download Miniconda/Anaconda.

> I also want to see more alternative implementations of Python! I
> would like to see RustPython succeed!

As long as RustPython gets 10 commits a year, it has no chance of being
a functional Python implementation, let alone a successful one.  AFAICS
it's just a toy project.

> > and one where I think Stefan is right that we
> > should push people towards Cython and alternatives, rather than direct
> > use of the C API (which people often fail to use correctly, in my
> > experience).  
> 
> Don't get me wrong: my intent is not to replace Cython. Even though PyPy
> is pushing cffi hard, many C extensions still use the C API.

cffi is a ctypes replacement.  It's nice when you want to bind with
foreign C code, not if you want tight interaction with CPython objects.

> Maybe if the C API becomes more annoying and requires developers to
> adapt their old code bases for the "new C API", some of them will
> consider switching to Cython, cffi or something else :-D

I think you don't realize that the C API is *already* annoying.  People
started with it mostly because there wasn't a better alternative at the
time.  You don't need to make it more annoying than it already is ;-)

Replacing existing C extensions with something else is entirely a
developer time/effort problem, not an attractiveness problem.  And I'm
not sure that porting a C extension to a new C API is more reasonable
than porting to Cython entirely.

> Do you think that it's wrong to promise that a smaller C API without
> implementation details will make it easier to *experiment* with
> optimizations?

I don't think it's wrong.  Though as long as CPython itself uses the
internal C API, you'll still have a *lot* of code to change before you
can even launch a functional interpreter and standard library...

It's just that I disagree that removing the C API will make CPython 2x
faster.

Actually, important modern optimizations for dynamic languages (such as
inlining, type specialization, inline caches, object unboxing) don't
seem to depend on the C API at all.
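
As a rough idea of what one of those techniques (an inline cache) looks
like, here is a toy sketch in plain C; the names and structures are made up
for illustration and have nothing to do with CPython internals:

    #include <stdio.h>

    /* Toy "dynamic object" with a type tag and a per-type method. */
    typedef struct { int type_id; } Obj;
    typedef double (*method_fn)(Obj *);

    static double int_area(Obj *o)   { (void)o; return 1.0; }
    static double float_area(Obj *o) { (void)o; return 2.0; }

    /* Pretend this is an expensive dynamic lookup (a dict lookup, say). */
    static method_fn slow_lookup(int type_id)
    {
        return type_id == 0 ? int_area : float_area;
    }

    /* Inline cache attached to one call site: remember the last type seen
       and the method resolved for it. */
    typedef struct { int cached_type; method_fn cached_fn; } CallSiteCache;

    static double call_area(CallSiteCache *ic, Obj *o)
    {
        if (ic->cached_fn == NULL || ic->cached_type != o->type_id) {
            ic->cached_type = o->type_id;      /* cache miss: slow lookup */
            ic->cached_fn = slow_lookup(o->type_id);
        }
        return ic->cached_fn(o);               /* cache hit on later calls */
    }

    int main(void)
    {
        CallSiteCache ic = {0, NULL};
        Obj a = {0}, b = {1};
        printf("%f %f %f\n",
               call_area(&ic, &a), call_area(&ic, &a), call_area(&ic, &b));
        return 0;
    }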

> >> I have to confess that helping Larry is part of my overall plan.  
> >
> > Which is why I'd like to see Larry chime in here.  
> 
> I already talked a little bit with Larry about my plan, but he wasn't
> sure that my plan is enough to be able to stop reference counting
> internally and move to a different garbage collector.  I'm only sure
> that it's possible to keep using reference counting for the C API,
> since there are solutions for that (e.g. maintain a hash table
> PyObject* => reference count).

Theoretically possible, but the cost of reference counting will go
through the roof if you start using a hash table.
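
To make the cost concrete: today an incref is a single field update on the
object itself, whereas with a side table every incref/decref becomes a hash
plus a probe.  A toy sketch of such a table (made-up names, fixed size, no
resizing or deallocation):

    #include <stdint.h>
    #include <stddef.h>

    #define TABLE_SIZE 4096                  /* toy: fixed size, no resizing */

    typedef struct { void *obj; long refcnt; } Slot;
    static Slot ref_table[TABLE_SIZE];

    static size_t hash_ptr(void *p)
    {
        return ((uintptr_t)p >> 4) % TABLE_SIZE;
    }

    /* Linear probing: every "incref" now costs a hash plus a probe loop,
       instead of a single ob_refcnt++ on the object. */
    static void table_incref(void *obj)
    {
        size_t i = hash_ptr(obj);
        while (ref_table[i].obj != NULL && ref_table[i].obj != obj)
            i = (i + 1) % TABLE_SIZE;
        if (ref_table[i].obj == NULL) {
            ref_table[i].obj = obj;
            ref_table[i].refcnt = 0;
        }
        ref_table[i].refcnt++;
    }

    static void table_decref(void *obj)
    {
        size_t i = hash_ptr(obj);
        while (ref_table[i].obj != obj)
            i = (i + 1) % TABLE_SIZE;
        ref_table[i].refcnt--;               /* toy: no deallocation logic */
    }

    int main(void)
    {
        int dummy;
        table_incref(&dummy);
        table_incref(&dummy);
        table_decref(&dummy);
        return 0;
    }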

> Honestly, right now, I'm only convinced of two things:
> 
> * Larry's implementation is very complex, so I doubt that he is going
> to succeed. I'm talking about solutions to maintain optimized reference
> counting in multithreaded applications, like his idea of "logs" of
> reference counters.

Well, you know, *any* solution is going to be very complex.  Switching
to a full GC for a runtime (CPython) which can allocate hundreds of
thousands of objects per second will require a lot of optimization work
as well.

> * We have to change the C API: it causes trouble for *everybody*.
> Nobody spoke up because changing the C API is a giant project and it
> breaks backward compatibility. But I'm not sure that all victims
> of the C API are aware that their issues are caused by the design of
> the current C API.

I fully agree that the C API is not very nice to play with.  The
diversity of calling / error return conventions is one annoyance.
Borrowed references and reference stealing are another.  Getting
reference counting right on all code paths is often delicate.
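
A small sketch of what that delicacy looks like in practice (a hypothetical
helper, not real CPython code), mixing a borrowed reference, several new
references, and a stealing macro, with cleanup on every error path:

    #include <Python.h>

    /* Hypothetical helper: build the list [int(d[key]), int(d[key]) + 1].
       Conventions mixed here:
         - PyDict_GetItemWithError() returns a *borrowed* reference
         - PyNumber_Long(), PyNumber_Add(), PyList_New() return *new* references
         - PyList_SET_ITEM() *steals* the reference it is given */
    static PyObject *
    pair_from_dict(PyObject *d, PyObject *key)
    {
        PyObject *item = PyDict_GetItemWithError(d, key);  /* borrowed */
        if (item == NULL) {
            if (!PyErr_Occurred())
                PyErr_SetObject(PyExc_KeyError, key);
            return NULL;                       /* we own nothing yet */
        }

        PyObject *first = PyNumber_Long(item);             /* new reference */
        if (first == NULL)
            return NULL;

        PyObject *one = PyLong_FromLong(1);                 /* new reference */
        if (one == NULL) {
            Py_DECREF(first);
            return NULL;
        }
        PyObject *second = PyNumber_Add(first, one);        /* new reference */
        Py_DECREF(one);
        if (second == NULL) {
            Py_DECREF(first);
            return NULL;
        }

        PyObject *list = PyList_New(2);                      /* new reference */
        if (list == NULL) {
            Py_DECREF(first);
            Py_DECREF(second);
            return NULL;
        }
        PyList_SET_ITEM(list, 0, first);    /* steals 'first' */
        PyList_SET_ITEM(list, 1, second);   /* steals 'second' */
        return list;
    }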

So I'm all for sanitizing the C API, and slowly deprecating old
patterns.  And I think we should push people towards Cython for most
current uses of the C API.

Regards

Antoine.

