[Python-Dev] Let's change to C API!

Tue Jul 31 06:51:23 EDT 2018

2018-07-31 8:58 GMT+02:00 Antoine Pitrou <solipsis at pitrou.net>:
> What exactly in the C API made it slow or non-promising?
>
>> The C API requires that your implementations make almost all the same
>> design choices that CPython made 25 years ago (C structures, memory
>> allocators, reference counting, specific GC implementation, GIL, etc.).
>
> Yes, but those choices are not necessarily bad.

My understanding is that PyPy managed to become at least 2x faster than
CPython by no longer using reference counting internally.

>> Multiple PyPy developers told me that cpyext remains a blocker issue to use
>> PyPy.
>
> Probably, but we're talking about speeding up CPython here, right?

My project has different goals. I would prefer to not make any promise
about speed. So speed is not my first motivation, or at least not the
only one :-)

I also want to make the debug build usable.

I also want to allow OS vendors to provide multiple Python versions
per OS release. The goal is to *reduce* the maintenance burden, even
though it will obviously still mean more work. It's a tradeoff that
depends on the lifetime of your OS and the pressure from customers to
get the newest Python :-) FYI Red Hat already provides recent
development tools on top of RHEL (and CentOS and Fedora) because
customers are asking for that. We don't work for free :-)

I also want to see more alternative implementations of Python! I
would like to see RustPython succeed!

See the latest version of https://pythoncapi.readthedocs.io/ for the
full rationale.

> If we're talking about making more C extensions PyPy-compatible, that's
> a different discussion,

For practical reasons, IMHO it makes sense to put everything in the
same "new C API" bag.

Obviously, I propose to make many changes, and some of them can be
more difficult to implement. My proposal contains many open questions
and is made of multiple milestones, with a strong requirement on
backward compatibility.

> and one where I think Stefan is right that we
> should push people towards Cython and alternatives, rather than direct
> use of the C API (which people often fail to use correctly, in my
> experience).

Don't get me wrong: my intent is not to replace Cython. Even though
PyPy is pushing cffi hard, many C extensions still use the C API.

Maybe if the C API becomes more annoying and requires developers to
adapt their old code base to the "new C API", some of them will
consider switching to Cython, cffi or something else :-D

But backward compatibility is a big part of my plan, and in fact, I
expect that porting most C extensions to the new C API will cost
between "free" and "cheap". Obviously, it depends on how many changes
we put in the "new C API" :-) I would like to work incrementally.

> But the C API is still useful for specialized uses, *including* for
> development tools such as Cython.

It seems like http://pythoncapi.readthedocs.io/ didn't explain my
intent well. I updated my doc to make it very clear that the "old C API"
remains available *on purpose*. The main question is whether you will
be able to use Cython with the "old C API" on a new "experimental
runtime", or whether Cython will be stuck on the "regular runtime".

https://pythoncapi.readthedocs.io/runtimes.html

It's just that in the long term (end of my roadmap), you will have to
opt in to the old C API.

> I agree about the overall diagnosis.  I just disagree that changing the
> C API will open up easy optimization opportunities.

Ok, please help me rephrase the documentation so it doesn't make any promise :-)

Currently, I wrote:

"""
Optimization ideas

Once the new C API will succeed to hide implementation details, it
becomes possible to experiment radical changes in CPython to implement
new optimizations.

See Experimental runtime.
"""

https://pythoncapi.readthedocs.io/optimization_ideas.html

In my early plan, I wrote "faster runtime". I replaced it with
"experimental runtime" :-)

Do you think that it's wrong to promise that a smaller C API without
implementation details will make it easier to *experiment* with
optimizations?

> Actually I'd like to see a list of optimizations that you think are
> held up by the C API.

Hum, let me use the "Tagged pointers" example. Most C functions use
"PyObject*" as an opaque C type. Good. But technically, since we give
access to fields of C structures, like PyObject.ob_refcnt or
PyListObject.ob_item, C extensions currently dereference pointers
directly.

I'm not convinced that tagged pointers will make CPython way faster.
I'm just saying that the C API prevents you from even experimenting
with such a change to measure the impact on performance.

https://pythoncapi.readthedocs.io/optimization_ideas.html#tagged-pointers-doable
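
To make the tagged pointer issue concrete, here is a minimal sketch in C.
It is not CPython code: the names handle, heap_int, make_tagged, wrap_heap
and get_value are hypothetical, and the sketch only handles small
non-negative integers. The point is that code going through an accessor
function keeps working when a value is stored in the pointer bits, whereas
code that dereferences the pointer itself would read garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical heap object: NOT the real PyObject layout. */
typedef struct {
    long refcnt;
    long value;
} heap_int;

/* Opaque handle given to extensions. It is either a real pointer to a
   heap_int (low bit 0) or a small integer stored in the handle itself
   (low bit 1). */
typedef uintptr_t handle;

#define TAG_BIT ((uintptr_t)1)

int is_tagged(handle h)
{
    return (h & TAG_BIT) != 0;
}

/* Store a small non-negative integer in the handle: no allocation. */
handle make_tagged(unsigned long v)
{
    return ((uintptr_t)v << 1) | TAG_BIT;
}

/* Wrap a real heap object. Alignment guarantees that the low bit is 0. */
handle wrap_heap(heap_int *p)
{
    assert(((uintptr_t)p & TAG_BIT) == 0);
    return (uintptr_t)p;
}

/* Accessor function: works for both representations. An extension that
   instead casts the handle to heap_int* and reads ->value would break
   as soon as the runtime hands out tagged handles. */
long get_value(handle h)
{
    if (is_tagged(h)) {
        return (long)(h >> 1);
    }
    return ((heap_int *)h)->value;
}
```

An extension that only calls get_value() cannot tell which representation
the runtime picked, which is what makes the experiment possible.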

For the "Copy-on-Write" idea, the issue is that many macros access
fields of C structures directly, so at the machine code level the ABI
uses a fixed offset in memory to read data. My plan is to allow each
runtime to use a different memory layout, like putting Py_GC elsewhere
(or even removing it!!!) and/or putting ob_refcnt elsewhere.

https://pythoncapi.readthedocs.io/optimization_ideas.html#copy-on-write-cow-doable
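
The fixed offset problem can be sketched in C as follows. Both layouts and
all names (obj_layout_a, obj_layout_b, OLD_REFCNT, runtime_a_refcnt,
runtime_b_refcnt) are hypothetical illustrations, not the real CPython
structures. A macro compiled into an extension hardcodes where the field
lives, while a function call leaves that decision to the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout A: reference count stored first. */
typedef struct {
    long refcnt;
    void *type;
    long payload;
} obj_layout_a;

/* Hypothetical layout B: reference count moved to the end. */
typedef struct {
    void *type;
    long payload;
    long refcnt;
} obj_layout_b;

/* Old style: a macro. Once an extension is compiled with it, the offset
   of refcnt in layout A is baked into the extension's machine code. */
#define OLD_REFCNT(op) (((obj_layout_a *)(op))->refcnt)

/* New style: a function provided by each runtime. The extension only
   emits a call, so each runtime is free to use its own layout. */
long runtime_a_refcnt(void *op)
{
    assert(op != NULL);
    return ((obj_layout_a *)op)->refcnt;
}

long runtime_b_refcnt(void *op)
{
    assert(op != NULL);
    return ((obj_layout_b *)op)->refcnt;
}
```

Here offsetof(obj_layout_a, refcnt) is 0 while offsetof(obj_layout_b,
refcnt) is not, so an extension built with OLD_REFCNT would read the wrong
field on a runtime using layout B.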

>> I have to confess that helping Larry is part of my overall plan.
>
> Which is why I'd like to see Larry chime in here.

I already talked a little bit with Larry about my plan, but he wasn't
sure that my plan is enough to stop using reference counting
internally and move to a different garbage collector. I'm only sure
that it's possible to keep using reference counting for the C API,
since there are solutions for that (e.g. maintain a hash table
PyObject* => reference count).
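
A minimal sketch of that hash table idea in C, assuming a fixed-size table
with linear probing. The names ext_incref, ext_decref and ext_refcount are
hypothetical, and a real runtime would need resizing, slot reclamation and
thread safety. The reference count lives in a side table keyed by the
object address, so the object itself needs no ob_refcnt field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64   /* fixed capacity: sketch only */

typedef struct {
    const void *key;    /* object address, NULL means empty slot */
    long count;         /* references held through the C API */
} slot;

static slot table[TABLE_SIZE];

/* Return the slot for obj, or the first empty slot on its probe path.
   Return NULL if the table is full. Keys are never removed in this
   sketch, so probe sequences stay valid. */
static slot *find_slot(const void *obj)
{
    assert(obj != NULL);
    size_t start = (size_t)(((uintptr_t)obj >> 4) % TABLE_SIZE);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        slot *s = &table[(start + i) % TABLE_SIZE];
        if (s->key == obj || s->key == NULL) {
            return s;
        }
    }
    return NULL;
}

/* Current count for obj, 0 if it was never referenced. */
long ext_refcount(const void *obj)
{
    slot *s = find_slot(obj);
    return (s != NULL && s->key == obj) ? s->count : 0;
}

/* Add one reference. Return 0 on success, -1 if the table is full. */
int ext_incref(const void *obj)
{
    slot *s = find_slot(obj);
    if (s == NULL) {
        return -1;
    }
    s->key = obj;
    s->count++;
    return 0;
}

/* Drop one reference. Return the remaining count, or -1 if obj has no
   reference to drop. A count of 0 tells the runtime that the C API no
   longer holds the object. */
long ext_decref(const void *obj)
{
    slot *s = find_slot(obj);
    if (s == NULL || s->key != obj || s->count == 0) {
        return -1;
    }
    return --s->count;
}
```

With such a table, the runtime's own garbage collector can treat every
object with a non-zero count as a root and ignore reference counts
everywhere else.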

Honestly, right now, I'm only convinced of two things:

* Larry's implementation is very complex and so I doubt that he is going
to succeed. I'm talking about solutions to keep reference counting
efficient in multithreaded applications, like his idea of "logs" of
reference counters.

* We have to change the C API: it causes trouble for *everybody*.
Nobody spoke up because changing the C API is a giant project and it
breaks backward compatibility. But I'm not sure that all victims
of the C API are aware that their issues are caused by the design of
the current C API.

Victor