
Hi Guido,

My "north star", as you say, is the HPy *design* (not the actual HPy API). I would like to convert PyObject* to opaque handles: dereferencing a PyObject* pointer would simply fail with a compiler error. I'm working bottom-to-top, preparing PyObject and PyVarObject to become opaque, *and* top-to-bottom, preparing subclasses (structures "inheriting" from PyObject and PyVarObject, like PyFrameObject) to become opaque. IMO if PyObject* becomes a handle, the migration to the HPy API should be much easier.

So far, a large part of the work has been identifying which APIs prevent moving the current C API to opaque handles. The second part has been fixing these APIs one by one, starting with changes which don't break the C API. In the last 5 years, I fixed many issues of the C API. Very few projects were impacted since I avoided incompatible changes on purpose.

Now I have reached the end of my (current) queue with the most controversial changes: incompatible C API changes. I wrote PEP 670 and PEP 674 to share my overall rationale and communicate the reasons why I consider that these changes will make our life easier in the coming years. I'm well aware that in the short term, the benefits are very limited. But we cannot jump immediately to opaque PyObject* handles: this work must be done incrementally.
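For readers who haven't followed HPy: here is a minimal sketch of what "opaque" means in practice. The names (handle_object, handle_get_refcnt) are hypothetical, not actual CPython code:

    /* --- Public header: only a forward declaration is exposed, so the
     * struct layout stays private to the interpreter. */
    typedef struct handle_object handle_object;
    long handle_get_refcnt(const handle_object *op);

    /* --- Extension code: must go through accessor functions.
     * Dereferencing the handle is a compile error here, because
     * handle_object is an incomplete type at this point:
     *     op->refcnt;   // error: invalid use of incomplete type */
    long extension_code(const handle_object *op)
    {
        return handle_get_refcnt(op);
    }

    /* --- Interpreter internals (normally a separate file): the only
     * place where the real layout is visible and can freely change. */
    struct handle_object { long refcnt; };
    long handle_get_refcnt(const handle_object *op) { return op->refcnt; }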
On Thu, Feb 3, 2022 at 1:40 AM Guido van Rossum <guido@python.org> wrote:

> > 1) First, my main worry is that we put high pressure on the maintainers of the most important Python dependencies before the release of a new Python version, because we want them to handle the flow of incompatible C API changes before the final Python 3.x version is released, so that updated packages are available when Python 3.x final is released.
> Hm, maybe we should reduce the flow. And e.g. reject PEP 674...
We need to find the right limit when introducing incompatible C API changes, to not break too many C extensions per Python release. Yes, PEP 674 is an example of an incompatible C API change, but Python 3.11 already got many others which are unrelated to that PEP. For example, the optimization work done by your Microsoft team broke a bunch of projects using the PyThreadState and PyFrameObject APIs. See the related What's New in Python 3.11 entries: https://docs.python.org/dev/whatsnew/3.11.html#id2 (I didn't keep track of which projects are affected by these changes.)

I would like to ensure that it remains possible to optimize Python, but for that, we need to reduce the friction related to C API changes. What the best approach is remains an open question. I'm proposing some solutions; we are discussing advantages and drawbacks ;-)
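To illustrate the kind of migration these changes force: code which read PyFrameObject fields directly has to switch to getter functions. The functions used below are the real C API (added in Python 3.9); the helper around them is just an example:

    #include <Python.h>

    /* Getting the code object of the currently executing frame.
     * PyThreadState_GetFrame() and PyFrame_GetCode() both return
     * strong references. */
    static PyCodeObject *current_code(PyThreadState *tstate)
    {
        PyFrameObject *frame = PyThreadState_GetFrame(tstate);
        if (frame == NULL) {
            return NULL;  /* no frame is currently executing */
        }
        /* Before Python 3.11, extensions commonly wrote:
         *     PyCodeObject *code = frame->f_code;  // borrowed reference
         * which no longer compiles once the structure is hidden. */
        PyCodeObject *code = PyFrame_GetCode(frame);
        Py_DECREF(frame);
        return code;  /* strong reference; caller must Py_DECREF() it */
    }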
> > It annoys core developers who cannot change things in Python without getting an increasing number of complaints about a large number of broken packages, sometimes with a request to revert.
> You are mostly talking about yourself here, right? Since the revert requests were mostly aimed at you. :-)
The latest requests for revert are not about C API changes that I made; see Cython and the C API exception change: https://mail.python.org/archives/list/python-dev@python.org/thread/RS2C53LDZ... There are other requests for revert in Python 3.11 related to Python changes, not to the C API.

So far, I only had to revert 2 changes of my C API work:

* PyType_HasFeature(): the change caused a performance regression on macOS, where sadly Python cannot be built with LTO. With LTO (all platforms but macOS), my change doesn't affect performance.

* Py_TYPE() / Py_SIZE() (see the migration sketch below): I reverted my change to have more time to prepare affected projects. Two years later, I consider that this preparation work is now done (all affected projects are ready), and so I submitted PEP 674. Affected projects (merged and pending fixes): https://www.python.org/dev/peps/pep-0674/#backwards-compatibility
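For context, the Py_TYPE() / Py_SIZE() change boils down to the macros no longer being usable as l-values, since they became static inline functions. Affected code migrates to the setter functions added in Python 3.9:

    #include <Python.h>

    static void init_var_object(PyVarObject *op, PyTypeObject *type,
                                Py_ssize_t size)
    {
        /* Before (broken by the change):
         *     Py_TYPE(op) = type;
         *     Py_SIZE(op) = size; */
        Py_SET_TYPE(op, type);   /* added in Python 3.9 */
        Py_SET_SIZE(op, size);   /* added in Python 3.9 */
    }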
> > Moreover, it became common to ask projects for multiple changes and multiple releases before a Python final release, since more incompatible changes are introduced in Python (before beta1).
> Sorry, your grammar confuses me. Who is asking whom to do what here?
Cython is the best example. During the development cycle of a new Python version, my Fedora team adapts Cython to the next Python and then requests a release, to get an official release supporting an alpha Python version. A release is not strictly required by us (we can apply downstream patches), but it's more convenient for us and for people who cannot use our work-in-progress "COPR" (a repository of RPM packages specific to Fedora). When we ask a project (using Cython) to merge our pull request, maintainers want to test it on the current Python alpha version, and that's not convenient when Cython is unusable.

Between Python 3.x alpha1 and Python 3.x final, there might be multiple Cython releases, one each time a bunch of incompatible C API changes lands. On the Python and Cython sides, so far, there has been no coordination to group incompatible changes: they land between alpha1 and beta1 in an irregular fashion.
> > 2) Second, as you said, the stable ABI reduces the number of binary packages which have to be built. Small projects with a small team (e.g. a single person) don't have the resources to set up a CI and maintain it to build all these packages. It's doable, but it isn't free.
> Maybe we need to help there. For example IIRC conda-forge will build conda packages -- maybe we should offer a service like that for wheels?
Tooling to automate the creation of binary wheel packages targeting the stable ABI would help. C extensions likely need to change a few function calls which are not supported by the limited C API. Someone has to do this work. My hope is that the quantity of changes is small (e.g. modifying 2 or 3 function calls) for small C extensions. I didn't try it in practice yet.
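As a rough sketch of what targeting the stable ABI looks like for a small C extension (the "demo" module is made up; Py_LIMITED_API and the functions used are the real limited C API):

    /* Define Py_LIMITED_API before including Python.h: only limited
     * C API functions are then available, and anything outside it
     * fails to compile. The resulting wheel can be tagged abi3 and
     * reused on every Python version >= the requested one. */
    #define Py_LIMITED_API 0x030A0000   /* example value: Python 3.10+ */
    #include <Python.h>

    static PyObject *hello(PyObject *self, PyObject *args)
    {
        return PyUnicode_FromString("hello");
    }

    static PyMethodDef demo_methods[] = {
        {"hello", hello, METH_NOARGS, "Return a greeting."},
        {NULL, NULL, 0, NULL},
    };

    static struct PyModuleDef demo_module = {
        PyModuleDef_HEAD_INIT, "demo", NULL, 0, demo_methods,
        NULL, NULL, NULL, NULL,
    };

    PyMODINIT_FUNC PyInit_demo(void)
    {
        return PyModuleDef_Init(&demo_module);
    }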
> > The irony of the situation is that we must break the C API (hiding structures is technically an incompatible change)... to make the C API stable. Breaking it now to make it stable later.
> The question is whether that will ever be enough. Unless we manage to get rid of the INCREF/DECREF macros completely (from the public C API anyway), we still can't change object layout.
I'm open to ideas if someone has a better plan ;-) Keeping reference counting for consumers of the C API (C extensions) doesn't prevent Python from dropping reference counting internally (switching to a completely different GC implementation): PyPy's cpyext is a concrete example of that. The nogil fork keeps the Py_INCREF()/Py_DECREF() functions, but changes their implementation.

If we have to (e.g. if we merge nogil), we can convert the Py_INCREF() / Py_DECREF() static inline functions to regular functions tomorrow (and then change their implementation) without breaking the API. Adding abstractions (getter and setter functions) on PyObject gives more freedom to consider different options to evolve Python tomorrow.
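A sketch of the point, with hypothetical names (incref_inline, incref_opaque): the call site is identical whether the operation is a static inline function or a regular exported function, which is why the implementation can change later without breaking the API:

    #include <Python.h>

    /* Variant 1: static inline, compiled into the extension -- roughly
     * the status quo (simplified; the real Py_INCREF does more). */
    static inline void incref_inline(PyObject *op)
    {
        op->ob_refcnt++;
    }

    /* Variant 2: a regular (non-inline) function. In real life its body
     * would live in libpython and could use a completely different
     * strategy (tracing GC, nogil biased refcounts, ...) without
     * recompiling extensions. Hypothetical stand-in implementation: */
    void incref_opaque(PyObject *op)
    {
        Py_INCREF(op);
    }

    static void use(PyObject *op)
    {
        incref_inline(op);  /* identical call syntax... */
        incref_opaque(op);  /* ...so swapping implementations breaks nothing */
    }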
> Is this worth it? Maybe we should just declare those structs and APIs *unstable* and tell people who use them that they can expect to be broken by each alpha release. As you say, hopefully this doesn't affect most people. Likely it'll affect Cython dramatically, but Cython is such a special case that trying to evolve the C API will never satisfy them. We'll have to deal with it separately. (Debuggers are a more serious concern. We may need to provide higher-level APIs for debuggers to do the things they need to do. Mark's PEP 669 should help here.)
IMO we only need to add 5 to 10 functions to cover most use cases involving PyThreadState and PyFrameObject. The remaining, least common usages can continue to require changes at each Python release. The short term goal is only to reduce the number of required changes per Python release. Yes, I'm talking about Cython, debuggers and profilers.

Another example: Cython currently calls PyCode_New() to create a fake frame object with a filename and line number. IMO it's the wrong abstraction level: Python should provide a function to create a frame with a filename and line number, so the caller doesn't have to bother with the complex PyCode_New() API and frequent PyCodeObject changes. (Correct me if this problem has already been solved in Python.)
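Something along these lines, an entirely hypothetical signature sketching the abstraction level I have in mind (this function does not exist in CPython):

    #include <Python.h>

    /* Hypothetical helper: create a fake frame from just a filename,
     * function name and line number, so callers never touch
     * PyCode_New() or PyCodeObject directly. Returns a strong
     * reference, or NULL on error. */
    extern PyFrameObject *PyFrame_NewFake(PyThreadState *tstate,
                                          const char *filename,
                                          const char *funcname,
                                          int lineno);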
> > In practice, what I did since Python 3.8 is to introduce a small number of C API changes per Python version. (...)
> How *do* you count this? Try to compile the top 5000 PyPI packages?
I'm using code search in the source code of the top 5000 PyPI packages, and I'm looking at broken packages in Fedora when we update Python. Also, sometimes people add a comment on an issue to mention that their project is broken by a change.
> That might severely undercount a long tail of proprietary extensions.
Right.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.