
Hi Guido,

My "north star", as you say, is the HPy *design* (not the actual HPy API). I would like to convert PyObject* to opaque handles: dereferencing a PyObject* pointer would simply fail with a compiler error. I'm working bottom-to-top, preparing PyObject and PyVarObject to become opaque, *and* top-to-bottom, preparing subclasses (structures "inheriting" from PyObject and PyVarObject, like PyFrameObject) to become opaque. IMO if PyObject* becomes a handle, the migration to the HPy API should be much easier.

So far, a large part of the work has been identifying which APIs prevent moving the current C API to opaque handles. The second part has been fixing these APIs one by one, starting with changes which don't break the C API. In the last 5 years, I fixed many issues of the C API. Very few projects were impacted since I avoided incompatible changes on purpose.

Now I have reached the end of my (current) queue with the most controversial changes: incompatible C API changes. I wrote PEP 670 and PEP 674 to share my overall rationale and communicate the reasons why I consider that these changes will make our life easier in the coming years. I'm well aware that in the short term, the benefits are very limited. But we cannot jump immediately to opaque PyObject* handles: this work must be done incrementally.
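For readers who haven't followed HPy: here is a minimal sketch of what "opaque" means in practice. The names (handle_object, handle_get_refcnt) are hypothetical, not actual CPython code:

    /* --- Public header: only a forward declaration is exposed, so the
     * struct layout stays private to the interpreter. */
    typedef struct handle_object handle_object;
    long handle_get_refcnt(const handle_object *op);

    /* --- Extension code: must go through accessor functions.
     * Dereferencing the handle is a compile error here, because
     * handle_object is an incomplete type at this point:
     *     op->refcnt;   // error: invalid use of incomplete type */
    long extension_code(const handle_object *op)
    {
        return handle_get_refcnt(op);
    }

    /* --- Interpreter internals (normally a separate file): the only
     * place where the real layout is visible and can freely change. */
    struct handle_object { long refcnt; };
    long handle_get_refcnt(const handle_object *op) { return op->refcnt; }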
On Thu, Feb 3, 2022 at 1:40 AM Guido van Rossum <guido@python.org> wrote:

> > 1) First, my main worry is that we put high pressure on the maintainers of the most important Python dependencies before the release of a new Python version, because we want them to handle the flow of incompatible C API changes before the final Python 3.x version is released, so that updated packages are available when Python 3.x final is released.
> Hm, maybe we should reduce the flow. And e.g. reject PEP 674...
We need to find the right limit when introducing incompatible C API changes, to not break too many C extensions per Python release. Yes, PEP 674 is an example of an incompatible C API change, but Python 3.11 already got many others which are unrelated to that PEP. For example, the optimization work done by your Microsoft team broke a bunch of projects using the PyThreadState and PyFrameObject APIs. See the related What's New in Python 3.11 entries: https://docs.python.org/dev/whatsnew/3.11.html#id2 (I didn't keep track of which projects are affected by these changes.)

I would like to ensure that it remains possible to optimize Python, but for that, we need to reduce the friction related to C API changes. What the best approach is remains an open question. I'm proposing some solutions; we are discussing advantages and drawbacks ;-)
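To illustrate the kind of migration these changes force: code which read PyFrameObject fields directly has to switch to getter functions. The functions used below are the real C API (added in Python 3.9); the helper around them is just an example:

    #include <Python.h>

    /* Getting the code object of the currently executing frame.
     * PyThreadState_GetFrame() and PyFrame_GetCode() both return
     * strong references. */
    static PyCodeObject *current_code(PyThreadState *tstate)
    {
        PyFrameObject *frame = PyThreadState_GetFrame(tstate);
        if (frame == NULL) {
            return NULL;  /* no frame is currently executing */
        }
        /* Before Python 3.11, extensions commonly wrote:
         *     PyCodeObject *code = frame->f_code;  // borrowed reference
         * which no longer compiles once the structure is hidden. */
        PyCodeObject *code = PyFrame_GetCode(frame);
        Py_DECREF(frame);
        return code;  /* strong reference; caller must Py_DECREF() it */
    }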
> > It annoys core developers who cannot change things in Python without getting an increasing number of complaints about a large number of broken packages, sometimes with a request to revert.
> You are mostly talking about yourself here, right? Since the revert requests were mostly aimed at you. :-)
The latest requests for revert are not about C API changes that I made; see Cython and the C API exception change: https://mail.python.org/archives/list/python-dev@python.org/thread/RS2C53LDZ... There are other requests for revert in Python 3.11 related to Python changes, not to the C API.

So far, I only had to revert 2 changes of my C API work:

* PyType_HasFeature(): the change caused a performance regression on macOS, where sadly Python cannot be built with LTO. With LTO (all platforms but macOS), my change doesn't affect performance.

* Py_TYPE() / Py_SIZE() (see the migration sketch below): I reverted my change to have more time to prepare affected projects. Two years later, I consider that this preparation work is now done (all affected projects are ready), and so I submitted PEP 674. Affected projects (merged and pending fixes): https://www.python.org/dev/peps/pep-0674/#backwards-compatibility
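For context, the Py_TYPE() / Py_SIZE() change boils down to the macros no longer being usable as l-values, since they became static inline functions. Affected code migrates to the setter functions added in Python 3.9:

    #include <Python.h>

    static void init_var_object(PyVarObject *op, PyTypeObject *type,
                                Py_ssize_t size)
    {
        /* Before (broken by the change):
         *     Py_TYPE(op) = type;
         *     Py_SIZE(op) = size; */
        Py_SET_TYPE(op, type);   /* added in Python 3.9 */
        Py_SET_SIZE(op, size);   /* added in Python 3.9 */
    }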
> > Moreover, it became common to ask projects for multiple changes and multiple releases before a Python final release, since more incompatible changes are introduced in Python (before beta1).
> Sorry, your grammar confuses me. Who is asking whom to do what here?
Cython is the best example. During the development cycle of a new Python version, my Fedora team adapts Cython to the next Python and then requests a release, to get an official release supporting an alpha Python version. A release is not strictly required by us (we can apply downstream patches), but it's more convenient for us and for people who cannot use our work-in-progress "COPR" (a repository of RPM packages specific to Fedora). When we ask a project (using Cython) to merge our pull request, maintainers want to test it on the current Python alpha version, and that's not convenient when Cython is unusable.

Between Python 3.x alpha1 and Python 3.x final, there might be multiple Cython releases, one each time a bunch of incompatible C API changes lands. On the Python and Cython sides, so far, there has been no coordination to group incompatible changes: they land between alpha1 and beta1 in an irregular fashion.
> > 2) Second, as you said, the stable ABI reduces the number of binary packages which have to be built. Small projects with a small team (e.g. a single person) don't have the resources to set up a CI and maintain it to build all these packages. It's doable, but it isn't free.
> Maybe we need to help there. For example IIRC conda-forge will build conda packages -- maybe we should offer a service like that for wheels?
Tooling to automate the creation of binary wheel packages targeting the stable ABI would help. C extensions likely need to change a few function calls which are not supported by the limited C API. Someone has to do this work. My hope is that the quantity of changes is small (e.g. modifying 2 or 3 function calls) for small C extensions. I didn't try it in practice yet.
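As a rough sketch of what targeting the stable ABI looks like for a small C extension (the "demo" module is made up; Py_LIMITED_API and the functions used are the real limited C API):

    /* Define Py_LIMITED_API before including Python.h: only limited
     * C API functions are then available, and anything outside it
     * fails to compile. The resulting wheel can be tagged abi3 and
     * reused on every Python version >= the requested one. */
    #define Py_LIMITED_API 0x030A0000   /* example value: Python 3.10+ */
    #include <Python.h>

    static PyObject *hello(PyObject *self, PyObject *args)
    {
        return PyUnicode_FromString("hello");
    }

    static PyMethodDef demo_methods[] = {
        {"hello", hello, METH_NOARGS, "Return a greeting."},
        {NULL, NULL, 0, NULL},
    };

    static struct PyModuleDef demo_module = {
        PyModuleDef_HEAD_INIT, "demo", NULL, 0, demo_methods,
        NULL, NULL, NULL, NULL,
    };

    PyMODINIT_FUNC PyInit_demo(void)
    {
        return PyModuleDef_Init(&demo_module);
    }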
> > The irony of the situation is that we must break the C API (hiding structures is technically an incompatible change)... to make the C API stable. Breaking it now to make it stable later.
> The question is whether that will ever be enough. Unless we manage to get rid of the INCREF/DECREF macros completely (from the public C API anyway), we still can't change object layout.
I'm open to ideas if someone has a better plan ;-) Keeping reference counting for consumers of the C API (C extensions) doesn't prevent Python from dropping reference counting internally (switching to a completely different GC implementation): PyPy's cpyext is a concrete example of that. The nogil fork keeps the Py_INCREF()/Py_DECREF() functions, but changes their implementation.

If we have to (e.g. if we merge nogil), we can convert the Py_INCREF() / Py_DECREF() static inline functions to regular functions tomorrow (and then change their implementation) without breaking the API. Adding abstractions (getter and setter functions) on PyObject gives more freedom to consider different options to evolve Python tomorrow.
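A sketch of the point, with hypothetical names (incref_inline, incref_opaque): the call site is identical whether the operation is a static inline function or a regular exported function, which is why the implementation can change later without breaking the API:

    #include <Python.h>

    /* Variant 1: static inline, compiled into the extension -- roughly
     * the status quo (simplified; the real Py_INCREF does more). */
    static inline void incref_inline(PyObject *op)
    {
        op->ob_refcnt++;
    }

    /* Variant 2: a regular (non-inline) function. In real life its body
     * would live in libpython and could use a completely different
     * strategy (tracing GC, nogil biased refcounts, ...) without
     * recompiling extensions. Hypothetical stand-in implementation: */
    void incref_opaque(PyObject *op)
    {
        Py_INCREF(op);
    }

    static void use(PyObject *op)
    {
        incref_inline(op);  /* identical call syntax... */
        incref_opaque(op);  /* ...so swapping implementations breaks nothing */
    }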
> Is this worth it? Maybe we should just declare those structs and APIs *unstable* and tell people who use them that they can expect to be broken by each alpha release. As you say, hopefully this doesn't affect most people. Likely it'll affect Cython dramatically, but Cython is such a special case that trying to evolve the C API will never satisfy them. We'll have to deal with it separately. (Debuggers are a more serious concern. We may need to provide higher-level APIs for debuggers to do the things they need to do. Mark's PEP 669 should help here.)
IMO we only need to add 5 to 10 functions to cover most use cases involving PyThreadState and PyFrameObject. The remaining, least common usages can continue to require changes at each Python release. The short term goal is only to reduce the number of required changes per Python release. Yes, I'm talking about Cython, debuggers and profilers.

Another example: Cython currently calls PyCode_New() to create a fake frame object with a filename and line number. IMO it's the wrong abstraction level: Python should provide a function to create a frame with a filename and line number, so the caller doesn't have to bother with the complex PyCode_New() API and frequent PyCodeObject changes. (Correct me if this problem has already been solved in Python.)
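Something along these lines, an entirely hypothetical signature sketching the abstraction level I have in mind (this function does not exist in CPython):

    #include <Python.h>

    /* Hypothetical helper: create a fake frame from just a filename,
     * function name and line number, so callers never touch
     * PyCode_New() or PyCodeObject directly. Returns a strong
     * reference, or NULL on error. */
    extern PyFrameObject *PyFrame_NewFake(PyThreadState *tstate,
                                          const char *filename,
                                          const char *funcname,
                                          int lineno);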
> > In practice, what I did since Python 3.8 is to introduce a small number of C API changes per Python version. (...)
> How *do* you count this? Try to compile the top 5000 PyPI packages?
I'm using code search in the source code of the top 5000 PyPI packages, and I'm looking at broken packages in Fedora when we update Python. Also, sometimes people add a comment on an issue to mention that their project is broken by a change.
> That might severely undercount a long tail of proprietary extensions.
Right.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.