PEP: Taking the Python C API to the Next Level
Hi, I would like to change the Python C API. I failed to write a single document listing all constraints and proposing all changes that I would to do. For example, my previous PEP 620 contains too many changes and is too long. Here is my attempt to focus on the bare minimum and (what I consider as) the least controversial part: list current usages of C API and constraints of these usages. This *informal* PEP should be the base of future PEP changing the C API. The current draft lives at: https://github.com/vstinner/misc/blob/main/cpython/pep-c-api-next-level.rst My PEP is based on HPy Next Level Manifesto written by Simon Cross: https://github.com/hpyproject/hpy/wiki/c-api-next-level-manifesto To reach most users of the C API, I cross-posted this email to python-dev, capi-sig and hpy-dev. +++++++++++++++++++++++++++++++++++++++++ Taking the Python C API to the Next Level +++++++++++++++++++++++++++++++++++++++++ :: PEP: xxx Title: Taking the Python C API to the Next Level Author: Victor Stinner <vstinner@python.org> Status: Draft Type: Informational Content-Type: text/x-rst Created: 28-Sep-2021 Python-Version: 3.11 While the C API is a key of the Python popularity, it causes multiple subtle and complex issues. There are different ways to use the C API, each usage has its own constraints, and some constraints are exclusive. This document lists constraints but doesn't propose changes, it only gives vague ideas how to solve some issues. More concrete C API changes will require writing separated PEPs. C extensions are a key component of the Python popularity ========================================================= The Python popularity comes from its great programming language and from its wide collection of modules freely available on PyPI. Many of the most popular Python modules rely directly or indirectly on C extensions written with the C API. The Python C API is a key component of the Python popularity. For example, the numpy project is now a common dependency on many scientific projects and a large part of the project is written by hand with the C API. **Abandoning or removing the C API** is out of question. Years ago, the incomplete C API support was the main drawback of PyPy, since PyPy only supported a minority of C extensions. Today, CPython still have a similar issue. **When Cython or numpy don't support a new Python version** (because of incompatible C API changes), many Python projects depending on them are cannot be installed, especially during the development phase of the next Python version. Backward compatibility and unmaintained C extensions ==================================================== One important property of the C API is the backward compatibility. Developers expect that if their C extension works on Python 3.10, it will work unmodified in Python 3.11: building the C extension with Python 3.11 should be enough. This property is even more important for unmaintained C extensions. Sometimes, unmaintained just means that the only maintainer is busy or overwhelmed for a few months. Sometimes, the project has no activity for longer than 5 years. When an incompatible change is introduced in the C API, like removing a function or changing a function behavior, there is a **risk of breaking an unknown number of C extensions**. One option can be to update old C extensions when they are built on recent Python versions, to adapt them to incompatible changes. This conversion is non trivial and cannot handle all kinds of incompatible changes. Migration plan for incompatible changes ======================================= There should be a **sensible migration path** for large C extensions (e.g. numpy) when incompatible changes are introduced. Whenever possible, it should be possible to write a **single code base** compatible with old and new Python versions. A **compatibility layer** can be maintained externally. Cython and numpy have their own internal compatibility layer. There should be a way to easily pick up common errors introduced by migrating. One practical way to **minimize the number of broken projects** is to attempt to check in advance if an incompatible change is going to break popular C extensions. For broken C extensions, propose a fix and wait until a new release includes the fix, before introducing the change in Python. Obviously, it doesn't solve the problem of less popular C extensions and private C extensions. Obtain the best possible performance ==================================== There are two main reasons for writing a C extension: implement a function which cannot be written in pure Python, or write a **C accelerator**: rewrite the 10% of an application in C where 90% of the CPU time is spent. About the former use case, the intent is to obtain the best possible performance. Tradeoffs are made with portability: it is acceptable to only support a limited number of Python versions and to only support a limited number of Python implementations (usually only CPython). Cython is a good example of accelerator. It is able to support a large number of Python versions and multiple Python implementation with compatibility layers and ``#ifdef``. The main drawback is that it is common that Cython is **broken by incompatible changes made at each Python release**. It happens because Cython relies on many implementation details. On the other side, the **limited C API** is a small as possible, excludes implementation details on purpose, and provides a stable ABI. Building a C extension with the limited C API only once produces a binary wheel package usable on many Python versions, but each platform still requires its own binary wheel package. Emulating the current C API is inefficient ========================================== The PyPy project is a Python implementation written from scratch, it was not created as a CPython fork. It made many implementation choices different than CPython: no reference counting, moving garbage collector, JIT compiler, etc. To support C extensions, PyPy emulates the Python C API in its cpyext module. When the C API access an object, cpyext has to convert the PyPy object to a CPython object (``PyObject``). CPython objects are less efficient than PyPy objects with the PyPy JIT compiler and conversions from PyPy objects to CPython objects are also inefficient. PyPy has to reimplement every single detail of the CPython implementation to be as much compatible as possible. The C API exposes multiple implementation details: * Reference counting, borrowed references, stealing references. * Objects location in memory. * Rely on pointers for object identity: Python 3.10 adds the ``Py_Is()`` function to solve this problem. * Expose the memory layout of Python objects as part of the API. * Expose static types. * Implicit execution context. * etc. The C API of Python 3.10 is made of around 15 000 lines of C header files, 1500 functions and 100 structures. Supporting the full C API is a significant amount of work. **Freezing the C API** for a few Python releases would help other Python implementations to catch up with the latest Python version, but it doesn't solve the efficiency problem. Moreover, it is common that adding a new feature to Python requires to change the C API, even if it is just to add new functions. Not adding new features to Python for a few Python releases is out of question. The C API prevents optimizing CPython ===================================== It is challenging to evolve the C API to optimize CPython without breaking the backward compatibility. Emulating the old C API is an option, but it is inefficient. If everything above is achievable -- and we believe it is! -- we'll arrive in a wonderful new future where Python implementations can experiment with all sorts of amazing new features: * tracing garbage collectors; * nurseries for short-lived objects; * sub-interpreters with separate contexts; * specialised implementations of lists; * removing the GIL; * avoiding the boxing of primitive types; * just-in-time compilation; * ... and many other things you can imagine that we haven't! No one can guarantee that a particular new idea will work out, but exposing fewer implementation details via the C API will make it possible to try many new things. -- Night gathers, and now my watch begins. It shall not end until my death.
participants (1)
-
Victor Stinner