Commenting more on specific technical details rather than just tone this time :) On 11 July 2017 at 20:19, Victor Stinner <victor.stinner@gmail.com> wrote:
PEP: xxx
Title: Hide implementation details in the C API
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 31-May-2017
Abstract
========
Modify the C API to remove implementation details. Add an opt-in option to compile C extensions to get the old full API with implementation details.
The modified C API makes it easier to experiment with new optimizations:
* Indirect Reference Counting
* Remove Reference Counting, New Garbage Collector
* Remove the GIL
* Tagged pointers
Reference counting may be emulated in a future implementation for backward compatibility.
I don't believe this is the best rationale to use for the PEP, as we (or at least I) have emphatically promised *not* to do another Python 3 style compatibility break, and we know from PyPy's decade of challenges that a lot of Python's users care even more about CPython C API/ABI compatibility than they do about the core data model.

It also has the downside of not really being true, since *other implementations* are happily experimenting with alternative approaches, and projects like PyMetabiosis attempt to use CPython itself as an adapter between other runtimes and the full C API for those extension modules that need it.

What is unequivocally true though is that in the current C API:

1. We're not sure which APIs other projects (including extension module generators and helper libraries like Cython, Boost, PyCXX, SWIG, cffi, etc) are *actually* relying on.
2. It's easy for us to accidentally expand the public C API without thinking about it, since Py_BUILD_CORE guards are opt-in and Py_LIMITED_API guards are opt-out.
3. We haven't structured our header files in a way that makes it obvious at a glance which API we're modifying (internal API, public API, stable ABI).
Rationale
=========
History of CPython forks
------------------------
Over the last 10 years, CPython was forked multiple times to attempt different CPython enhancements:
* Unladen Swallow: add a JIT compiler based on LLVM
* Pyston: add a JIT compiler based on LLVM (CPython 2.7 fork)
* Pyjion: add a JIT compiler based on Microsoft CLR
* Gilectomy: remove the Global Interpreter Lock nicknamed "GIL"
* etc.
Sadly, none of these projects has been merged back into CPython. Unladen Swallow lost its funding from Google, Pyston lost its funding from Dropbox, and Pyjion is developed in the limited spare time of two Microsoft employees.
One technically hard issue which blocked these projects from really unleashing their potential is the C API of CPython.
This is a somewhat misleading, one-sided presentation of Python's history, as the broad access to CPython internals offered by the C API is precisely what *enabled* the scientific Python stack (including NumPy, SciPy, Pandas, scikit-learn, Cython, Numba, PyCUDA, etc) to develop largely independently of CPython itself.

So for folks that are willing to embrace the use of Cython (and extension modules in general), many of CPython's runtime limitations (like the GIL and the overheads of working with boxed values) can already be avoided by pushing particular sections of code closer to C semantics than they are to traditional Python semantics.

We've also been working to bring the runtime semantics of extension modules ever closer to those of pure Python modules, to the point where Python 3.7 is likely to be able to run an extension module as __main__ (see https://www.python.org/dev/peps/pep-0547/ for details).
Many old technical choices of CPython are hardcoded in this API:
* reference counting
* garbage collector
* C structures like PyObject which contain headers for reference counting and the garbage collector
* specific memory allocators
* etc.
PyPy
----
PyPy uses more efficient structures and a more efficient garbage collector without reference counting. Thanks to that (but also to many other optimizations), PyPy manages to run Python code up to 5x faster than CPython.
This framing makes it look a bit like you're saying "It's hard for PyPy to correctly emulate these aspects of CPython, so we should eliminate them as a barrier to adoption for PyPy by breaking them for currently happy CPython users as well". I don't think that's really a framing you want to run with in the near term, as it's going to start a needless fight, when there's plenty of unambiguously beneficial work that could be done before anyone starts contemplating any kind of API compatibility break :)

In particular, better segmenting our APIs into "solely for CPython's internal use", "ABI is specific to a CPython version", "API is portable across Python implementations", and "ABI is portable across CPython versions (and maybe even Python implementations)" allows tooling developers and extension module authors to make more informed decisions about how closely they want to couple their work to CPython specifically.

And then *after* we've done that API clarification work, *then* we can ask the question of what the default behaviour of "#include <Python.h>" should be, and perhaps introduce an opt-in Py_CPYTHON_API flag to request access to the full traditional C API for extension modules and embedding applications that actually need it. (While that's still a compatibility break, it's one that can be trivially resolved by putting an unconditional "#define Py_CPYTHON_API" before the Python header inclusion for projects that find they were actually relying on CPython specifics.)
Plan made of multiple small steps
=================================
Step 1: split Include/ into subdirectories
------------------------------------------
Split the ``Include/`` directory of CPython:
* ``python`` API: ``Include/Python.h`` remains the default C API
* ``core`` API: ``Include/core/Python.h`` is a new C API designed for building Python
* ``stable`` API: ``Include/stable/Python.h`` is the stable ABI
Expect declarations to be duplicated on purpose: ``#include`` should not be used to include files from a different API, to prevent mistakes. In the past, too many functions were exposed *by mistake*, especially symbols exported to the stable ABI.
At this point, ``Include/Python.h`` is not changed at all: zero risk of backward incompatibility.
The ``core`` API is the most complete API, exposing *all* implementation details and using macros for best performance.
This part I like, although as Eric noted, we can avoid making wholesale changes to the headers of our implementation files by putting a Py_BUILD_CORE guard around the inclusion of a "Include/core/_CPython.h" header from "Include/Python.h"
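A minimal sketch of that guard-based layout (the directory and file names come from the proposal above; everything else here is assumed, not the actual header contents):

```
/* Sketch of Include/Python.h under this scheme -- not the real header. */
/* ... common configuration includes ... */
#ifdef Py_BUILD_CORE
/* Only CPython's own build defines Py_BUILD_CORE, so internal-only
 * declarations are never visible to extension module builds. */
#include "core/_CPython.h"
#endif
/* ... remainder of the default public API ... */
```

That way the public header files never need to mention the internal API at all except for this one guarded inclusion.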
XXX should we abandon the stable ABI? Never really used by anyone.
It's also not available in Python 2.7, so anyone straddling the 2/3 boundary isn't currently able to rely on it. As folks become more willing to drop Python 2.7 support, then expending the effort to start targeting the stable ABI instead becomes more attractive (especially for extension module creation tools like Cython, cffi, and SWIG), since the stable ABI usage can *replace* the code that uses the traditional CPython API.
Step 2: Add an opt-in API option to tools building packages
-----------------------------------------------------------
Modify Python packaging tools (distutils, setuptools, flit, pip, etc.) to add an opt-in option to choose the API: ``python``, ``core`` or ``stable``.
For example, debuggers like ``vmprof`` need the ``core`` API to get full access to implementation details.
XXX handle backward compatibility for packaging tools.
For handcoded extensions, defining which API to use would be part of the C/C++ code. For generated extensions, it would be an option passed to Cython, cffi, etc. Packaging frontends shouldn't need to explicitly support it any more than they explicitly support the stable ABI today.
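For reference, this is already how opting in to the stable ABI works in a handcoded extension today: the define just has to appear before the first Python header inclusion (the version value below is only an example):

```
/* In the extension module's own source file: */
#define Py_LIMITED_API 0x03060000   /* restrict to the stable ABI of 3.6+ */
#include <Python.h>
```

A ``core``/``python`` selector could presumably follow the same source-level pattern, with no packaging tool involvement required.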
Step 3: first pass of implementation detail removal
---------------------------------------------------
Modify the ``python`` API:
* Add a new ``API`` subdirectory in the Python source code which will "implement" the Python C API
* Replace macros with functions. The implementation of new functions will be written in the ``API/`` directory. For example, Py_INCREF() becomes the function ``void Py_INCREF(PyObject *op)``, and its implementation will be written in the ``API`` directory.
* Slowly remove more and more implementation details from this API.
I'd suggest doing this slightly differently by ensuring that the APIs are defined as strict supersets of each other as follows:

1. CPython internal APIs (Py_BUILD_CORE)
2. CPython C API (status quo, currently no qualifier)
3. Portable Python API (new, starts as equivalent to stable ABI)
4. Stable Python ABI (Py_LIMITED_API)

The two new qualifiers would then be:

    #define Py_CPYTHON_API
    #define Py_PORTABLE_API

And Include/Python.h would end up looking something like this:

    [Common configuration includes would still go here]
    #ifdef Py_BUILD_CORE
    #include "core/_CPython.h"
    #else
    #ifdef Py_LIMITED_API
    #include "stable/Python.h"
    #else
    #ifdef Py_PORTABLE_API
    #include "portable/Python.h"
    #else
    #define Py_CPYTHON_API
    #include "cpython/Python.h"
    #endif
    #endif
    #endif

At some future date, the default could then potentially switch to being the portable API for the current Python version, with folks having to opt-in to using either the full CPython API or the portable API for an older version.

To avoid having to duplicate prototype definitions, and to ensure that C compilers complain when we inadvertently redefine a symbol differently from the way a more restricted API defines it, each API superset would start by including the next narrower API.
So we'd have this:

Include/stable/Python.h:

    [No special preamble, as it's the lowest common denominator API]

Include/portable/Python.h:

    #define Py_LIMITED_API Py_PORTABLE_API
    #include "../stable/Python.h"
    #undef Py_LIMITED_API
    [Any desired API additions and overrides]

Include/cpython/Python.h:

    #include "../patchlevel.h"
    #define Py_PORTABLE_API PY_VERSION_HEX
    #include "../portable/Python.h"
    #undef Py_PORTABLE_API
    [Include the rest of the current public C API]

Include/core/_CPython.h:

    #ifndef Py_BUILD_CORE
    #error "Internal headers are only available when building CPython"
    #endif
    #include "../cpython/Python.h"
    [Include the rest of the internal C API]

And at least initially, the subdirectories would be mostly empty - instead, we'd have the following setup:

1. Unported headers would remain directly in "Include/" and be included from "Include/Python.h"
2. Ported headers would have their contents split between core, cpython, and stable based on their #ifdef chains
3. When porting, the more expansive APIs would use "#undef" as needed when overriding a symbol deliberately

And then, once all the APIs had been clearly categorised in a way that C compilers can better help us manage, the folks that were interested in this could start building key extension modules (such as NumPy and lxml) using "Py_PORTABLE_API=0x03070000", and *adding* to the portable API on an explicitly needs-driven basis.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia