PEP 620: Hide implementation details from the C API
Thanks, Victor, for the awesome PEP.

I am a big +1 on this proposal, since some of the core developers already need this evolution of the C API. I believe this proposal is not only for alternative Python implementations but also gives a chance to enhance CPython performance. And I love that the proposal does not say "break everything in one shot" but pursues incremental change.

The one thing that we have to do for this PEP is to communicate the changes very well to third-party libraries. With well-written porting documentation, I believe most of the impactful third-party libraries which sustain the Python library community will have enough time to prepare for the changes.

I believe that if there is no change, there will be no evolution. Let's make CPython faster, even if it means some temporary suffering.

Regards from Korea,
Dong-hee

On Mon, Jun 22, 2020 at 9:13 PM, Victor Stinner <vstinner@python.org> wrote:
Hi,
PEP available at: https://www.python.org/dev/peps/pep-0620/
<introduction> This PEP is the result of 4 years of research work on the C API: https://pythoncapi.readthedocs.io/
It's the third version. The first version (2017) proposed to add a "new C API" and advised C extension maintainers to opt in to it: it was basically the same idea as the PEP 384 limited C API, but in a different color. Well, I had no idea of what I was doing :-) The second version (April 2020) proposed to add a new Python runtime built from the same code base as the regular Python runtime, but in a different build mode; the regular Python would continue to be fully compatible.
I wrote the third version, PEP 620, from scratch. It now gives an explicit and concrete list of incompatible C API changes, and has better motivation and rationale sections. The main novelty of the PEP is the new pythoncapi_compat.h header file distributed with Python to provide new C API functions to old Python versions; the second novelty is the process to reduce the number of broken C extensions.
Whereas PEPs are usually implemented in a single Python version, the implementation of this PEP is expected to be done carefully over multiple Python versions. The PEP lists many changes which are already implemented in Python 3.7, 3.8 and 3.9. It defines a process to reduce the number of broken C extensions when introducing the incompatible C API changes listed in the PEP. The process dictates the rhythm of these changes. </introduction>
PEP: 620
Title: Hide implementation details from the C API
Author: Victor Stinner <vstinner@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-June-2020
Python-Version: 3.10
Abstract
========
Introduce C API incompatible changes to hide implementation details.
Once most implementation details are hidden, the evolution of CPython internals will be less limited by C API backward compatibility issues, and it will be much easier to add new features.
It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers.
Define a process to reduce the number of broken C extensions.
The implementation of this PEP is expected to be done carefully over multiple Python versions. It already started in Python 3.7 and most changes are already completed. The `Process to reduce the number of broken C extensions`_ dictates the rhythm.
Motivation
==========

The C API blocks CPython evolutions
-----------------------------------
Adding or removing members of C structures causes multiple backward compatibility issues.
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure. For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type flag was required to announce if the type structure contains the ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388 <https://bugs.python.org/issue32388>`_).
The ``PyTypeObject.tp_print`` member, deprecated since Python 3.0 released in 2009, has been removed in the Python 3.8 development cycle. But the change broke too many C extensions and had to be reverted before 3.8 final release. Finally, the member was removed again in Python 3.9.
C extensions rely on the ability to access structure members, either indirectly through the C API or even directly. Modifying structures like ``PyListObject`` cannot even be considered.
The ``PyTypeObject`` structure is the one which evolved the most, simply because there was no other way to evolve CPython than modifying it.
In the C API, all Python objects are passed as ``PyObject*``: a pointer to a ``PyObject`` structure. Experimenting with tagged pointers in CPython is blocked by the fact that a C extension can technically dereference a ``PyObject*`` pointer and access ``PyObject`` members. Small "objects" can be stored as a tagged pointer with no concrete ``PyObject`` structure.
Replacing the Python garbage collector with a tracing garbage collector would also require removing the ``PyObject.ob_refcnt`` reference counter, whereas currently the ``Py_INCREF()`` and ``Py_DECREF()`` macros access ``PyObject.ob_refcnt`` directly.
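To make this dependency concrete, here is a simplified sketch of the historical macros (modeled on the CPython headers; debug-build bookkeeping omitted). Any C extension compiled against them hard-codes the ``ob_refcnt`` layout::

    #define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)

    #define Py_DECREF(op)                                   \
        do {                                                \
            if (--((PyObject *)(op))->ob_refcnt == 0) {     \
                _Py_Dealloc((PyObject *)(op));              \
            }                                               \
        } while (0)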
Same CPython design since 1990: structures and reference counting
------------------------------------------------------------------
When the CPython project was created, it was written with one principle: keep the implementation simple enough so it can be maintained by a single developer. CPython complexity grew a lot and many micro-optimizations have been implemented, but CPython core design has not changed.
Members of ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990)::
    #define OB_HEAD \
        unsigned int ob_refcnt; \
        struct _typeobject *ob_type;

    typedef struct _object {
        OB_HEAD
    } object;

    typedef struct {
        OB_VARHEAD
        object *ob_item[1];
    } tupleobject;
Only names changed: ``object`` was renamed to ``PyObject`` and ``tupleobject`` was renamed to ``PyTupleObject``.
CPython still tracks the lifetime of Python objects using reference counting, internally and for third party C extensions (through the Python C API).
All Python objects must be allocated on the heap and cannot be moved.
Why is PyPy more efficient than CPython?
----------------------------------------
The PyPy project is a Python implementation which is 4.2x faster than CPython on average. PyPy developers chose not to fork CPython, but to start from scratch to have more freedom in terms of optimization choices.
PyPy does not use reference counting, but a tracing garbage collector which moves objects. Objects can be allocated on the stack (or even not at all), rather than always having to be allocated on the heap.
Object layouts are designed with performance in mind. For example, a list strategy stores integers directly as integers, rather than as objects.
Moreover, PyPy also has a JIT compiler which emits fast code thanks to the efficient PyPy design.
PyPy bottleneck: the Python C API
---------------------------------
While PyPy is way more efficient than CPython at running pure Python code, it is as efficient or slower than CPython at running C extensions.
Since the C API requires ``PyObject*`` and allows direct access to structure members, PyPy has to associate a CPython object with each PyPy object passed to the C API and keep both consistent. Converting a PyPy object to a CPython object is inefficient. Moreover, reference counting also has to be implemented on top of the PyPy tracing garbage collector.
These conversions are required because the Python C API is too close to the CPython implementation: there is no high-level abstraction. For example, structure members are part of the public C API and nothing prevents a C extension from getting or setting ``PyTupleObject.ob_item[0]`` (the first item of a tuple) directly.
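As an illustration, both lines below compile today, but only the second one can be emulated without exposing a concrete ``PyTupleObject`` (a sketch; ``tuple`` is assumed to be a ``PyObject*`` pointing to a tuple)::

    PyObject *item;

    /* Depends on the concrete PyTupleObject layout: */
    item = ((PyTupleObject *)tuple)->ob_item[0];

    /* Goes through the C API: */
    item = PyTuple_GET_ITEM(tuple, 0);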
See `Inside cpyext: Why emulating CPython C API is so Hard <https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html>`_ (Sept 2018) by Antonio Cuni for more details.
Rationale
=========

Hide implementation details
---------------------------
Hiding implementation details from the C API has multiple advantages:
* It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations: for example, tagged pointers, or replacing the garbage collector with a tracing garbage collector which can move objects.
* Adding new features in CPython becomes easier.
* PyPy should be able to avoid conversions to CPython objects in more cases: keep efficient PyPy objects.
* It becomes easier to implement the C API for a new Python implementation.
* More C extensions will be compatible with Python implementations other than CPython.
Relationship with the limited C API
-----------------------------------
PEP 384 "Defining a Stable ABI" was implemented in Python 3.2. It introduced the "limited C API": a subset of the C API. When the limited C API is used, it becomes possible to build a C extension only once and use it on multiple Python versions: that's the stable ABI.
The main limitation of PEP 384 is that C extensions have to opt in to the limited C API. Only very few projects made this choice, usually to ease the distribution of binaries, especially on Windows.
This PEP moves the C API towards the limited C API.
Ideally, the C API will become the limited C API and all C extensions will use the stable ABI, but this is out of the scope of this PEP.
Specification
=============

Summary
-------
* (**Completed**) Reorganize the C API header files: create ``Include/cpython/`` and ``Include/internal/`` subdirectories.
* (**Completed**) Move private functions exposing implementation details to the internal C API.
* (**Completed**) Convert macros to static inline functions.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-value.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide the ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
Reorganize the C API header files
---------------------------------
The first consumer of the C API was Python itself. There is no clear separation between APIs which must not be used outside Python and APIs which are public on purpose.
Header files must be reorganized into 3 APIs:
* ``Include/`` directory is the limited C API: no implementation details, structures are opaque. C extensions using it get a stable ABI.
* ``Include/cpython/`` directory is the CPython C API: a less "portable" API which depends more on the Python version and exposes some implementation details; a few incompatible changes can happen.
* ``Include/internal/`` directory is the internal C API: implementation details, incompatible changes are likely at each Python release.
The creation of the ``Include/cpython/`` directory is fully backward compatible. ``Include/cpython/`` header files cannot be included directly and are included automatically by ``Include/`` header files when the ``Py_LIMITED_API`` macro is not defined.
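For example, ``Include/object.h`` pulls in its CPython-specific counterpart with a guard along these lines (a sketch based on the Python 3.8 headers)::

    /* In Include/object.h: */
    #ifndef Py_LIMITED_API
    #  define Py_CPYTHON_OBJECT_H
    #  include "cpython/object.h"
    #  undef Py_CPYTHON_OBJECT_H
    #endif

    /* In Include/cpython/object.h: */
    #ifndef Py_CPYTHON_OBJECT_H
    #  error "this header file must not be included directly"
    #endif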
The internal C API is installed and can be used for specific use cases like debuggers and profilers, which must access structure members without executing code. C extensions using the internal C API are tightly coupled to a Python version and must be recompiled for each Python version.
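Internal headers protect themselves in a similar way: they refuse to be included unless the build explicitly opts in (a sketch of the guard used by the ``Include/internal/pycore_*.h`` headers)::

    #ifndef Py_BUILD_CORE
    #  error "this header requires Py_BUILD_CORE define"
    #endif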
**STATUS**: Completed (in Python 3.8)
The reorganization of header files started in Python 3.7 and was completed in Python 3.8:
* `bpo-35134 <https://bugs.python.org/issue35134>`_: Add a new ``Include/cpython/`` subdirectory for the "CPython API" with implementation details.
* `bpo-35081 <https://bugs.python.org/issue35081>`_: Move internal headers to ``Include/internal/``.
Move private functions to the internal C API
--------------------------------------------
Private functions which expose implementation details must be moved to the internal C API.
If a C extension relies on a CPython private function which exposes CPython implementation details, other Python implementations have to re-implement this private function to support this C extension.
**STATUS**: Completed (in Python 3.9)
Private functions moved to the internal C API in Python 3.8:
* ``_PyObject_GC_TRACK()``, ``_PyObject_GC_UNTRACK()``
Macros and functions excluded from the limited C API in Python 3.9:
* ``_PyObject_SIZE()``, ``_PyObject_VAR_SIZE()``
* ``PyThreadState_DeleteCurrent()``
* ``PyFPE_START_PROTECT()``, ``PyFPE_END_PROTECT()``
* ``_Py_NewReference()``, ``_Py_ForgetReference()``
* ``_PyTraceMalloc_NewReference()``
* ``_Py_GetRefTotal()``
Private functions moved to the internal C API in Python 3.9:
* GC functions like ``_Py_AS_GC()``, ``_PyObject_GC_IS_TRACKED()`` and ``_PyGCHead_NEXT()``
* ``_Py_AddToAllObjects()`` (not exported)
* ``_PyDebug_PrintTotalRefs()``, ``_Py_PrintReferences()``, ``_Py_PrintReferenceAddresses()`` (not exported)
Public "clear free list" functions moved to the internal C API and renamed to private functions in Python 3.9:
* ``PyAsyncGen_ClearFreeLists()``
* ``PyContext_ClearFreeList()``
* ``PyDict_ClearFreeList()``
* ``PyFloat_ClearFreeList()``
* ``PyFrame_ClearFreeList()``
* ``PyList_ClearFreeList()``
* ``PyTuple_ClearFreeList()``

Functions simply removed:

* ``PyMethod_ClearFreeList()`` and ``PyCFunction_ClearFreeList()``: bound method free list removed in Python 3.9.
* ``PySet_ClearFreeList()``: set free list removed in Python 3.4.
* ``PyUnicode_ClearFreeList()``: Unicode free list removed in Python 3.3.
Convert macros to static inline functions
-----------------------------------------
Converting macros to static inline functions has multiple advantages:
* Functions have well defined parameter types and a well defined return type.
* Functions can use variables with a well defined scope (the function).
* Debuggers can put breakpoints on functions and profilers can display the function name in call stacks. In most cases, this works even when a static inline function is inlined.
* Functions don't have `macro pitfalls <https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_ (a classic example follows below).
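A classic pitfall, for illustration (``SQUARE`` is a made-up macro, not part of the C API)::

    /* The macro argument is evaluated twice: */
    #define SQUARE(x) ((x) * (x))

    /* A static inline function evaluates it exactly once: */
    static inline int square(int x) { return x * x; }

    void demo(void)
    {
        int i = 3;
        int bad = SQUARE(i++);   /* expands to ((i++) * (i++)):
                                    undefined behavior */
        int good = square(i++);  /* well defined */
    }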
Converting macros to static inline functions should only impact very few C extensions which use macros in unusual ways.
For backward compatibility, functions must continue to accept any type, not only ``PyObject*``, to avoid compiler warnings, since most macros cast their parameters to ``PyObject*``.
Python 3.6 requires C compilers to support static inline functions: PEP 7 requires a subset of C99.
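For example, Python 3.8 keeps ``Py_INCREF()`` usable with any pointer type by wrapping the static inline function in a macro which does the cast (a sketch close to the actual Python 3.8 header; debug bookkeeping omitted)::

    #define _PyObject_CAST(op) ((PyObject*)(op))

    static inline void _Py_INCREF(PyObject *op)
    {
        op->ob_refcnt++;
    }

    #define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))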
**STATUS**: Completed (in Python 3.9)
Macros converted to static inline functions in Python 3.8:
* ``Py_INCREF()``, ``Py_DECREF()``
* ``Py_XINCREF()``, ``Py_XDECREF()``
* ``PyObject_INIT()``, ``PyObject_INIT_VAR()``
* ``_PyObject_GC_TRACK()``, ``_PyObject_GC_UNTRACK()``, ``_Py_Dealloc()``
Macros converted to regular functions in Python 3.9:
* ``Py_EnterRecursiveCall()``, ``Py_LeaveRecursiveCall()`` (added to the limited C API)
* ``PyObject_INIT()``, ``PyObject_INIT_VAR()``
* ``PyObject_GET_WEAKREFS_LISTPTR()``
* ``PyObject_CheckBuffer()``
* ``PyIndex_Check()``
* ``PyObject_IS_GC()``
* ``PyObject_NEW()`` (alias to ``PyObject_New()``), ``PyObject_NEW_VAR()`` (alias to ``PyObject_NewVar()``)
* ``PyType_HasFeature()`` (always call ``PyType_GetFlags()``)
* ``Py_TRASHCAN_BEGIN_CONDITION()`` and ``Py_TRASHCAN_END()`` macros now call functions which hide implementation details, rather than accessing directly members of the ``PyThreadState`` structure.
Make structures opaque
----------------------
All structures of the C API should become opaque: C extensions must use getter or setter functions to get or set structure members. For example, ``tuple->ob_item[0]`` must be replaced with ``PyTuple_GET_ITEM(tuple, 0)``.
To be able to move away from reference counting, ``PyObject`` must become opaque. Currently, the reference counter ``PyObject.ob_refcnt`` is exposed in the C API. All structures must become opaque, since they "inherit" from PyObject. For example, ``PyFloatObject`` inherits from ``PyObject``::
    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;
Making ``PyObject`` fully opaque requires converting ``Py_INCREF()`` and ``Py_DECREF()`` macros to function calls. This change has an impact on performance. It is likely to be one of the very last changes when making structures opaque.
Making the ``PyTypeObject`` structure opaque breaks C extensions declaring types statically (e.g. ``static PyTypeObject MyType = {...};``). C extensions must use ``PyType_FromSpec()`` to allocate types on the heap instead. Using heap types has other advantages, like being compatible with subinterpreters. Combined with PEP 489 "Multi-phase extension module initialization", it makes a C extension behave more like a Python module, for example by allowing the creation of more than one module instance.
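For illustration, a minimal heap type created with ``PyType_FromSpec()`` (a sketch with made-up names; error handling reduced to the NULL check)::

    typedef struct {
        PyObject_HEAD
        int value;
    } MyObject;

    static PyType_Slot MyType_slots[] = {
        {Py_tp_doc, (void *)"Example heap type"},
        {0, NULL},
    };

    static PyType_Spec MyType_spec = {
        "mymodule.MyType",      /* name */
        sizeof(MyObject),       /* basicsize */
        0,                      /* itemsize */
        Py_TPFLAGS_DEFAULT,     /* flags */
        MyType_slots,           /* slots */
    };

    /* In the module initialization: */
    PyObject *type = PyType_FromSpec(&MyType_spec);
    if (type == NULL) {
        return NULL;
    }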
Making ``PyThreadState`` structure opaque requires adding getter and setter functions for members used by C extensions.
**STATUS**: In Progress (started in Python 3.8)
The ``PyInterpreterState`` structure was made opaque in Python 3.8 (`bpo-35886 <https://bugs.python.org/issue35886>`_) and the ``PyGC_Head`` structure (`bpo-40241 <https://bugs.python.org/issue40241>`_) was made opaque in Python 3.9.
Issues tracking the work to prepare the C API to make the following structures opaque:
* ``PyObject``: `bpo-39573 <https://bugs.python.org/issue39573>`_
* ``PyTypeObject``: `bpo-40170 <https://bugs.python.org/issue40170>`_
* ``PyFrameObject``: `bpo-40421 <https://bugs.python.org/issue40421>`_

  * Python 3.9 adds ``PyFrame_GetCode()`` and ``PyFrame_GetBack()`` getter functions, and moves ``PyFrame_GetLineNumber()`` to the limited C API.

* ``PyThreadState``: `bpo-39947 <https://bugs.python.org/issue39947>`_

  * Python 3.9 adds 3 getter functions: ``PyThreadState_GetFrame()``, ``PyThreadState_GetID()``, ``PyThreadState_GetInterpreter()``.
Disallow using Py_TYPE() as l-value
-----------------------------------
The ``Py_TYPE()`` function gets an object type, its ``PyObject.ob_type`` member. It is implemented as a macro which can be used as an l-value to set the type: ``Py_TYPE(obj) = new_type``. This code relies on the assumption that ``PyObject.ob_type`` can be modified directly. It prevents making the ``PyObject`` structure opaque.
New setter functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()`` are added and must be used instead.
The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros must be converted to static inline functions which cannot be used as an l-value.
For example, the ``Py_TYPE()`` macro::
    #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
becomes::
    #define _PyObject_CAST_CONST(op) ((const PyObject*)(op))

    static inline PyTypeObject* _Py_TYPE(const PyObject *ob) {
        return ob->ob_type;
    }

    #define Py_TYPE(ob) _Py_TYPE(_PyObject_CAST_CONST(ob))
**STATUS**: Completed (in Python 3.10)
New functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()`` were added to Python 3.9.
In Python 3.10, ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` can no longer be used as an l-value, and the new setter functions must be used instead.
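Porting is mechanical; a sketch (``obj``, ``refcnt`` and ``size`` are placeholders; ``Py_SET_SIZE()`` operates on variable-sized objects)::

    /* Old code, rejected since Python 3.10:
       Py_REFCNT(obj) = refcnt;
       Py_SIZE(obj) = size;
    */

    Py_SET_REFCNT(obj, refcnt);
    Py_SET_SIZE(obj, size);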
New C API functions must not return borrowed references
--------------------------------------------------------
When a function returns a borrowed reference, Python cannot track when the caller stops using this reference.
For example, if the Python ``list`` type is specialized for small integers, storing "raw" numbers directly rather than Python objects, ``PyList_GetItem()`` has to create a temporary Python object. The problem is to decide when it is safe to delete the temporary object.
The general guideline is to avoid returning borrowed references for new C API functions.
No function returning borrowed references is scheduled for removal by this PEP.
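For illustration, the two reference styles side by side (a sketch; error handling abbreviated, ``list`` is a placeholder)::

    /* Borrowed reference: must not be Py_DECREF'ed, only valid
       while the list holds the item. */
    PyObject *item = PyList_GetItem(list, 0);

    /* Strong reference: the caller owns it and must release it. */
    PyObject *item2 = PySequence_GetItem(list, 0);
    if (item2 != NULL) {
        /* ... use item2 ... */
        Py_DECREF(item2);
    }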
**STATUS**: Completed (in Python 3.9)
In Python 3.9, new C API functions returning Python objects only return strong references:
* ``PyFrame_GetBack()``
* ``PyFrame_GetCode()``
* ``PyObject_CallNoArgs()``
* ``PyObject_CallOneArg()``
* ``PyThreadState_GetFrame()``
Avoid functions returning PyObject**
------------------------------------
The ``PySequence_Fast_ITEMS()`` function gives direct access to an array of ``PyObject*`` objects. The function is deprecated in favor of ``PyTuple_GetItem()`` and ``PyList_GetItem()``.
``PyTuple_GET_ITEM()`` can be abused to access directly the ``PyTupleObject.ob_item`` member::
    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);
The ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros are converted to static inline functions to disallow that.
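A loop over the raw items array can be rewritten with one C API call per item; a sketch (``do_something()`` is a made-up helper, ``list`` is a ``PyObject*`` known to be a list)::

    /* Discouraged: keeps a raw pointer into the object internals. */
    PyObject **items = PySequence_Fast_ITEMS(list);
    Py_ssize_t n = PySequence_Fast_GET_SIZE(list);
    for (Py_ssize_t i = 0; i < n; i++) {
        do_something(items[i]);                 /* borrowed */
    }

    /* Preferred: no pointer into the object is kept. */
    for (Py_ssize_t i = 0; i < n; i++) {
        do_something(PyList_GetItem(list, i));  /* borrowed */
    }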
**STATUS**: Not Started
New pythoncapi_compat.h header file
-----------------------------------
Making structures opaque requires modifying C extensions to use getter and setter functions. The practical issue is how to keep support for old Python versions which don't have these functions.
For example, in Python 3.10, it is no longer possible to use ``Py_TYPE()`` as an l-value. The new ``Py_SET_TYPE()`` function must be used instead::
    #if PY_VERSION_HEX >= 0x030900A4
        Py_SET_TYPE(&MyType, &PyType_Type);
    #else
        Py_TYPE(&MyType) = &PyType_Type;
    #endif
This code may ring a bell to developers who ported their Python code base from Python 2 to Python 3.
Python will distribute a new ``pythoncapi_compat.h`` header file which provides new C API functions to old Python versions. Example::
    #if PY_VERSION_HEX < 0x030900A4
    static inline void
    _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4
Using this header file, ``Py_SET_TYPE()`` can be used on old Python versions as well.
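With the header in place, the version test from the previous example disappears from extension code; a sketch::

    #include "pythoncapi_compat.h"

    /* Works natively on Python 3.9 and newer; on older versions the
       compatibility header provides the function. */
    Py_SET_TYPE(&MyType, &PyType_Type);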
Developers can copy this file into their project, or even copy/paste only the few functions needed by their C extension.
**STATUS**: In Progress (implemented but not distributed by CPython yet)
The ``pythoncapi_compat.h`` header file is currently developed at: https://github.com/pythoncapi/pythoncapi_compat
Process to reduce the number of broken C extensions
===================================================
Process to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP:
* Estimate how many popular C extensions are affected by the incompatible change.
* Coordinate with maintainers of broken C extensions to prepare their code for the future incompatible change.
* Introduce the incompatible changes in Python. The documentation must explain how to port existing code. It is recommended to merge such changes at the beginning of a development cycle to have more time for tests.
* Changes which are the most likely to break a large number of C extensions should be announced on the capi-sig mailing list to notify C extension maintainers so they can prepare their projects for the next Python.
* If the change breaks too many projects, reverting the change should be discussed, taking into account the number of broken packages, their importance in the Python community, and the importance of the change.
The coordination usually means reporting issues to the projects, or even proposing changes. It does not require waiting for a new release including fixes for every broken project.
Since more and more C extensions are written using Cython rather than directly using the C API, it is important to ensure that Cython is prepared in advance for incompatible changes. This gives more time for C extension maintainers to release a new version with code generated by the updated Cython (for C extensions distributing the code generated by Cython).
Future incompatible changes can be announced by deprecating a function in the documentation and by annotating the function with ``Py_DEPRECATED()``. But making a structure opaque and preventing the usage of a macro as l-value cannot be deprecated with ``Py_DEPRECATED()``.
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
If a change is reverted, we move back to the coordination step to better prepare the change. Once more C extensions are ready, the incompatible change can be reconsidered.
Version History
===============
* Version 3, June 2020: PEP rewritten from scratch. Python now distributes a new ``pythoncapi_compat.h`` header and a process is defined to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP.
* Version 2, April 2020: `PEP: Modify the C API to hide implementation details <https://mail.python.org/archives/list/python-dev@python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/#TKHNENOXP6H34E73XGFOL2KKXSM4Z6T2>`_.
* Version 1, July 2017: `PEP: Hide implementation details in the C API <https://mail.python.org/archives/list/python-ideas@python.org/thread/6XATDGWK4VBUQPRHCRLKQECTJIPBVNJQ/#HFBGCWVLSM47JEP6SO67MRFT7Y3EOC44>`_ sent to python-ideas.
Copyright
=========
This document has been placed in the public domain.
--
Night gathers, and now my watch begins. It shall not end until my death.
--
Software Development Engineer at Kakao corp.
Tel: +82 010-3353-9127
Email: donghee.na92@gmail.com | denny.i@kakaocorp.com
Linkedin: https://www.linkedin.com/in/dong-hee-na-2b713b49/
Hi Victor,

Thanks for putting work into this. I support the idea of slowly evolving the C API. It must be done carefully so as to not unnecessarily break 3rd party extensions. Changes must be made for well founded reasons and not just because we think it makes a "cleaner" API. I believe you are following those principles.

One aspect of the API that could be improved is memory management for PyObjects. The current API is quite a mess and for no good reason except legacy, IMHO. The original API design allowed extension types to use their own memory allocator, e.g. they could call their own malloc()/free() implementation and the rest of the CPython runtime would handle that. One consequence is that Py_DECREF() cannot call PyObject_Free() but instead has to call tp_dealloc(). There were supposed to be multiple layers of allocators, PyMem vs PyObject, but since the layering was not enforced, we ended up with a bunch of aliases to the same underlying function.

Perhaps there are a few cases when the flexibility to use a custom object allocator is useful. I think in practice it is very rare that an extension needs to manage memory itself. To achieve something similar, a PyObject could hold a reference to some externally managed resource, and the tp_del method would take care of freeing it. IMHO, the Python runtime should be in charge of allocating and freeing PyObject memory.

I believe fixing this issue is not tricky, just tedious. The biggest hurdle might be dealing with statically allocated objects. IMHO, they should go away and there should only be heap allocated PyObjects (created and freed by calling CPython API functions). That change would affect most extensions, unfortunately.

Another place for improvement is that the C API is unnecessarily large. E.g. we don't really need PyList_GetItem(), PyTuple_GetItem(), and PyObject_GetItem(). Every extra API is a potential leak of implementation details and a burden for alternative VMs. Maybe we should introduce something like WIN32_LEAN_AND_MEAN that hides all the extra stuff. The Py_LIMITED_API define doesn't really mean the same thing since it tries to give ABI compatibility. It would make sense to cooperate with the HPy project on deciding which parts are unnecessary. Things like Cython might still want to use the larger API, to extract every bit of performance; the vast majority of C extensions don't require that.

One final comment: I think even if we manage to clean up the API and make it friendly for other Python implementations, there is going to be a fair amount of overhead. If you look at other "managed runtimes", that just seems unavoidable (e.g. Java, CLR, V8, etc.). You want to design the API so that you maximize the amount of useful work done with each API call. Using something like PyList_GET_ITEM() to iterate over a list is not a good pattern. So keep in mind that an extension API is going to have some overhead.

Regards,

Neil
Hi Neil,

On Tue, Jun 23, 2020 at 03:47, Neil Schemenauer <nas-python@arctrix.com> wrote:
Thanks for putting work into this.
You're welcome, I took some ideas from your tagged pointer proof of concept ;-) I recall that we ran into the same C API issues in our experiments ;-)
Changes must be made for well founded reasons and not just because we think it makes a "cleaner" API. I believe you are following those principles.
I mostly used the tagged pointer as a concrete goal to decide which changes are required or not. PyPy and HPy developers also gave me API that they would like to see disappearing :-)
One aspect of the API that could be improved is memory management for PyObjects. The current API is quite a mess and for no good reason except legacy, IMHO. The original API design allowed extension types to use their own memory allocator. E.g. they could call their own malloc()/free() implemention and the rest of the CPython runtime would handle that. One consequence is that Py_DECREF() cannot call PyObject_Free() but instead has to call tp_dealloc(). There was supposed to be multiple layers of allocators, PyMem vs PyObject, but since the layering was not enforced, we ended up with a bunch of aliases to the same underlying function.
I vaguely recall someone explaining that the Python memory allocator created high memory fragmentation, and that using a dedicated memory allocator was way more efficient. But I concur that the majority of people never override the default tp_new and tp_free functions.

By the way, in Python 3.8, heap types started to increase their reference counter when an instance is created, but decrementing the type reference counter is the responsibility of the tp_dealloc function, and we failed to find a way to automate it. More info on this issue:

* https://bugs.python.org/issue35810
* https://bugs.python.org/issue40217
* https://docs.python.org/dev/whatsnew/3.9.html#changes-in-the-c-api

C extension maintainers now have to update their tp_dealloc method, or their application will never be able to destroy their heap types.
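Concretely, the fix looks like this in a heap type's tp_dealloc (a minimal sketch following the pattern documented in the 3.9 "What's New"; MyObject/MyObject_dealloc are made-up names):

    static void
    MyObject_dealloc(MyObject *self)
    {
        PyTypeObject *tp = Py_TYPE(self);
        /* free members, clear weak references, etc. */
        tp->tp_free(self);
        /* Since Python 3.8, instances of heap types hold a strong
           reference to their type: release it after freeing self. */
        Py_DECREF(tp);
    }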
Perhaps there are a few cases when the flexibility to use a custom object allocator is useful. I think in practice it is very rare than an extension needs to manage memory itself. To achieve something similar, allow a PyObject to have a reference to some externally managed resource and then the tp_del method would take care of freeing it. IMHO, the Python runtime should be in charge of allocating and freeing PyObject memory.
Do you think that it should be in PEP 620, or can it be done independently? I don't know how to implement it, I have no idea how many C extensions would be broken, etc. I don't see an obvious relationship between hiding tp_del/tp_free and interoperability with other Python implementations or the stable ABI. While making object allocation and deallocation simpler would be nice, it doesn't seem "required" in PEP 620 for now. What do you think?
Another place for improvement is that the C API is unnecessarily large. E.g. we don't really need PyList_GetItem(), PyTuple_GetItem(), and PyObject_GetItem(). Every extra API is a potential leak of implementation details and a burden for alternative VMs. Maybe we should introduce something like WIN32_LEAN_AND_MEAN that hides all the extra stuff. The Py_LIMITED_API define doesn't really mean the same thing since it tries to give ABI compatibility. It would make sense to cooperate with the HPy project on deciding what parts are unnecessary. Things like Cython might still want to use the larger API, to extract every bit of performance. The vast majority of C extensions don't require that.
At the beginning, I had a plan to remove all functions and only keep "abstract" functions like PyObject_GetItem(). Then someone asked what the performance overhead of only using abstract functions is. I couldn't reply. Also, I didn't see a need to only use abstract functions for now, so I abandoned this idea.

PyTuple_GetItem() returns a borrowed reference, which is bad, whereas PyObject_GetItem() returns a strong reference. Since PyPy cpyext already solved this problem, I chose to leave the borrowed references problem aside for now. Trying to fix all issues at once doesn't work :-)

One issue of calling PyTuple_GetItem() or PyDict_GetItem() is that it doesn't take into account the ability to override __getitem__() in a subclass. Few developers write correct code like:

    if (PyDict_CheckExact(ns))
        err = PyDict_SetItem(ns, name, v);
    else
        err = PyObject_SetItem(ns, name, v);

The PEP 620 is already quite long and introduces many incompatible changes. I tried to make the PEP as short as possible and to minimize the number of incompatible C API changes.

Using Py_LIMITED_API provides a stable ABI, but it doesn't reduce the Python maintenance burden, and other Python implementations must continue to implement the full C API since C extensions actually use it. Unless we rework Py_LIMITED_API (or add another new macro) to reduce the C API size, there is no benefit for CPython nor for other Python implementations. Also, only very few extensions use Py_LIMITED_API, even though it has existed since Python 3.2 (released in 2011).

As I wrote in the introduction, the PEP 620 is my third attempt. Previous attempts tried to keep backward compatibility and were based on an "opt-in" option (I want to use the new limited C API because the carrot looks delicious!). But IMO there is a high risk that developers don't opt in (the carrot isn't as good as I expected :-( ) if there is little benefit in the short term, and it doesn't reduce the maintenance burden. Also, having two C APIs may explode the test matrix, and some people didn't like that.

Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Hi Victor, thanks for your continued work on improving the C-API. I'll comment on the PEP inline. Victor Stinner wrote on 22.06.20 at 14:10:
PEP available at: https://www.python.org/dev/peps/pep-0620/

[...]

Motivation
==========

The C API blocks CPython evolutions
-----------------------------------
Adding or removing members of C structures is causing multiple backward compatibility issues.
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure. For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type flag was required to announce if the type structure contains the ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388 <https://bugs.python.org/issue32388>`_).
Probably not the best example. I think this is pretty much normal API evolution. Changing the deallocation protocol for objects is going to impact any public API in one way or another. PyTypeObject is also not exposed with its struct fields in the limited API, so your point regarding "tp_print" is also not a strong one.
Same CPython design since 1990: structures and reference counting
-----------------------------------------------------------------

Members of ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990)
While I see an advantage in hiding the details of PyObject (specifically memory management internals), I would argue that there simply isn't much to improve in PyTupleObject, so these two don't fly at the same level for me.
Why is PyPy more efficient than CPython?
----------------------------------------
The PyPy project is a Python implementation which is 4.2x faster than CPython on average. PyPy developers chose to not fork CPython, but start from scratch to have more freedom in terms of optimization choices.
PyPy does not use reference counting, but a tracing garbage collector which moves objects. Objects can be allocated on the stack (or even not at all), rather than always having to be allocated on the heap.
Object layouts are designed with performance in mind. For example, a list strategy stores integers directly as integers, rather than as objects.
Moreover, PyPy also has a JIT compiler which emits fast code thanks to the efficient PyPy design.
I would be careful with presenting examples of PyPy optimisations here. Whichever you choose could easily give the impression that they are the most important changes that made PyPy faster and should therefore be followed in CPython. I doubt that there are any "top changes" that made the biggest difference for PyPy. Even large breakthroughs on their side stand on the shoulders of other important changes that may not have been visible by themselves in the performance graphs. CPython will not be rewritten from scratch, will continue to have its own infrastructure, and will therefore have its own specific tweaks that it will benefit from. Trying things out is fine, but there is no guarantee that following a specific change in PyPy will make a similar difference in CPython and its own ecosystem.
PyPy bottleneck: the Python C API
---------------------------------

While PyPy is way more efficient than CPython at running pure Python code, it is as efficient as, or slower than, CPython at running C extensions.

[...]

Hide implementation details
---------------------------
Hiding implementation details from the C API has multiple advantages:
* It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations: for example, tagged pointers, or replacing the garbage collector with a tracing garbage collector which can move objects.
* Adding new features in CPython becomes easier.
* PyPy should be able to avoid conversions to CPython objects in more cases: keep efficient PyPy objects.
* It becomes easier to implement the C API for a new Python implementation.
* More C extensions will be compatible with Python implementations other than CPython.
I understand the goal of experimenting with new optimisations and larger changes internally. If, however, the goal is to make it easier for other implementations to support (existing?) C extensions, then breaking all existing C extensions in CPython first does not strike me as a good way to get there. :)

My feeling is that PyPy specifically is better served with the HPy API, which is different enough to consider it a mostly separate API, or an evolution of the limited API, if you want. Suggesting that extension authors support two different APIs is a lot to ask, but forcing them to support the existing CPython C-API (for legacy reasons) and the changed CPython C-API (for future compatibility), and then asking them to support a separate C-API in addition (for platform independence, with performance penalties) seems like stretching it a lot.

If we want to make life easier for PyPy, I think we should support their HPy effort. Creating additional churn on the CPython side for extension authors will bind a lot of effort on that side which will then not be available for them to try out and improve HPy while it's still early enough for major design choices.

In the end, I think we shouldn't try to mix the two goals of "make it easier for other Python implementations" and "make it easier to optimise CPython", at least not from the start. The best overall solution is not necessarily the best for both goals independently, nor for all three sides (CPython, PyPy, extension authors).
Specification
=============

Summary
-------

* (**Completed**) Reorganize the C API header files: create ``Include/cpython/`` and ``Include/internal/`` subdirectories.
* (**Completed**) Move private functions exposing implementation details to the internal C API.
* (**Completed**) Convert macros to static inline functions.
Perfectly reasonable steps, IMHO.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-value.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
Most of these have the potential to break code, sometimes needlessly, AFAICT. Especially the efforts to lock away the internal data structures annoy me. It's obviously ok if we don't require other implementations to provide this access, but CPython has these data structures and I think it should continue to expose them.
The internal C API is installed and can be used for specific usages like debuggers and profilers, which must access structure members without executing code. C extensions using the internal C API are tightly coupled to a Python version and must be recompiled for each Python version.

[...]

Private functions which expose implementation details must be moved to the internal C API. If a C extension relies on a CPython private function which exposes CPython implementation details, other Python implementations have to re-implement this private function to support this C extension.
If we remove CPython specific features from the (de-facto) "official public Python C-API", then I think there should be a "public CPython 3.X C-API" that actively exposes the data structures natively, not just an "internal" one. That way, extension authors can take the usual decision between performance, maintenance effort and platform independence. I think it's perfectly ok to tell authors "if you use these, you may have to adapt your code for the next CPython release, which comes in a year's time". It's not so great to give them an unqualified "don't touch these!", because that will not help their decision process.
Make structures opaque
----------------------
All structures of the C API should become opaque: C extensions must use getter or setter functions to get or set structure members. For example, ``tuple->ob_item[0]`` must be replaced with ``PyTuple_GET_ITEM(tuple, 0)``.
To be able to move away from reference counting, ``PyObject`` must become opaque.
Careful with the wording. They don't have to be completely opaque. They can still be exposed in the "public CPython 3.X C-API" for those who want to use them, just not in the "public Python C-API". Changes to the ref-counting header obviously have a large impact on existing code, but ABI breakage here should be fine as long as we keep up API compatibility. Calling PyTuple_GET_ITEM() is perfectly ok, even if it's a straight macro (or inline function) that accesses the object struct, as long as that macro still works and does something reasonable in the next CPython release. That's exactly why extension code uses a macro and not a literal "tuple->ob_item[0]". For users who want ABI compatibility (and/or platform independence), we have the stable ABI and/or the limited API.
Currently, the reference counter ``PyObject.ob_refcnt`` is exposed in the C API. All structures must become opaque, since they "inherit" from PyObject. For example, ``PyFloatObject`` inherits from ``PyObject``::
    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;
Please keep PyFloat_AS_DOUBLE() and friends do what they currently do.
Making ``PyObject`` fully opaque requires converting ``Py_INCREF()`` and ``Py_DECREF()`` macros to function calls. This change has an impact on performance. It is likely to be one of the very last changes when making structures opaque.
I like the HPy approach here of essentially replacing

    Py_INCREF(a);
    b = a;

with

    b = Py_NewRef(a);

That gives a lot more flexibility in the underlying implementation than "Py_INCREF()", while trivially translating to

    b = (Py_INCREF(a), a);

internally for existing CPython releases.

I do not see why PyObject _must_ become opaque in CPython, as long as the access goes through macros. I'd reverse the order here: first add a "Py_NewRef()" macro, then see about replacing the internal GC implementation, and if that proves useful, think about how to re-implement the ref-counting macros based on it and change them under the hood, or break compatibility at that point if necessary. I don't see how making all PyObject structs opaque for user code helps here. The macros can continue to access any internals they like, they just don't have to be the same internals across CPython releases.
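A minimal sketch of how such a helper could be implemented on top of the existing API (the name follows Stefan's suggestion; no function of this name existed in CPython at the time):

    /* Sketch: create and return a new strong reference. */
    static inline PyObject *
    sketch_Py_NewRef(PyObject *obj)
    {
        Py_INCREF(obj);
        return obj;
    }

    /* Usage: b = sketch_Py_NewRef(a); replaces Py_INCREF(a); b = a; */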
Making ``PyTypeObject`` structure opaque breaks C extensions declaring types statically (e.g. ``static PyTypeObject MyType = {...};``).
Not necessarily. There was an unimplemented feature proposed in PEP-3121, the PyType_Copy() function. https://www.python.org/dev/peps/pep-3121/#specification PyTypeObject does not have to be opaque. But it also doesn't have to be the same thing for defining and for using types. You could still define a type with a PyTypeObject struct and then copy it over into a heap type or other internal type structure from there. Whether that's better than using PyType_FromSpec(), maybe not, but at least it doesn't mean we have to break existing code that uses static extension type definitions.
Disallow using Py_TYPE() as l-value
-----------------------------------

The ``Py_TYPE()`` function gets an object's type, its ``PyObject.ob_type`` member. It is implemented as a macro which can be used as an l-value to set the type: ``Py_TYPE(obj) = new_type``. This code relies on the assumption that ``PyObject.ob_type`` can be modified directly. It prevents making the ``PyObject`` structure opaque.
New setter functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()`` are added and must be used instead.
Totally reasonable. In general, we should always allow C-API users to avoid hacks and write explicit code. That makes it easier for CPython to change its API implementation without changing the API. The above usages are very rare anyway and can just be part of the CPython-specific API. Adapting the usages to other Python implementations is probably trivial as long as they provide similar features in some way.

Speaking of which, "Py_SET_REFCNT()" is probably less explicit than "Py_INC_REFCNT()" and "Py_DEC_REFCNT()" macros would be, but the latter two seem more likely to a) fit the usual (or only?) use cases and b) be easily supportable by other Python implementations. I haven't come across a use case yet where I had to change a ref-count by more than 1, but allowing users to arbitrarily do that may require way more infrastructure under the hood than allowing them to create or remove a single reference to an object. I think explicit is really better than implicit here.

The same does not seem to apply to "Py_SET_TYPE()" and "Py_SET_SIZE()", since any object or (applicable) container implementation would normally have to know its type and size, regardless of any implementation details.
Avoid functions returning PyObject**
------------------------------------
The ``PySequence_Fast_ITEMS()`` function gives direct access to an array of ``PyObject*`` objects. The function is deprecated in favor of ``PyTuple_GetItem()`` and ``PyList_GetItem()``.
``PyTuple_GET_ITEM()`` can be abused to directly access the ``PyTupleObject.ob_item`` member::

    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);
The ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros are converted to static inline functions to disallow that.
The same as I said above applies: CPython has these data structures. It should officially expose them, even if it does not guarantee them across minor releases. I would also ask that the usage

    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);

should be replaced by an official macro, e.g. "PyTuple_ITEMS()" (because the private version "_PyTuple_ITEMS()" of that already exists). This is yet another of those hacks where explicit would have been better than implicit.

BTW, I think it is much less likely that the internal implementation of tuples changes than that of lists (which can benefit from native data type arrays à la "array.array()"). We should not take decisions about PyTuple based on arguments for PyList.
New pythoncapi_compat.h header file
-----------------------------------
Making structures opaque requires modifying C extensions to use getter and setter functions. The practical issue is how to keep support for old Python versions which don't have these functions.
For example, in Python 3.10, it is no longer possible to use ``Py_TYPE()`` as an l-value. The new ``Py_SET_TYPE()`` function must be used instead::
    #if PY_VERSION_HEX >= 0x030900A4
        Py_SET_TYPE(&MyType, &PyType_Type);
    #else
        Py_TYPE(&MyType) = &PyType_Type;
    #endif
This code may ring a bell to developers who ported their Python code base from Python 2 to Python 3.
Python will distribute a new ``pythoncapi_compat.h`` header file which provides new C API functions to old Python versions. Example::
    #if PY_VERSION_HEX < 0x030900A4
    static inline void _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4
Using this header file, ``Py_SET_TYPE()`` can be used on old Python versions as well.
Developers can copy this file into their project, or even copy/paste only the few functions needed by their C extension.
Yes, I think this is a good way to handle this. It keeps the final control over the implementation in CPython and gives a lot of freedom to extension developers.
Process to reduce the number of broken C extensions
===================================================
Process to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP:
* Estimate how many popular C extensions are affected by the incompatible change.
* Coordinate with maintainers of broken C extensions to prepare their code for the future incompatible change.
* Introduce the incompatible changes in Python. The documentation must explain how to port existing code. It is recommended to merge such changes at the beginning of a development cycle to have more time for tests.
* Changes which are the most likely to break a large number of C extensions should be announced on the capi-sig mailing list to notify C extension maintainers to prepare their project for the next Python.
* If the change breaks too many projects, reverting the change should be discussed, taking into account the number of broken packages, their importance in the Python community, and the importance of the change.
The coordination usually means reporting issues to the projects, or even proposing changes. It does not require waiting for a new release including fixes for every broken project.
Quite some effort, but yes, +1. This is a very fair way to communicate between both sides.
Since more and more C extensions are written using Cython, rather than directly using the C API, it is important to ensure that Cython is prepared in advance for incompatible changes. That gives more time for C extension maintainers to release a new version with code generated by the updated Cython (for C extensions distributing the code generated by Cython).
Thank you! :) Cython isn't the only such tool, though. PyBind11 and a few others are probably also worth keeping in the loop. I think this can also be handled through the capi-sig mailing list. Any such project should naturally be interested in changes to the C-API and discussions about its evolution.
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
This sounds like a common CI testing infrastructure would help all sides. Currently, we have something like that mostly working by having different projects integrate with each other's master branch, e.g. Pandas, NumPy, Cython, and notifying each other of detected breakages. It's mostly every project setting up its own CI on travis&Co here, so a bit of duplicated work on all sides. Not sure if that's inherently bad, but there's definitely some room for generalisation and improvements. Again, thanks Victor for pushing these efforts. Even if me and others are giving you a hard time getting your proposals accepted, I appreciate the work that you put into improving the ecosystem(s). Stefan
On Tue, Jun 23, 2020 at 11:33 AM Victor Stinner <vstinner@python.org> wrote:
Hi Neil,
On Tue, 23 Jun 2020 at 03:47, Neil Schemenauer wrote:
One aspect of the API that could be improved is memory management for PyObjects. The current API is quite a mess and for no good reason except legacy, IMHO. The original API design allowed extension types to use their own memory allocator. E.g. they could call their own malloc()/free() implementation and the rest of the CPython runtime would handle that. One consequence is that Py_DECREF() cannot call PyObject_Free() but instead has to call tp_dealloc(). There were supposed to be multiple layers of allocators, PyMem vs PyObject, but since the layering was not enforced, we ended up with a bunch of aliases to the same underlying function.
I vaguely recall someone explaining that Python's memory allocator created high memory fragmentation, and that using a dedicated memory allocator was way more efficient. But I concur that the majority of people never override the default tp_new and tp_free functions.
Not so much Python's memory allocator (it does better than most), but just plain malloc. However, the answer in these cases isn't to replace the allocator for a few extension types, since that wouldn't affect any of Python's own allocations. The better answer is to replace malloc altogether. At Google we use tcmalloc for everything, by linking it into the binaries we build. However, the effect on Python's allocations isn't very big (but it's still measurable) because obmalloc does a pretty good job; we do it more for the C/C++ libraries we end up wrapping, where it can matter *a lot*. I don't think we ever set tp_new/tp_free to anything other than the defaults, and we could certainly live with it going away. We also experimented with disabling obmalloc when using tcmalloc, but obmalloc still does measurably better than tcmalloc.

There's another reason not to have different allocators, at least not ones that don't trickle down to 'malloc': AddressSanitizer and ThreadSanitizer rely on intercepting all allocations, and they are *very* useful tools for any C/C++ codebase. They don't (at the moment) particularly benefit Python code, but they certainly do benefit CPython extensions and the C/C++ libraries they wrap.

I think the ability to have per-type allocation/deallocation routines isn't really about efficiency, but more about giving more control to embedding systems (or libraries wrapped by extension modules) over how *their* objects are allocated. It doesn't make much sense, however, because Python wouldn't allocate their objects anyway, just the Python objects wrapping theirs. Allocating CPython objects should be CPython's job.

FWIW, I suspect the biggest problem with getting rid of tp_new/tp_free is code that does *more* than just allocate in those functions, only because the authors didn't realise they should be doing it in tp_alloc/tp_dealloc instead.

-- Thomas Wouters <thomas@python.org> Hi! I'm an email virus! Think twice before sending your email to help me spread!
On 2020-06-22 14:10, Victor Stinner wrote:
Hi,
PEP available at: https://www.python.org/dev/peps/pep-0620/
<introduction> This PEP is the result of 4 years of research work on the C API: https://pythoncapi.readthedocs.io/
It's the third version. The first version (2017) proposed to add a "new C API" and advised C extensions maintainers to opt-in for it: it was basically the same idea as PEP 384 limited C API but in a different color. Well, I had no idea of what I was doing :-) The second version (April 2020) proposed to add a new Python runtime built from the same code base as the regular Python runtime but in a different build mode, the regular Python would continue to be fully compatible.
I wrote the third version, the PEP 620, from scratch. It now gives an explicit and concrete list of incompatible C API changes, and has better motivation and rationale sections. The main PEP novelty is the new pythoncapi_compat.h header file distributed with Python to provide new C API functions to old Python versions, the second novelty is the process to reduce the number of broken C extensions.
Whereas PEPs are usually implemented in a single Python version, the implementation of this PEP is expected to be done carefully over multiple Python versions. The PEP lists many changes which are already implemented in Python 3.7, 3.8 and 3.9. It defines a process to reduce the number of broken C extensions when introducing the incompatible C API changes listed in the PEP. The process dictates the rhythm of these changes. </introduction>
PEP: 620
Title: Hide implementation details from the C API
Author: Victor Stinner <vstinner@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-June-2020
Python-Version: 3.10

Abstract
========
Introduce C API incompatible changes to hide implementation details.
Once most implementation details will be hidden, evolution of CPython internals would be less limited by C API backward compatibility issues. It will be way easier to add new features.
It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers.
Define a process to reduce the number of broken C extensions.
The implementation of this PEP is expected to be done carefully over multiple Python versions. It already started in Python 3.7 and most changes are already completed. The `Process to reduce the number of broken C extensions`_ dictates the rhythm.
Motivation
==========

The C API blocks CPython evolutions
-----------------------------------
Adding or removing members of C structures is causing multiple backward compatibility issues.
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``).
PyTypeObject is explicitly not part of the stable ABI, see PEP 384: https://www.python.org/dev/peps/pep-0384/#structures I don't know why Py_TPFLAGS_HAVE_FINALIZE was added, but it wasn't for the PEP 384 stable ABI. Can you find a different example, so users are not misled?
In Python 3.4, the PEP 442 "Safe object finalization" added the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure. For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type flag was required to announce if the type structure contains the ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388 <https://bugs.python.org/issue32388>`_).
The ``PyTypeObject.tp_print`` member, deprecated since Python 3.0 released in 2009, has been removed in the Python 3.8 development cycle. But the change broke too many C extensions and had to be reverted before 3.8 final release. Finally, the member was removed again in Python 3.9.
C extensions rely on the ability to access directly structure members, indirectly through the C API, or even directly.
I think you want to remove a "directly" from that sentence.
Modifying structures like ``PyListObject`` cannot even be considered.
The ``PyTypeObject`` structure is the one which evolved the most, simply because there was no other way to evolve CPython than modifying it.
In the C API, all Python objects are passed as ``PyObject*``: a pointer to a ``PyObject`` structure. Experimenting tagged pointers in CPython is blocked by the fact that a C extension can technically dereference a ``PyObject*`` pointer and access ``PyObject`` members. Small "objects" can be stored as a tagged pointer with no concrete ``PyObject`` structure.
I think this would be confusing to people who don't already know what you mean. May I suggest: A C extension can technically dereference a ``PyObject*`` pointer and access ``PyObject`` members. This prevents experiments like tagged pointers (storing small values as ``PyObject*`` which does not point to a valid ``PyObject`` structure).
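A purely illustrative sketch of the tagged-pointer idea Petr describes (none of these names exist in CPython; this is not a proposal):

    #include <stdint.h>

    /* Illustrative only: a set low bit marks a "PyObject*" as a small
     * integer stored in the pointer itself, so it never points to a real
     * PyObject structure. */
    #define TAG_SMALL_INT ((uintptr_t)1)

    static inline int
    is_small_int(void *op)
    {
        return ((uintptr_t)op & TAG_SMALL_INT) != 0;
    }

    static inline void *
    pack_small_int(intptr_t value)
    {
        return (void *)(((uintptr_t)value << 1) | TAG_SMALL_INT);
    }

    static inline intptr_t
    unpack_small_int(void *op)
    {
        return (intptr_t)((uintptr_t)op) >> 1;  /* arithmetic shift restores the sign */
    }

A C extension dereferencing such a "pointer" as a ``PyObject*`` would crash, which is exactly why direct structure access blocks the experiment.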
Replacing the Python garbage collector with a tracing garbage collector would also require removing the ``PyObject.ob_refcnt`` reference counter, whereas currently the ``Py_INCREF()`` and ``Py_DECREF()`` macros access ``PyObject.ob_refcnt`` directly.
Same CPython design since 1990: structures and reference counting
-----------------------------------------------------------------
When the CPython project was created, it was written with one principle: keep the implementation simple enough so it can be maintained by a single developer. CPython complexity grew a lot and many micro-optimizations have been implemented, but CPython core design has not changed.
Members of ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990)::
    #define OB_HEAD \
        unsigned int ob_refcnt; \
        struct _typeobject *ob_type;

    typedef struct _object {
        OB_HEAD
    } object;

    typedef struct {
        OB_VARHEAD
        object *ob_item[1];
    } tupleobject;
Only names changed: ``object`` was renamed to ``PyObject`` and ``tupleobject`` was renamed to ``PyTupleObject``.
CPython still tracks Python objects' lifetimes using reference counting, internally and for third party C extensions (through the Python C API).
All Python objects must be allocated on the heap and cannot be moved.
Why is PyPy more efficient than CPython?
----------------------------------------
The PyPy project is a Python implementation which is 4.2x faster than CPython on average. PyPy developers chose to not fork CPython, but start from scratch to have more freedom in terms of optimization choices.
PyPy does not use reference counting, but a tracing garbage collector which moves objects. Objects can be allocated on the stack (or even not at all), rather than always having to be allocated on the heap.
Object layouts are designed with performance in mind. For example, a list strategy stores integers directly as integers, rather than as objects.
Moreover, PyPy also has a JIT compiler which emits fast code thanks to the efficient PyPy design.
PyPy bottleneck: the Python C API
---------------------------------

While PyPy is way more efficient than CPython at running pure Python code, it is as efficient as, or slower than, CPython at running C extensions.
Since the C API requires ``PyObject*`` and allows accessing structure members directly, PyPy has to associate a CPython object with each PyPy object and keep both consistent. Converting a PyPy object to a CPython object is inefficient. Moreover, reference counting also has to be implemented on top of the PyPy tracing garbage collector.
These conversions are required because the Python C API is too close to the CPython implementation: there is no high-level abstraction. For example, structure members are part of the public C API and nothing prevents a C extension from getting or setting ``PyTupleObject.ob_item[0]`` (the first item of a tuple) directly.
See `Inside cpyext: Why emulating CPython C API is so Hard <https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html>`_ (Sept 2018) by Antonio Cuni for more details.
Rationale
=========

Hide implementation details
---------------------------
Hiding implementation details from the C API has multiple advantages:
* It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations: for example, tagged pointers, or replacing the garbage collector with a tracing garbage collector which can move objects.
* Adding new features in CPython becomes easier.
* PyPy should be able to avoid conversions to CPython objects in more cases: keep efficient PyPy objects.
* It becomes easier to implement the C API for a new Python implementation.
* More C extensions will be compatible with Python implementations other than CPython.
Relationship with the limited C API
-----------------------------------
PEP 384 "Defining a Stable ABI" was implemented in Python 3.2. It introduced the "limited C API": a subset of the C API. When the limited C API is used, it becomes possible to build a C extension only once and use it on multiple Python versions: that's the stable ABI.
The main limitation of PEP 384 is that C extensions have to opt in to the limited C API. Only very few projects made this choice, usually to ease the distribution of binaries, especially on Windows.
This PEP moves the C API towards the limited C API.
Ideally, the C API will become the limited C API and all C extensions will use the stable ABI, but this is out of the scope of this PEP.
And I would prefer to move the limited API closer to the regular C API. So we do agree on the goal :)
Specification
=============

Summary
-------

* (**Completed**) Reorganize the C API header files: create ``Include/cpython/`` and ``Include/internal/`` subdirectories.
* (**Completed**) Move private functions exposing implementation details to the internal C API.
* (**Completed**) Convert macros to static inline functions.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-value.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
What does **Completed** mean? All work in that category is done, and any new changes in the category will require a new PEP (or a change to this PEP)? It seems that this PEP is largely asking for forgiveness, so trying to find better ways to solve the problems would be a waste of time at this point.
Reorganize the C API header files
---------------------------------
The first consumer of the C API was Python itself. There is no clear separation between APIs which must not be used outside Python and APIs which are public on purpose.
Header files must be reorganized into 3 APIs:
* The ``Include/`` directory is the limited C API: no implementation details, structures are opaque. C extensions using it get a stable ABI.
* The ``Include/cpython/`` directory is the CPython C API: a less "portable" API which depends more on the Python version and exposes some implementation details; few incompatible changes can happen.
* The ``Include/internal/`` directory is the internal C API: implementation details, incompatible changes are likely at each Python release.
The creation of the ``Include/cpython/`` directory is fully backward compatible. ``Include/cpython/`` header files cannot be included directly and are included automatically by ``Include/`` header files when the ``Py_LIMITED_API`` macro is not defined.
The internal C API is installed and can be used for specific usages like debuggers and profilers, which must access structure members without executing code. C extensions using the internal C API are tightly coupled to a Python version and must be recompiled for each Python version.
**STATUS**: Completed (in Python 3.8)
The reorganization of header files started in Python 3.7 and was completed in Python 3.8:
* `bpo-35134 <https://bugs.python.org/issue35134>`_: Add a new Include/cpython/ subdirectory for the "CPython API" with implementation details.
* `bpo-35081 <https://bugs.python.org/issue35081>`_: Move internal headers to ``Include/internal/``.
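As a hedged illustration of the third layer (exact internal header names are version-specific assumptions here), a debugger-style extension opting in to the internal C API looks roughly like::

    /* Sketch: opting in to the internal C API. Extensions doing this are
     * tightly coupled to one Python version and must be recompiled for each
     * release; pycore_*.h header names vary between versions. */
    #define Py_BUILD_CORE_MODULE 1
    #include <Python.h>
    #include <internal/pycore_pystate.h>  /* example internal header */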
Move private functions to the internal C API
--------------------------------------------
Private functions which expose implementation details must be moved to the internal C API.
If a C extension relies on a CPython private function which exposes CPython implementation details, other Python implementations have to re-implement this private function to support this C extension.
**STATUS**: Completed (in Python 3.9)
Private functions moved to the internal C API in Python 3.8:
* ``_PyObject_GC_TRACK()``, ``_PyObject_GC_UNTRACK()``
Macros and functions excluded from the limited C API in Python 3.9:
* ``_PyObject_SIZE()``, ``_PyObject_VAR_SIZE()``
* ``PyThreadState_DeleteCurrent()``
* ``PyFPE_START_PROTECT()``, ``PyFPE_END_PROTECT()``
* ``_Py_NewReference()``, ``_Py_ForgetReference()``
* ``_PyTraceMalloc_NewReference()``
* ``_Py_GetRefTotal()``
Private functions moved to the internal C API in Python 3.9:
* GC functions like ``_Py_AS_GC()``, ``_PyObject_GC_IS_TRACKED()`` and ``_PyGCHead_NEXT()``
* ``_Py_AddToAllObjects()`` (not exported)
* ``_PyDebug_PrintTotalRefs()``, ``_Py_PrintReferences()``, ``_Py_PrintReferenceAddresses()`` (not exported)
Public "clear free list" functions moved to the internal C API an renamed to private functions and in Python 3.9:
* ``PyAsyncGen_ClearFreeLists()`` * ``PyContext_ClearFreeList()`` * ``PyDict_ClearFreeList()`` * ``PyFloat_ClearFreeList()`` * ``PyFrame_ClearFreeList()`` * ``PyList_ClearFreeList()`` * ``PyTuple_ClearFreeList()`` * Functions simply removed:
* ``PyMethod_ClearFreeList()`` and ``PyCFunction_ClearFreeList()``: bound method free list removed in Python 3.9. * ``PySet_ClearFreeList()``: set free list removed in Python 3.4. * ``PyUnicode_ClearFreeList()``: Unicode free list removed in Python 3.3.
Convert macros to static inline functions
-----------------------------------------

Converting macros to static inline functions has multiple advantages:

* Functions have well defined parameter types and return types.
* Functions can use variables with a well defined scope (the function).
* Debuggers can put breakpoints on functions, and profilers can display the function name in call stacks. In most cases, this works even when a static inline function is inlined.
* Functions don't have `macro pitfalls <https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_.
Converting macros to static inline functions should only impact very few C extensions which use macros in unusual ways.
For backward compatibility, functions must continue to accept any type, not only ``PyObject*``, to avoid compiler warnings, since most macros cast their parameters to ``PyObject*``.
Python 3.6 requires C compilers to support static inline functions: the PEP 7 requires a subset of C99.
**STATUS**: Completed (in Python 3.9)
Macros converted to static inline functions in Python 3.8:
* ``Py_INCREF()``, ``Py_DECREF()``
* ``Py_XINCREF()``, ``Py_XDECREF()``
* ``PyObject_INIT()``, ``PyObject_INIT_VAR()``
* ``_PyObject_GC_TRACK()``, ``_PyObject_GC_UNTRACK()``, ``_Py_Dealloc()``
Macros converted to regular functions in Python 3.9:
* ``Py_EnterRecursiveCall()``, ``Py_LeaveRecursiveCall()`` (added to the limited C API)
* ``PyObject_INIT()``, ``PyObject_INIT_VAR()``
* ``PyObject_GET_WEAKREFS_LISTPTR()``
* ``PyObject_CheckBuffer()``
* ``PyIndex_Check()``
* ``PyObject_IS_GC()``
* ``PyObject_NEW()`` (alias to ``PyObject_New()``), ``PyObject_NEW_VAR()`` (alias to ``PyObject_NewVar()``)
* ``PyType_HasFeature()`` (always call ``PyType_GetFlags()``)
* ``Py_TRASHCAN_BEGIN_CONDITION()`` and ``Py_TRASHCAN_END()`` macros now call functions which hide implementation details, rather than accessing directly members of the ``PyThreadState`` structure.
Make structures opaque
----------------------
All structures of the C API should become opaque: C extensions must use getter or setter functions to get or set structure members. For example, ``tuple->ob_item[0]`` must be replaced with ``PyTuple_GET_ITEM(tuple, 0)``.
All structures? I don't think we can make PyModuleDef opaque, for example. PEP 384 lists more structures, some of which I don't think should be opaque: https://www.python.org/dev/peps/pep-0384/#structures
To be able to move away from reference counting, ``PyObject`` must become opaque. Currently, the reference counter ``PyObject.ob_refcnt`` is exposed in the C API. All structures must become opaque, since they "inherit" from PyObject. For example, ``PyFloatObject`` inherits from ``PyObject``::
    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;
Making ``PyObject`` fully opaque requires converting ``Py_INCREF()`` and ``Py_DECREF()`` macros to function calls. This change has an impact on performance. It is likely to be one of the very last changes when making structures opaque.
Making the ``PyTypeObject`` structure opaque breaks C extensions declaring types statically (e.g. ``static PyTypeObject MyType = {...};``). C extensions must use ``PyType_FromSpec()`` to allocate types on the heap instead. Using heap types has other advantages, like being compatible with subinterpreters. Combined with PEP 489 "Multi-phase extension module initialization", it makes a C extension's behavior closer to a Python module's, for example by allowing the creation of more than one module instance.
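A minimal hedged sketch of the replacement pattern ("MyObject" and the module name are placeholders, not part of the PEP)::

    #include <Python.h>

    /* Sketch: defining a heap type with PyType_FromSpec() instead of a
     * static PyTypeObject. */
    typedef struct {
        PyObject_HEAD
        double value;
    } MyObject;

    static PyType_Slot myobject_slots[] = {
        {Py_tp_doc, "Example heap type"},
        {0, NULL},                      /* sentinel */
    };

    static PyType_Spec myobject_spec = {
        .name = "mymodule.MyObject",
        .basicsize = sizeof(MyObject),
        .flags = Py_TPFLAGS_DEFAULT,
        .slots = myobject_slots,
    };

    /* In the module's init/exec function: */
    /* PyObject *type = PyType_FromSpec(&myobject_spec); */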
Making ``PyThreadState`` structure opaque requires adding getter and setter functions for members used by C extensions.
**STATUS**: In Progress (started in Python 3.8)
The ``PyInterpreterState`` structure was made opaque in Python 3.8 (`bpo-35886 <https://bugs.python.org/issue35886>`_) and the ``PyGC_Head`` structure (`bpo-40241 <https://bugs.python.org/issue40241>`_) was made opaque in Python 3.9.
Issues tracking the work to prepare the C API so that the following structures can be made opaque:
* ``PyObject``: `bpo-39573 <https://bugs.python.org/issue39573>`_ * ``PyTypeObject``: `bpo-40170 <https://bugs.python.org/issue40170>`_ * ``PyFrameObject``: `bpo-40421 <https://bugs.python.org/issue40421>`_
* Python 3.9 adds ``PyFrame_GetCode()`` and ``PyFrame_GetBack()`` getter functions, and moves ``PyFrame_GetLineNumber`` to the limited C API.
* ``PyThreadState``: `bpo-39947 <https://bugs.python.org/issue39947>`_
* Python 3.9 adds 3 getter functions: ``PyThreadState_GetFrame()``, ``PyThreadState_GetID()``, ``PyThreadState_GetInterpreter()``.
Disallow using Py_TYPE() as l-value
-----------------------------------

The ``Py_TYPE()`` function gets an object's type, its ``PyObject.ob_type`` member. It is implemented as a macro which can be used as an l-value to set the type: ``Py_TYPE(obj) = new_type``. This code relies on the assumption that ``PyObject.ob_type`` can be modified directly. It prevents making the ``PyObject`` structure opaque.
New setter functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()`` are added and must be used instead.
The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros must be converted to static inline functions which cannot be used as l-values.
For example, the ``Py_TYPE()`` macro::
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
becomes::
    #define _PyObject_CAST_CONST(op) ((const PyObject*)(op))

    static inline PyTypeObject* _Py_TYPE(const PyObject *ob) {
        return ob->ob_type;
    }

    #define Py_TYPE(ob) _Py_TYPE(_PyObject_CAST_CONST(ob))
**STATUS**: Completed (in Python 3.10)
New functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()`` were added to Python 3.9.
In Python 3.10, ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` can no longer be used as l-value and the new setter functions must be used instead.
New C API functions must not return borrowed references
--------------------------------------------------------
When a function returns a borrowed reference, Python cannot track when the caller stops using this reference.
For example, if the Python ``list`` type is specialized for small integers, storing "raw" numbers directly rather than Python objects, ``PyList_GetItem()`` has to create a temporary Python object. The problem is deciding when it is safe to delete the temporary object.

The general guideline is to avoid returning borrowed references for new C API functions.

No function returning borrowed references is scheduled for removal by this PEP.
**STATUS**: Completed (in Python 3.9)
In Python 3.9, new C API functions returning Python objects only return strong references:
* ``PyFrame_GetBack()``
* ``PyFrame_GetCode()``
* ``PyObject_CallNoArgs()``
* ``PyObject_CallOneArg()``
* ``PyThreadState_GetFrame()``
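To illustrate the caller-side difference, here is a minimal hedged sketch ("list" is a placeholder variable; error handling abbreviated)::

    /* PyList_GetItem() returns a *borrowed* reference: do not Py_DECREF() it,
     * and it may become dangling if the list is mutated. */
    PyObject *borrowed = PyList_GetItem(list, 0);

    /* PyObject_GetItem() returns a *strong* reference: the caller owns it
     * and must release it with Py_DECREF() when done. */
    PyObject *index = PyLong_FromLong(0);
    PyObject *strong = PyObject_GetItem(list, index);
    Py_DECREF(index);
    if (strong != NULL) {
        /* ... use strong ... */
        Py_DECREF(strong);
    }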
Avoid functions returning PyObject**
------------------------------------
The ``PySequence_Fast_ITEMS()`` function gives direct access to an array of ``PyObject*`` objects. The function is deprecated in favor of ``PyTuple_GetItem()`` and ``PyList_GetItem()``.
``PyTuple_GET_ITEM()`` can be abused to directly access the ``PyTupleObject.ob_item`` member::

    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);
The ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros are converted to static inline functions to disallow that.
**STATUS**: Not Started
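A hedged sketch of what the conversion could look like (the helper name is invented; this is not the actual CPython change)::

    /* Because the function returns the item as an rvalue, the expression
     * "&PyTuple_GET_ITEM(tuple, 0)" no longer compiles. */
    static inline PyObject *
    _sketch_PyTuple_GET_ITEM(PyObject *op, Py_ssize_t index)
    {
        PyTupleObject *tuple = (PyTupleObject *)op;
        return tuple->ob_item[index];
    }
    #define PyTuple_GET_ITEM(op, index) _sketch_PyTuple_GET_ITEM((PyObject*)(op), (index))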
New pythoncapi_compat.h header file
-----------------------------------
Making structures opaque requires modifying C extensions to use getter and setter functions. The practical issue is how to keep support for old Python versions which don't have these functions.
For example, in Python 3.10, it is no longer possible to use ``Py_TYPE()`` as an l-value. The new ``Py_SET_TYPE()`` function must be used instead::
    #if PY_VERSION_HEX >= 0x030900A4
        Py_SET_TYPE(&MyType, &PyType_Type);
    #else
        Py_TYPE(&MyType) = &PyType_Type;
    #endif
This code may ring a bell to developers who ported their Python code base from Python 2 to Python 3.
Python will distribute a new ``pythoncapi_compat.h`` header file which provides new C API functions to old Python versions. Example::
    #if PY_VERSION_HEX < 0x030900A4
    static inline void _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4
Using this header file, ``Py_SET_TYPE()`` can be used on old Python versions as well.
Developers can copy this file into their project, or even copy/paste only the few functions needed by their C extension.
**STATUS**: In Progress (implemented but not distributed by CPython yet)
The ``pythoncapi_compat.h`` header file is currently developed at: https://github.com/pythoncapi/pythoncapi_compat
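With the header in place, extension code can be written once against the new API; a hedged sketch ("MyType" is a placeholder for the extension's static type)::

    #include <Python.h>
    #include "pythoncapi_compat.h"  /* backfills Py_SET_TYPE() on old Pythons */

    extern PyTypeObject MyType;     /* placeholder: the extension's type */

    static void
    set_my_type(void)
    {
        /* Works natively on Python >= 3.9, and through the compatibility
         * header on older versions: no #if version check in user code. */
        Py_SET_TYPE(&MyType, &PyType_Type);
    }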
Process to reduce the number of broken C extensions
===================================================
Process to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP:
* Estimate how many popular C extensions are affected by the incompatible change.
* Coordinate with maintainers of broken C extensions to prepare their code for the future incompatible change.
* Introduce the incompatible changes in Python. The documentation must explain how to port existing code. It is recommended to merge such changes at the beginning of a development cycle to have more time for tests.
* Changes which are the most likely to break a large number of C extensions should be announced on the capi-sig mailing list to notify C extension maintainers to prepare their project for the next Python.
* If the change breaks too many projects, reverting the change should be discussed, taking into account the number of broken packages, their importance in the Python community, and the importance of the change.
What is "popular"? What is "too many"? Who decides?
The coordination usually means reporting issues to the projects, or even proposing changes. It does not require waiting for a new release including fixes for every broken project.
Since more and more C extensions are written using Cython, rather than directly using the C API, it is important to ensure that Cython is prepared in advance for incompatible changes. That gives more time for C extension maintainers to release a new version with code generated by the updated Cython (for C extensions distributing the code generated by Cython).
Future incompatible changes can be announced by deprecating a function in the documentation and by annotating the function with ``Py_DEPRECATED()``.
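A hedged illustration of the mechanism ("PyExample_OldFunction" is an invented name; the PEP itself deprecates only ``PySequence_Fast_ITEMS()``, which is a macro)::

    /* Py_DEPRECATED() expands to a compiler-specific attribute, for example
     * __attribute__((__deprecated__)) with GCC/Clang, so the compiler warns
     * at each call site of the annotated function. */
    Py_DEPRECATED(3.10) PyAPI_FUNC(PyObject *) PyExample_OldFunction(PyObject *obj);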
How long should functions be deprecated before being removed?
But making a structure opaque and preventing the usage of a macro as l-value cannot be deprecated with ``Py_DEPRECATED()``.
So, how should things like this be deprecated? And for how long?
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
How will this balance be found?
If a change is reverted, we move back to the coordination step to better prepare the change. Once more C extensions are ready, the incompatible change can be reconsidered.
When in the process should it be reverted?
Version History
===============

* Version 3, June 2020: PEP rewritten from scratch. Python now distributes a new ``pythoncapi_compat.h`` header and a process is defined to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP.
* Version 2, April 2020: `PEP: Modify the C API to hide implementation details <https://mail.python.org/archives/list/python-dev@python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/#TKHNENOXP6H34E73XGFOL2KKXSM4Z6T2>`_.
* Version 1, July 2017: `PEP: Hide implementation details in the C API <https://mail.python.org/archives/list/python-ideas@python.org/thread/6XATDGWK4VBUQPRHCRLKQECTJIPBVNJQ/#HFBGCWVLSM47JEP6SO67MRFT7Y3EOC44>`_ sent to python-ideas.
Copyright
=========
This document has been placed in the public domain.
On Tue, 23 Jun 2020 at 15:56, Petr Viktorin <encukou@gmail.com> wrote:
Adding or removing members of C structures is causing multiple backward compatibility issues.
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``).
PyTypeObject is explicitly not part of the stable ABI, see PEP 384: https://www.python.org/dev/peps/pep-0384/#structures I don't know why Py_TPFLAGS_HAVE_FINALIZE was added, but it wasn't for the PEP 384 stable ABI.
Maybe Antoine Pitrou knows the rationale why Py_TPFLAGS_HAVE_FINALIZE flag was added. Removing the flag was discussed at: * https://bugs.python.org/issue32388 * https://mail.python.org/pipermail/python-dev/2017-December/151328.html
Summary
-------

* (**Completed**) Reorganize the C API header files: create ``Include/cpython/`` and ``Include/internal/`` subdirectories.
* (**Completed**) Move private functions exposing implementation details to the internal C API.
* (**Completed**) Convert macros to static inline functions.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-value.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
What does **Completed** mean? All work in that category is done, and any new changes in the category will require a new PEP (or a change to this PEP)?
The purpose of the PEP is to show an overview of the work already done towards the same goal. Changes were scattered across multiple issues, so I don't think that it was easy to follow them. The PEP cannot be exhaustive; hiding implementation details is a work in progress. Usually, once a PEP is approved, it should not be modified. I don't think that minor incompatible changes should require a whole new PEP. The purpose of the PEP is to define an overall process for past and future changes in this area, and also to announce our intention to hide implementation details. If tomorrow an incompatible C API change becomes very controversial, maybe another PEP will be justified. So far, incompatible C API changes were done with no PEP. I decided to write a PEP for the Py_TYPE() change (disallowing its use as an l-value): that's one of the "most" incompatible changes since I started to work on this.
It seems that this PEP is largely asking for forgiveness, so trying to find better ways to solve the problems would be a waste of time at this point.
Most completed changes were already discussed during physical meetings, like PyCons and core dev sprints. From what I recall, everybody (involved in the discussion) agreed on the solution. But there is always room for enhancements! The purpose of the PEP is also to make the work already done more visible, so it's easier to reason about the overall picture. Feel free to propose your ideas; that's exactly the role of a discussion on a PEP!
Make structures opaque
----------------------
All structures of the C API should become opaque: C extensions must use getter or setter functions to get or set structure members. For example, ``tuple->ob_item[0]`` must be replaced with ``PyTuple_GET_ITEM(tuple, 0)``.
All structures? I don't think we can make PyModuleDef opaque, for example. PEP 384 lists more structures, some of which I don't think should be opaque: https://www.python.org/dev/peps/pep-0384/#structures
Oh, no, PyModuleDef is fine. I would like to make the following structures opaque:

* PyInterpreterState
* PyThreadState
* PyGC_Head
* PyTypeObject
* PyObject and PyVarObject
* All types which inherit from PyObject or PyVarObject

But I'm not sure about the exact list. Let's start with this list :-)
Process to reduce the number of broken C extensions
===================================================
Process to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP:
* Estimate how many popular C extensions are affected by the incompatible change.
* Coordinate with maintainers of broken C extensions to prepare their code for the future incompatible change.
* Introduce the incompatible changes in Python. The documentation must explain how to port existing code. It is recommended to merge such changes at the beginning of a development cycle to have more time for tests.
* Changes which are the most likely to break a large number of C extensions should be announced on the capi-sig mailing list to notify C extension maintainers to prepare their project for the next Python.
* If the change breaks too many projects, reverting the change should be discussed, taking into account the number of broken packages, their importance in the Python community, and the importance of the change.
What is "popular"? What is "too many"? Who decides?
The process is underspecified on purpose. I don't know that it's possible to define strict limits. For me the important part is discussing incompatible changes and trying to reach a consensus.
Future incompatible changes can be announced by deprecating a function in the documentation and by annotating the function with ``Py_DEPRECATED()``.
How long should functions be deprecated before being removed?
In the whole PEP 620, there is a single function which is deprecated: PySequence_Fast_ITEMS(). Note: it seems like Brett and Benjamin want to require a deprecation period of 2 releases before a function can be removed: https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421 The PEP doesn't plan PySequence_Fast_ITEMS() removal, but I guess that it should follow PEP 387: if it's deprecated in 3.10, its removal would happen in Python 3.12.
But making a structure opaque and preventing the usage of a macro as l-value cannot be deprecated with ``Py_DEPRECATED()``.
So, how should things like this be deprecated? And for how long?
These changes don't go through any deprecation process. When you build a C extension affected by these incompatible changes, you get a compilation error.
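A concrete hedged example of what that looks like for the Py_TYPE() change ("MyType" is a placeholder static type; the exact diagnostic wording varies by compiler)::

    Py_TYPE(&MyType) = &PyType_Type;     /* Python 3.10: compile error, e.g.
                                            "lvalue required as left operand
                                            of assignment" */
    Py_SET_TYPE(&MyType, &PyType_Type);  /* replacement: Python >= 3.9, or
                                            older versions via
                                            pythoncapi_compat.h */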
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
How will this balance be found?
By discussing the change.
If a change is reverted, we move back to the coordination step to better prepare the change. Once more C extensions are ready, the incompatible change can be reconsidered.
When in the process should it be reverted?
Whenever you want. If there are too many complaints about an incompatible change during the beta phase and we decide to revert the change, the important part is to revert before the 3.x.0 final release. If the complaints only come after the final release, it's usually too late and it's better to learn how to live with this change. Do you think that these aspects should be clearly specified in the PEP? Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Tue, 23 Jun 2020 16:34:28 +0200 Victor Stinner <vstinner@python.org> wrote:
On Tue, 23 Jun 2020 at 15:56, Petr Viktorin <encukou@gmail.com> wrote:
Adding or removing members of C structures is causing multiple backward compatibility issues.
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``).
PyTypeObject is explicitly not part of the stable ABI, see PEP 384: https://www.python.org/dev/peps/pep-0384/#structures I don't know why Py_TPFLAGS_HAVE_FINALIZE was added, but it wasn't for the PEP 384 stable ABI.
Maybe Antoine Pitrou knows the rationale why the Py_TPFLAGS_HAVE_FINALIZE flag was added. Removing the flag was discussed at: https://bugs.python.org/issue32388
Mainly because some packagers (how many I don't know) had the habit of building a single extension DLL and distributing it for different Python versions on Windows. It was much easier to add the flag than to pollute the PEP 442 discussion with a sub-discussion about the robustness of such ABI stability claims (which officially didn't exist, but were still relied on by some users / packagers). Regards Antoine.
On 2020-06-23, Thomas Wouters wrote:
I think the ability to have per-type allocation/deallocation routines isn't really about efficiency, but more about giving more control to embedding systems (or libraries wrapped by extension modules) over how *their* objects are allocated. It doesn't make much sense, however, because Python wouldn't allocate their objects anyway, just the Python objects wrapping theirs. Allocating CPython objects should be CPython's job.
My thinking is that, eventually, we would like to allow CPython to use something other than reference counting for internal PyObject memory management. In other systems with garbage collection, the memory allocator is typically tightly integrated with the garbage collector. To get good efficiency, they need to cooperate. E.g. newly allocated objects are allocated in nursery memory arenas. The current API doesn't allow that because you can allocate memory via some custom allocator and then pass that memory to be initialized and treated as a PyObject. That's one thing locking us into reference counting. This relates to the sub-interpreter discussion. I think the sub-interpreter cleanup work is worth doing, if only because it will make embedding CPython cleaner. I have some doubts that sub-interpreters will help much in terms of multi-core utilization. Efficiently sharing data between interpreters seems like a huge challenge. I think we should also pursue Java style multi-threading and complete the "gilectomy". To me, that means killing reference counting for internal PyObject management.
On Tue, 23 Jun 2020 at 20:37, Neil Schemenauer <nas-python@arctrix.com> wrote:
My thinking is that, eventually, we would like to allow CPython to use something other than reference counting for internal PyObject memory management. In other systems with garbage collection, the memory allocator is typically tightly integrated with the garbage collector. To get good efficiency, they need to cooperate. E.g. newly allocated objects are allocated in nursery memory arenas.
The PEP 620 only mentions replacing CPython GC with a tracing GC at one place: "It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations. For example, tagged pointers, and replace the garbage collector with a tracing garbage collector which can move objects." I chose to not make it a requirement of the PEP 620 on purpose. IMHO such a change will very likely require a whole PEP by itself, since there are likely many small things which will have to be changed to prepare such a major change. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Tue, 23 Jun 2020 at 16:56, Stefan Behnel <stefan_ml@behnel.de> wrote:
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure. For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type flag was required to announce if the type structure contains the ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388 <https://bugs.python.org/issue32388>`_).
Probably not the best example. I think this is pretty much normal API evolution. Changing the deallocation protocol for objects is going to impact any public API in one way or another. PyTypeObject is also not exposed with its struct fields in the limited API, so your point regarding "tp_print" is also not a strong one.
The PEP 442 doesn't break backward compatibility: C extensions using tp_dealloc continue to work. But adding a new member to PyTypeObject caused practical implementation issues. I'm not sure why you are mentioning the limited C API: most C extensions don't use it and declare their types as "static types". I'm not trying to describe the Py_TPFLAGS_HAVE_FINALIZE story as a major blocker in CPython history. It's just one of the many examples of issues met when evolving CPython internals.
Same CPython design since 1990: structures and reference counting
-----------------------------------------------------------------

Members of ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990)
While I see an advantage in hiding the details of PyObject (specifically memory management internals), I would argue that there simply isn't much to improve in PyTupleObject, so these two don't fly at the same level for me.
There are different reasons to make PyTupleObject opaque:

* Prevent access to members of its "PyObject ob_base" member (disallow accessing "tuple->ob_base.ob_refcnt" directly).
* Prevent C extensions from making assumptions about how a Python implementation stores a tuple. Currently, C extensions are designed to have the best performance with CPython, but that makes them run slower on PyPy.
* It becomes possible to experiment with a more efficient PyTupleObject layout, in terms of memory footprint or runtime performance, depending on the use case. For example, storing numbers directly as numbers rather than PyObject. Or maybe use a different layout to make PyList_AsTuple() an O(1) operation. I had a similar idea about converting a bytearray into bytes without having to copy memory; it also requires modifying PyBytesObject to experiment with such an idea. An array of PyObject* is not the most efficient storage for all use cases.
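To make the first two points concrete, here is a minimal sketch of the direct access that becomes impossible, next to the API calls which keep working:

    /* Direct structure access: forbidden once PyTupleObject is opaque. */
    PyObject *first = ((PyTupleObject *)tuple)->ob_item[0];
    Py_ssize_t refcnt = ((PyObject *)tuple)->ob_refcnt;

    /* The same operations through the C API: portable across layouts. */
    PyObject *first2 = PyTuple_GET_ITEM(tuple, 0);
    Py_ssize_t refcnt2 = Py_REFCNT(tuple);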
My feeling is that PyPy specifically is better served with the HPy API, which is different enough to consider it a mostly separate API, or an evolution of the limited API, if you want. Suggesting that extension authors support two different APIs is a lot to ask, but forcing them to support the existing CPython C-API (for legacy reasons) and the changed CPython C-API (for future compatibility), and then asking them to support a separate C-API in addition (for platform independence, with performance penalties), seems stretching it a lot.
The PEP 620 changes the C API to make it converge towards the limited C API, but it also prepares C extensions to ease their migration to HPy. For example, by design, HPy doesn't give direct access to the PyTupleObject.ob_item member. Enforcing usage of the PyTuple_GetItem() function or the PyTuple_GET_ITEM() macro should ease migration to HPy_GetItem_i().

I disagree that extension authors have to support two C APIs. Many PEP 620 incompatible C API changes are already completed, and I was surprised by the very low number of extensions affected by these changes. In practice, most extensions use simple and regular C code, they don't "abuse" the C API. Cython itself is affected by most changes since Cython basically uses all C API features :-) But in practice, only a minority of extensions written with Cython are affected, since they (indirectly, via Cython) only use a subset of the C API.

Also, once an extension is updated for incompatible changes, it remains compatible with old Python versions. When a new function is used, pythoncapi_compat.h can be used to support old Python versions. It is not like code has to be duplicated to support two unrelated APIs.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-value.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
Most of these have the potential to break code, sometimes needlessly, AFAICT.
Py_SET_xxx() functions are designed to allow experimenting with tagged pointers in CPython. Do you mean that tagged pointers are not worth experimenting with? Neil's early proof-of-concept was promising: https://mail.python.org/archives/list/capi-sig@python.org/thread/EGAY55ZWMF2...

PyPy decided to abandon tagged pointers, since it wasn't really worth it in PyPy. But PyPy and CPython have very different designs; IMO the performance will be more interesting in CPython than in PyPy.
Especially the efforts to block away the internal data structures annoy me. It's obviously ok if we don't require other implementations to provide this access, but CPython has these data structures and I think it should continue to expose them.
CPython continues to expose structures in its internal C API.
If we remove CPython specific features from the (de-facto) "official public Python C-API", then I think there should be a "public CPython 3.X C-API" that actively exposes the data structures natively, not just an "internal" one. That way, extension authors can take the usual decision between performance, maintenance effort and platform independence.
I would like to promote "portable" C code, rather than promote writing CPython specific code. I mean that the "default" should be the portable API, and writing CPython specific code would be a deliberate opt-in choice.
    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;
Please keep PyFloat_AS_DOUBLE() and friends do what they currently do.
If PyFloatObject becomes opaque, PyFloat_AS_DOUBLE() macro must become a function call.
Making ``PyTypeObject`` structure opaque breaks C extensions declaring types statically (e.g. ``static PyTypeObject MyType = {...};``).
Not necessarily. There was an unimplemented feature proposed in PEP-3121, the PyType_Copy() function.
https://www.python.org/dev/peps/pep-3121/#specification
PyTypeObject does not have to be opaque. But it also doesn't have to be the same thing for defining and for using types. You could still define a type with a PyTypeObject struct and then copy it over into a heap type or other internal type structure from there.
A practical issue is that many C extensions refer directly to a type using something like "&MyType". Example in CPython:

    #define PyUnicode_CheckExact(op) Py_IS_TYPE(op, &PyUnicode_Type)

If PyType_Copy(&PyUnicode_Type) is used to allocate the real unicode type as a heap type, code using &PyUnicode_Type will fail. See https://bugs.python.org/issue40601 "[C API] Hide static types from the limited C API" about this issue. This issue concerns subinterpreters: each subinterpreter should have its own copy of each type.
Whether that's better than using PyType_FromSpec(), maybe not, but at least it doesn't mean we have to break existing code that uses static extension type definitions.
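For comparison, a minimal heap type defined with the existing PyType_FromSpec() API looks roughly like this (module and type names are made up for this sketch):

    typedef struct {
        PyObject_HEAD
        double value;
    } MyObject;

    static PyType_Slot my_slots[] = {
        {0, NULL},  /* no custom slots in this minimal sketch */
    };

    static PyType_Spec my_spec = {
        .name = "mymod.MyType",
        .basicsize = sizeof(MyObject),
        .flags = Py_TPFLAGS_DEFAULT,
        .slots = my_slots,
    };

    /* In the module init function: */
    PyObject *MyType = PyType_FromSpec(&my_spec);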
If we choose the PyType_Copy() way, we must stop referring to types as "PyTypeObject*" internally, but maybe use "PyHeapTypeObject*" or something else. Currently, static types and heap types are interchangeable on purpose.
I haven't come across a use case yet where I had to change a ref-count by more than 1, but allowing users to arbitrarily do that may require way more infrastructure under the hood than allowing them to create or remove a single reference to an object. I think explicit is really better than implicit here.
Py_SET_REFCNT() is not Py_INCREF(). It's used for special cases like free lists, resurrecting an object, or saving/restoring a reference counter (during resurrection).
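A minimal sketch of the resurrection case (illustrative code, not taken from a real extension):

    /* A dealloc function that can resurrect its object: restoring a
       reference count from 0 requires Py_SET_REFCNT(), not Py_INCREF(). */
    static void
    my_dealloc(PyObject *self)
    {
        if (store_in_cache(self)) {   /* hypothetical helper keeping a reference */
            Py_SET_REFCNT(self, 1);   /* the object lives again */
            return;
        }
        PyObject_Free(self);
    }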
The same does not seem to apply to "Py_SET_TYPE()" and "Py_SET_SIZE()", since any object or (applicable) container implementation would normally have to know its type and size, regardless of any implementation details.
Py_SET_TYPE() is needed to set the type of a type declared statically: a static initializer like "ob_type = &PyType_Type" doesn't work on Visual Studio if I recall correctly, so it must be set at runtime. See for example the numpy fix: https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9ecc... Py_SET_SIZE() is needed for types which inherit from PyVarObject, like PyListObject.
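Simplified sketch of that numpy-style fix (the static initializer leaves the type of the type unset, and it is filled in at runtime):

    static PyTypeObject MyType = {
        PyVarObject_HEAD_INIT(NULL, 0)   /* ob_type cannot be set here on MSVC */
        .tp_name = "mymod.MyType",       /* hypothetical type */
    };

    int
    init_my_type(void)
    {
        Py_SET_TYPE(&MyType, &PyType_Type);
        return PyType_Ready(&MyType);
    }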
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
This sounds like a common CI testing infrastructure would help all sides. Currently, we have something like that mostly working by having different projects integrate with each other's master branch, e.g. Pandas, NumPy, Cython, and notifying each other of detected breakages. It's mostly every project setting up its own CI on Travis & co. here, so there is a bit of duplicated work on all sides. Not sure if that's inherently bad, but there's definitely some room for generalisation and improvements.
I wrote https://github.com/vstinner/pythonci to test Cython, numpy and a few other projects on the next Python version (master branch). In fact, I even wrote a section of the PEP, "please test your project on the next Python version", but I removed it since it doesn't require any change in CPython itself, and we cannot require people to do it.
Again, thanks Victor for pushing these efforts. Even if others and I are giving you a hard time getting your proposals accepted, I appreciate the work that you put into improving the ecosystem(s).
Thanks Stefan for your very useful feedback :-) I'm sure that it will help to enhance the PEP. I'm open to consider removing a bunch of incompatible changes like making the PyObject structure opaque. If you look at my PyObject https://bugs.python.org/issue39573 and PyTypeObject https://bugs.python.org/issue40170 issues: the changes that I already pushed are mostly changes to abstract access to these structures. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Victor Stinner wrote on 24.06.20 at 02:14:
On Tue, 23 Jun 2020 at 16:56, Stefan Behnel wrote:
Members of ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990)
While I see an advantage in hiding the details of PyObject (specifically memory management internals), I would argue that there simply isn't much to improve in PyTupleObject, so these two don't fly at the same level for me.
There are different reasons to make PyTupleObject opaque: [Some reasons why *PyObject* should not be exposed]
* Prevent C extensions from making assumptions about how a Python implementation stores a tuple. Currently, C extensions are designed to have the best performance with CPython, but that makes them run slower on PyPy.
* It becomes possible to experiment with a more efficient PyTupleObject layout, in terms of memory footprint or runtime performance, depending on the use case. For example, storing numbers directly as numbers rather than PyObject. Or maybe use a different layout to make PyList_AsTuple() an O(1) operation. I had a similar idea about converting a bytearray into bytes without having to copy memory; it also requires modifying PyBytesObject to experiment with such an idea. An array of PyObject* is not the most efficient storage for all use cases.
Note, I understand the difference between ABI and API. Keeping PyTuple_GET_ITEM() a macro or inline function can break the ABI at some point once PyTupleObject changes in an incompatible way in Py3.14, and it may do different things in PyPy entirely at some point. That's fine. We have a policy of allowing ABI breakage between CPython minor releases.

But this does not mean that PyTupleObject needs to become an opaque type that requires a function call into CPython for PyTuple_GET_ITEM(). It *may* become that at some point, when there is a reason to change it into a function call. In the current implementation, there is no such reason. In a future implementation, there may or may not be a reason. We do not know that. As of now, we're just needlessly slowing down existing code by preventing the C compiler from seeing that PyTuple_GET_ITEM() literally just does a single pointer dereference. This applies ten-fold to types like PyLong and PyFloat, where getting straight at the native C value is also just a pointer dereference.

Basically, what I'm asking is to keep things as efficient as they are *in CPython* as long as there is no reason to change them *in CPython*.
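For reference, in the CPython 3.9 headers the macro is just such a pointer dereference, a plain array access that the compiler can fully see through:

    #define PyTuple_GET_ITEM(op, i) (((PyTupleObject *)(op))->ob_item[i])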
If we remove CPython specific features from the (de-facto) "official public Python C-API", then I think there should be a "public CPython 3.X C-API" that actively exposes the data structures natively, not just an "internal" one. That way, extension authors can take the usual decision between performance, maintenance effort and platform independence.
I would like to promote "portable" C code, rather than promote writing CPython specific code.
I mean that the "default" should be the portable API, and writing CPython specific code would be a deliberate opt-in choice.
That's what I mean by "public CPython 3.X C-API". Don't discourage its use, don't hide away details. Just make it clear what is CPython specific and what isn't, but without judging. It's a good thing for extensions to be fast on CPython.
I haven't come across a use case yet where I had to change a ref-count by more than 1, but allowing users to arbitrarily do that may require way more infrastructure under the hood than allowing them to create or remove a single reference to an object. I think explicit is really better than implicit here.
Py_SET_REFCNT() is not Py_INCREF(). It's used for special cases like free lists, resurrecting an object, or saving/restoring a reference counter (during resurrection).
Exactly, so it is Py_INCREF() or Py_DECREF(), just without side-effects. I'm arguing that the use case is also practically the same: increase or decrease the refcount of an object, but without triggering the deallocation machinery. Now read my paragraph above again. :) Is it too late in the Py3.9 cycle to switch to two separate macros? Stefan
On Wed, 24 Jun 2020 at 16:20, Stefan Behnel <stefan_ml@behnel.de> wrote:
Note, I understand the difference between ABI and API. Keeping PyTuple_GET_ITEM() a macro or inline function can break the ABI at some point once PyTupleObject changes in an incompatible way in Py3.14, and it may do different things in PyPy entirely at some point. That's fine. We have a policy of allowing ABI breakage between CPython minor releases.
In the short term, I agree that it's ok that PyFloat_AS_DOUBLE() continues to read PyFloatObject.ob_fval directly. There is no *need* to enforce a function call at the ABI level for now.

My practical problem is how to prevent C extensions from accessing the PyFloatObject.ob_fval member directly. In my tests, I renamed PyObject members. For example, rename PyObject.ob_type to PyObject._ob_type, and update Py_TYPE() and Py_SET_TYPE(). If a C function accesses PyObject.ob_type directly, a compilation error is issued.

One option would be to have a different, stricter build mode where PyFloat_AS_DOUBLE() becomes a function call. Example:

    #ifndef Py_LIMITED_API
    #  ifdef OPAQUE_STRUCTURE
    #    define PyFloat_AS_DOUBLE(op) PyFloat_AsDouble(op)
    #  else
    #    define PyFloat_AS_DOUBLE(op) (((PyFloatObject *)(op))->ob_fval)
    #  endif
    #endif

The function is not available in the Py_LIMITED_API, so other Python implementations don't have to implement it. But an OPAQUE_STRUCTURE macro would declare the macro as a function call: an alias to PyFloat_AsDouble().

Or maybe it's time to extend the limited C API: add a PyFloat_AS_DOUBLE() macro as a function call. Extending the limited C API has multiple advantages:

* It eases the transition of C extensions to the limited C API.
* Py_LIMITED_API already exists; there is no need to add yet another build mode or any new macro.
* Most structures are *already* opaque in the limited C API.

The question becomes: how to promote the limited C API? Should it become the default, rather than an opt-in option?
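For context, the current opt-in looks like this in an extension source file:

    /* Opt in to the limited C API / stable ABI: define the oldest Python
       version to support before including Python.h. */
    #define Py_LIMITED_API 0x03060000
    #include <Python.h>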
That's what I mean by "public CPython 3.X C-API". Don't discourage its use, don't hide away details. Just make it clear what is CPython specific and what isn't, but without judging. It's a good thing for extensions to be fast on CPython.
I modified "make install" in Python 3.8 to install the internal C API to make it possible to use it outside CPython. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Wed, 24 Jun 2020 at 17:22, Victor Stinner <vstinner@python.org> wrote: [...]
The question becomes: how to promote the limited C API? Should it become the default, rather than an opt-in option?
It would be interesting to find out what the performance impact of using the limited C API is, versus the normal API, on some popular extensions. This is something that I wish had been included in PEP 384. It would be great if the limited API could be the default, as it allows building extensions once that work across most Python versions.

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
Gustavo Carneiro wrote on 24.06.20 at 19:19:
On Wed, 24 Jun 2020 at 17:22, Victor Stinner wrote:
The question becomes: how to promote the limited C API? Should it become the default, rather than an opt-in option?
It would be interesting to find out what is the performance impact of using limited C API, vs normal API, on some popular extensions. This is something that I wish had been included in PEP 384.
It couldn't because even today it is still fairly difficult to convert existing code to the limited API. Some code cannot even be migrated at all, e.g. because the entire buffer protocol is missing from it. Some bugs were only fixed in Py3.9, time will tell if anything else is missing. The only major project that I know has been migrated (recently, with a lot of effort) is the PyQt project. And a GUI toolkit probably doesn't have all that many performance critical parts that are dominated by the CPython C-API. (I'm just guessing, it probably has some, somewhere).
It would be great if the limited API could be the default, as it allows building extensions once that work across most python versions.
We are adding a C compile mode for the limited API to Cython. That's also a lot of effort, and probably won't be finished soon, but once that becomes usable, we'd have a whole bunch of real-world extensions that we could use for benchmarking, many of which were written for speed. We could even take a regular Python module and compile it in both variants to compare "pure Python" to "full C-API" to "limited C-API".

Stefan
Victor Stinner wrote on 24.06.20 at 17:40:
My practical problem is how to prevent C extensions from accessing the PyFloatObject.ob_fval member directly.
Do extensions really do that in their code? I mean, there *is* a macro for doing exactly this thing, which suggests that users should exactly *not* do it themselves but use the macro. I would simply say that anyone accessing the structure fields directly instead of using the intended macro is simply on their own with that choice. If their code breaks, they'll have to fix it in the way that was intended for the last 23 years (I looked that up). I don't have any data, but to me, this sounds like a non-issue to start with.
In my tests, I renamed PyObject members. For example, rename PyObject.ob_type to PyObject._ob_type, and update Py_TYPE() and Py_SET_TYPE(). If a C function accesses PyObject.ob_type directly, a compilation error is issued.
I think the path of

- making macros / (inline) functions available for all use cases,
- making them available in a backport header file,
- telling people to use those instead of direct struct access

is the right way. If/when we notice in the future that we need to change an object struct, and macros are available for the use cases that we break (or can be made available during a suitable deprecation phase), then extension authors will notice at that point that they will have to switch to the macros instead of doing whatever breaks for them (or not).
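For reference, the backport header entry for Py_SET_TYPE() is essentially this (the same sketch as in the PEP):

    #if PY_VERSION_HEX < 0x030900A4
    static inline void
    _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4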
One option would be to have a different stricter build mode where PyFloat_AS_DOUBLE() becomes a function call. Example:
    #ifndef Py_LIMITED_API
    #  ifdef OPAQUE_STRUCTURE
    #    define PyFloat_AS_DOUBLE(op) PyFloat_AsDouble(op)
    #  else
    #    define PyFloat_AS_DOUBLE(op) (((PyFloatObject *)(op))->ob_fval)
    #  endif
    #endif
I think that's too broad. Why make all structs opaque, when we don't even know which ones we may want to touch in the future at all? And, who would really use this mode?
Or maybe it's time to extend the limited C API: add PyFloat_AS_DOUBLE() macro as a function call. Extending the limited C API has multiple advantages:
* It eases the transition of C extensions to the limited C API
* Py_LIMITED_API already exists, there is no need to add yet another build mode or any new macro
* Most structures are *already* opaque in the limited C API.
We will have to grow it anyway, so why not. We could also add yet another optional header file that adds everything from the full C-API that we can somehow map to the limited C-API, as macros or inline functions. In the worst case, we could still implement a missing function as a lookup and call through a Python object method, if there's no other way to do it in the limited C-API. In the end, this could lead to a "full C-API wrapper", implemented on top of the limited C-API. Sounds like a good way to port existing code.
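A sketch of that worst-case fallback (the wrapper name is made up):

    /* Emulate a full-API helper on top of the limited C API by looking up
       and calling a method by name: slow, but portable. */
    static PyObject *
    compat_dict_copy(PyObject *dict)
    {
        return PyObject_CallMethod(dict, "copy", NULL);
    }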
The question becomes: how to promote the limited C API? Should it become the default, rather than an opt-in option?
With the above "full wrapper", it could become the default. That would give authors three choices: - the full C-API (being tied to a minor release) - the limited C-API (limited but providing forward compatibility) - the wrapper (being slower but providing forward+backward compatibility) Stefan
On Wed, 24 Jun 2020 at 21:16, Stefan Behnel <stefan_ml@behnel.de> wrote:
It couldn't because even today it is still fairly difficult to convert existing code to the limited API. Some code cannot even be migrated at all, e.g. because the entire buffer protocol is missing from it. Some bugs were only fixed in Py3.9, time will tell if anything else is missing.
The only major project that I know has been migrated (recently, with a lot of effort) is the PyQt project. And a GUI toolkit probably doesn't have all that many performance critical parts that are dominated by the CPython C-API. (I'm just guessing, it probably has some, somewhere).
I created https://bugs.python.org/issue41111 to propose converting a bunch of stdlib extensions to the limited C API to eat our own dog food: to ensure that the limited C API is usable, and not just a "technology demonstration". I listed missing features of the limited C API, like getbuffer and releasebuffer.
We are adding a C compile mode for the limited API to Cython.
That's really great! Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Thank you very much for putting this PEP together. It would be very helpful to broaden the objective of avoiding functions returning PyObject** to other types of pointers. I have in mind several functions in the C-API that return a char* pointer to the contents of an object. While these functions are easy to implement on top of the CPython object model, they are challenging for alternative Python implementations.

Consider PyBytes_AsString: it returns a mutable char* pointing to the contents of a bytes instance. This presents several obvious problems. For starters, it burdens a relocating garbage collector to pin objects or create a temporary copy of an object's contents in non-moving memory. It also has implications for treating PyObject* as a handle, using tagged pointers (and tagged immediates), and multi-threading.

To eliminate C-API functions such as PyBytes_AsString, PyUnicode_AsUTF8, etc., new functions should be added to the C-API that copy the contents of objects out into a buffer, similar to PyUnicode_AsUCS4, or return the contents in a dynamically allocated buffer like PyUnicode_AsUCS4Copy.

On Mon, Jun 22, 2020 at 5:12 AM Victor Stinner <vstinner@python.org> wrote: [...]
Hi Carl,

On Fri, 26 Jun 2020 at 07:36, Carl Shapiro <carl.shapiro@gmail.com> wrote:
It would be very helpful to broaden the objective of avoiding functions returning PyObject** to other types of pointers. I have in mind several functions in the C-API that return a char* pointer to the contents of an object. While these functions are easy to implement on top of the CPython object model they are challenging for alternative Python implementations.
Consider PyBytes_AsString: it returns a mutable char* pointing to the contents of a bytes instance. This presents several obvious problems. For starters, it burdens a relocating garbage collector to pin objects or create a temporary copy of an object's contents in non-moving memory. It also has implications for treating PyObject* as a handle, using tagged pointers (and tagged immediates), and multi-threading.
To eliminate C-API functions such as PyBytes_AsString, PyUnicode_AsUTF8, etc., new functions should be added to the C-API that copy the contents of objects out into a buffer, similar to PyUnicode_AsUCS4, or return the contents in a dynamically allocated buffer like PyUnicode_AsUCS4Copy.
Well, the general problem is to track when the caller stops using a resource. Borrowed references are a variant of this problem; PySequence_Fast_ITEMS() is another variant.

For PyUnicode_AsUTF8, INADA-san added PyUnicode_GetUTF8Buffer() which should be used with PyBuffer_Release():

* https://github.com/python/cpython/commit/c7ad974d341d3edb6b9d2a2dcae4d3d4794...
* https://github.com/python/cpython/pull/17659
* https://discuss.python.org/t/better-api-for-encoding-unicode-objects-with-ut...

... but it was reverted soon after its addition:

* https://github.com/python/cpython/commit/3a8c56295d6272ad2177d2de8af4c3f824f...
* https://github.com/python/cpython/pull/18985
* https://bugs.python.org/issue39087

See also the "(PEP 620) C API for efficient loop iterating on a sequence of PyObject** or other C types" thread.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
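For the copy-out style that Carl suggests, a sketch could look like this (hypothetical function, not an existing API):

    #include <string.h>

    /* Copy the contents of a bytes object into a caller-provided buffer
       instead of handing out a pointer to the internal storage. */
    static Py_ssize_t
    MyBytes_CopyToBuffer(PyObject *bytes, char *buf, Py_ssize_t buflen)
    {
        char *data;
        Py_ssize_t size;
        if (PyBytes_AsStringAndSize(bytes, &data, &size) < 0) {
            return -1;
        }
        if (size > buflen) {
            PyErr_SetString(PyExc_ValueError, "buffer too small");
            return -1;
        }
        memcpy(buf, data, (size_t)size);
        return size;
    }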
Victor Stinner wrote on 26.06.20 at 14:39:
Well, the general problem is to track when the caller stops using a resource.
Although that is less of a problem if you only allow exposing the internal data representation and nothing else. In that case, you can tie the lifetime of the data access to the lifetime of the object. Minus moving GCs, as Carl also pointed out. But even there, you could get away (probably for quite a while) with pinning the data if someone asked for it. Stefan
Hi Stefan,

I'm interested in experimenting with a moving GC in CPython, and also in modifying the C API to make sure that it is efficient on PyPy or another Python implementation which uses a moving GC. As Carl said in the other thread, other Python implementations currently have to emulate PyObject**, which is inefficient.

Right now, the C API is an exact mapping of CPython internals, which prevents us from enhancing or optimizing CPython, but also makes other Python implementations inefficient when running C extensions.

Victor

On Fri, Jun 26, 2020 at 23:37, Stefan Behnel <stefan_ml@behnel.de> wrote:
Victor Stinner wrote on 26.06.20 at 14:39:
Well, the general problem is to track when the caller stops using a resource.
Although that is less of a problem if you only allow exposing the internal data representation and nothing else. In that case, you can tie the lifetime of the data access to the lifetime of the object.
Minus moving GCs, as Carl also pointed out. But even there, you could get away (probably for quite a while) with pinning the data if someone asked for it.
Stefan
-- Night gathers, and now my watch begins. It shall not end until my death.
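What "emulating PyObject**" means in practice: PySequence_Fast_ITEMS() (a real C API macro) hands out a raw pointer into a list or tuple's internal object array, which CPython can return for free but which an implementation with a moving GC must materialize and keep in sync. In this sketch, do_item() is a placeholder for the caller's per-item work:

    #include <Python.h>

    /* The items array holds borrowed references into the sequence's
     * internal storage; it is only valid while 'seq' is alive and
     * unmodified. */
    static int
    for_each_item(PyObject *iterable, int (*do_item)(PyObject *))
    {
        PyObject *seq = PySequence_Fast(iterable, "expected an iterable");
        if (seq == NULL) {
            return -1;
        }
        Py_ssize_t n = PySequence_Fast_GET_SIZE(seq);
        PyObject **items = PySequence_Fast_ITEMS(seq);
        for (Py_ssize_t i = 0; i < n; i++) {
            if (do_item(items[i]) < 0) {
                Py_DECREF(seq);
                return -1;
            }
        }
        Py_DECREF(seq);
        return 0;
    }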
On Jun 22, 2020, at 5:10 AM, Victor Stinner <vstinner@python.org> wrote:
Introduce C API incompatible changes to hide implementation details.
How much of the existing C extension ecosystem do you expect to break as a result of these incompatible changes?
It will be way easier to add new features.
This isn't self-evident. What is currently difficult that would be easier?
It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers.
Is there any proof-of-concept to suggest that it is in the realm of possibility that such an experiment would produce a favorable outcome? Otherwise, it isn't a reasonable justification for an extensive and irrevocable series of sweeping changes that affect the entire ecosystem of existing extensions.
**STATUS**: Completed (in Python 3.9)
I'm not sure that many people are monitoring the huge number of changes that have gone in mostly unreviewed. Mark Shannon and Stefan Krah have both raised concerns. It seems like one person has been given blanket authorization to revise nearly every aspect of the internals and to undo the design choices made by all the developers who've previously worked on the project.
Converting macros to static inline functions should only impact very few C extensions which use macros in unusual ways.
These should be individually verified to make sure they actually get inlined by the compiler. In https://bugs.python.org/issue39542 about nine PRs were applied without review or discussion. One of those, https://github.com/python/cpython/pull/18364 , converted PyType_Check() to a static inline function, but I'm not sure that it actually does get inlined. That may be the reason named tuple attribute access slowed by about 25% between Python 3.8 and Python 3.9.¹ Presumably, that PR also affected every single type check in the entire C codebase and will affect third-party extensions as well.

FWIW, I do appreciate the devotion and amount of effort in this undertaking — that isn't a question. However, as a community this needs to be a conscious decision. I'm unclear about whether any benefits will ever materialize. I am clear that packages will be broken, that performance will be impacted, and that this is a one-way trip that can never be undone. Most of the work is being done by one person. Many of the PRs aren't reviewed. The rate and volume of PRs are so high that almost no one can keep track of what is happening. Mark and Stefan have pushed back but with no effect.

Raymond

==================================================
¹ Timings for attribute access

$ python3.8 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
2000000 loops, best of 5: 119 nsec per loop

$ python3.9 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
2000000 loops, best of 5: 152 nsec per loop

==================================================
Python 3.8 disassembly (clean and fast)
---------------------------------------

_tuplegetter_descr_get:
    testq   %rsi, %rsi
    je      L299
    subq    $8, %rsp
    movq    8(%rsi), %rax
    movq    16(%rdi), %rdx
    testb   $4, 171(%rax)
    je      L300
    cmpq    16(%rsi), %rdx
    jnb     L301
    movq    24(%rsi,%rdx,8), %rax
    addq    $1, (%rax)
L290:
    addq    $8, %rsp
    ret

Python 3.9 disassembly (doesn't look inlined)
---------------------------------------------

_tuplegetter_descr_get:
    testq   %rsi, %rsi
    pushq   %r12                  <-- new cost
    pushq   %rbp                  <-- new cost
    pushq   %rbx                  <-- new cost
    movq    %rdi, %rbx
    je      L382
    movq    16(%rdi), %r12
    movq    %rsi, %rbp
    movq    8(%rsi), %rdi
    call    _PyType_GetFlags      <-- new non-inlined function call
    testl   $67108864, %eax
    je      L383
    cmpq    16(%rbp), %r12
    jnb     L384
    movq    24(%rbp,%r12,8), %rax
    addq    $1, (%rax)
    popq    %rbx                  <-- new cost
    popq    %rbp                  <-- new cost
    popq    %r12                  <-- new cost
    ret
You missed the point of the PEP: "It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers."

IMHO it's time to stop wasting our limited developer resources on micro-optimizations and micro-benchmarks, and instead think about overall Python performance and a major redesign of Python internals, to find a way to make Python overall 2x faster rather than making a specific function 10% faster.

I don't think that the performance of accessing namedtuple attributes is a known bottleneck of Python performance.

On Mon, Jun 29, 2020 at 23:37, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
$ python3.8 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
2000000 loops, best of 5: 119 nsec per loop
$ python3.9 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
2000000 loops, best of 5: 152 nsec per loop
Measuring benchmarks which take less than 1 second requires being very careful. For a microbenchmark which takes around 100 ns like this one, you are very close to the CPU limit and "everything" becomes important. Python performance depends on the C compiler, on compiler options, on how you run the microbenchmark, on whether --enable-shared is used, etc. Giving microbenchmark results without this information isn't helpful.

On Fedora 32, Python binaries are built by GCC with Link Time Optimization (LTO) and Profile Guided Optimization (PGO). I simply get the same performance between Python 3.8.3 and Python 3.9.0b3:

$ python3.9 -m pyperf timeit --compare-to=python3.8 -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
python3.8: ..................... 138 ns +- 2 ns
python3.9: ..................... 136 ns +- 3 ns

Mean +- std dev: [python3.8] 138 ns +- 2 ns -> [python3.9] 136 ns +- 3 ns: 1.01x faster (-1%)

(A difference smaller than 10% on a microbenchmark is not significant.)

The compiler decides whether or not to inline a static inline function depending on many complex things; I don't think that there is any need to elaborate here. The idea of forcing inlining was discussed but rejected when the first C API macros were converted to static inline functions: https://bugs.python.org/issue35059 . C compilers are now really smart about emitting the most efficient machine code.

By the way, if you configure Python with --enable-shared, function calls from libpython to libpython have to go through a procedure linkage table (PLT) indirection. Python 3.8 and 3.9 on Fedora 32, and Python 3.8 on RHEL 8, are built with -fno-semantic-interposition to avoid this indirection and so make Python faster. More about this linker flag: https://developers.redhat.com/blog/2020/06/25/red-hat-enterprise-linux-8-2-b...

Victor

--
Night gathers, and now my watch begins. It shall not end until my death.
On Jun 29, 2020, at 5:46 PM, Victor Stinner <vstinner@python.org> wrote:
You missed the point of the PEP: "It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers."
IMHO it's time to stop wasting our limited developer resources on micro-optimizations and micro-benchmarks, and instead think about overall Python performance and a major redesign of Python internals, to find a way to make Python overall 2x faster rather than making a specific function 10% faster.
That is a really bold claim. AFAICT there is zero evidence that this is actually possible. Like the sandboxing project, these experiments may all prove to be dead-ends. If we're going to bet the farm on this, there should at least be a proof-of-concept. Otherwise, it's just an expensive lottery ticket.
I don't think that the performance of accessing namedtuple attributes is a known bottleneck of Python performance.
This time you missed the point. Named tuple access was just one point of impact — it is not the only code that calls PyTuple_Check(). It looks like inlining did not work and that EVERY SINGLE type check in CPython was affected (including third party extensions). Also, there was no review — we have a single developer pushing through hundreds of these changes at a rate where no one else can keep up.
Measuring benchmarks which take less than 1 second requires being very careful.
Perhaps you don't want to believe the results, but the timings are careful, stable, repeatable, and backed-up by a disassembly that shows the exact cause. The builds used for the timings were the production macOS builds as distributed on python.org.

There is a certain irony in making repeated, unsubstantiated promises to make the language 2x faster and then checking in changes that make the implementation slower.

Raymond

P.S. What PyPy achieved was monumental. But it took a decade, even with a well-organized and partially-funded team of superstars. It always lagged CPython in features. And the results were entirely dependent on a single design decision: to run a pure Python interpreter written in RPython to take advantage of its tracing JIT. I don't imagine CPython can hope to achieve anything like this. Likely, the best we can do is replace reference counting with garbage collection.
On Mon, Jun 29, 2020 at 7:41 PM Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
Perhaps you don't want to believe the results, but the timings are careful, stable, repeatable, and backed-up by a disassembly that shows the exact cause. The builds used for the timings were the production macOS builds as distributed on python.org.
This points more to specific builds needing to be fixed, if their build options result in significantly un-optimized code on the same CPU architecture.
On Mon, 29 Jun 2020 23:31:31 -0700 Emily Bowman <silverbacknet@gmail.com> wrote:
On Mon, Jun 29, 2020 at 7:41 PM Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
Perhaps you don't want to believe the results, but the timings are careful, stable, repeatable, and backed-up by a disassembly that shows the exact cause. The builds used for the timings were the production macOS builds as distributed on python.org.
This points more to specific builds needing to be fixed, if their build options result in significantly un-optimized code on the same CPU architecture.
I agree. In this specific instance, _PyType_GetFlags() should definitely get inlined. The whole point of the type-check flags is to make instance checks for built-in types such as long or tuple faster than the regular algorithm.

Regards

Antoine.
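For context, the fast path Antoine refers to is a single flag test; the definitions below are reproduced from memory of CPython's headers and may differ in detail:

    /* An instance check for a built-in type is one bit test on tp_flags
     * instead of an MRO walk. If PyType_HasFeature() compiles to an
     * out-of-line call, every such check pays a function call into
     * libpython. */
    #define PyTuple_Check(op) \
        PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_TUPLE_SUBCLASS)

    #define PyType_FastSubclass(type, flag) \
        PyType_HasFeature((type), (flag))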
On 2020-06-30 02:46, Victor Stinner wrote:
You missed the point of the PEP: "It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers."
I don't think experiments are a good motivation. When the C API is broken, everyone that uses it pays the price -- they have to update their code. They pay the price even if the experiment fails, or if it's never started in the first place. Can we treat the C API not as a place for experiments, but as a stable foundation to build on? For example, could we only deprecate the bad parts, but not remove them until the experiments actually show that they are preventing a beneficial change?
On Tue, Jun 30, 2020 at 6:45 AM Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Converting macros to static inline functions should only impact very few C extensions which use macros in unusual ways.
These should be individually verified to make sure they actually get inlined by the compiler. In https://bugs.python.org/issue39542 about nine PRs were applied without review or discussion. One of those, https://github.com/python/cpython/pull/18364 , converted PyType_Check() to a static inline function, but I'm not sure that it actually does get inlined. That may be the reason named tuple attribute access slowed by about 25% between Python 3.8 and Python 3.9.¹ Presumably, that PR also affected every single type check in the entire C codebase and will affect third-party extensions as well.
I confirmed the performance regression, although the difference is 12%. And I found the commit that caused the regression.

https://github.com/python/cpython/commit/45ec5b99aefa54552947049086e87ec01bc...
https://bugs.python.org/issue40170

The regression is not caused by the "static inline" function not being inlined by the compiler. The commit changed PyType_HasFeature() to always call the regular function PyType_GetFlags().

Regards,
--
Inada Naoki <songofacandy@gmail.com>
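A before/after sketch of the change Inada-san identifies, reconstructed from memory of the headers (the real code differs in detail, e.g. the limited-API guards are omitted):

    /* Python 3.8: a macro reading the tp_flags field directly. */
    #define PyType_HasFeature(t, f)  (((t)->tp_flags & (f)) != 0)

    /* Python 3.9 after commit 45ec5b99: a static inline function that goes
     * through PyType_GetFlags(). This hides the tp_flags field, but costs a
     * real call whenever the compiler cannot see through PyType_GetFlags()
     * (no LTO, shared-library boundary, etc.). */
    static inline int
    PyType_HasFeature(PyTypeObject *type, unsigned long feature)
    {
        return ((PyType_GetFlags(type) & feature) != 0);
    }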
On Wed, Jul 1, 2020 at 03:53, Inada Naoki <songofacandy@gmail.com> wrote:
I confirmed the performance regression, although the difference is 12%. And I found the commit that caused the regression.
https://github.com/python/cpython/commit/45ec5b99aefa54552947049086e87ec01bc... https://bugs.python.org/issue40170
The regression is not caused by the "static inline" function not being inlined by the compiler. The commit changed PyType_HasFeature() to always call the regular function PyType_GetFlags().
On Fedora 32 with GCC 10.1.1, even if PyType_GetFlags() is a function, the function call is inlined. This is thanks to LTO (and -fno-semantic-interposition, since Fedora builds Python with --enable-shared, which is not the case for the macOS installer).

The python.org macOS installers of Python 3.8.3 and Python 3.9.0b3 are *not* built with LTO or PGO: see Mac/BuildScript/build-installer.py. LTO and PGO can make Python between 10 and 30% faster, which is very significant. I created https://bugs.python.org/issue41181 with a PR to enable LTO and PGO in the script building the macOS installer.

I confirm that using LTO+PGO, clang also inlines the PyType_GetFlags() function call in tuplegetter_descr_get(): https://bugs.python.org/issue41181#msg372744

Victor

--
Night gathers, and now my watch begins. It shall not end until my death.
On Wed, 1 Jul 2020 12:50:01 +0200 Victor Stinner <vstinner@python.org> wrote:
On Wed, Jul 1, 2020 at 03:53, Inada Naoki <songofacandy@gmail.com> wrote:
I confirmed the performance regression, although the difference is 12%. And I found the commit that caused the regression.
https://github.com/python/cpython/commit/45ec5b99aefa54552947049086e87ec01bc... https://bugs.python.org/issue40170
The regression is not caused by the "static inline" function not being inlined by the compiler. The commit changed PyType_HasFeature() to always call the regular function PyType_GetFlags().
On Fedora 32 with GCC 10.1.1, even if PyType_GetFlags() is a function, the function call is inlined. This is thanks to LTO
How does this help third-party extensions?
On Tue, Jun 30, 2020 at 3:09 PM Petr Viktorin <encukou@gmail.com> wrote:
On 2020-06-30 02:46, Victor Stinner wrote:
You missed the point of the PEP: "It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers."
I don't think experiments are a good motivation.
When the C API is broken, everyone that uses it pays the price -- they have to update their code. They pay the price even if the experiment fails, or if it's never started in the first place.
Can we treat the C API not as a place for experiments, but as a stable foundation to build on?
The C API should indeed not be a place to experiment, but the current C API leaks so many implementation details, it becomes impossible to have meaningful experiments with different implementations. For example, the purpose of the experiments I've done is to improve the implementation of Python -- speed, memory use, threading, copy-on-write behaviour in forked processes, etc -- in real world applications, with real-world third-party dependencies. The specific things Victor is aiming to change are an active impediment in those experiments, because I end up having to change a lot of third-party code to be able to build or run the application, and even if I find a beneficial change, I can't actually use it without incurring a massive maintenance burden.
For example, could we only deprecate the bad parts, but not remove them until the experiments actually show that they are preventing a beneficial change?
Well, right now we can show that they're impacting the *search* for a beneficial change rather negatively :) And what's worse, it will take many years to actually get rid of the old APIs, so if we do the work only once we have a specific beneficial change, we won't be able to *use* the beneficial change until much, much later. I would also argue that not leaking implementation details (in the API, not the ABI) makes for much better code all-round. And yes, there will be beneficial changes that require changing the API that we can't anticipate -- that's fine. The thing is we can stop leaking implementation details now and make everyone's life easier in the future.

The main matter is the question of the *cost* of these changes, and how we measure it. I agree with Victor that for this purpose, we should measure the run-time cost in meaningful, real-world benchmarks (not micro-benchmarks) in sensibly optimised builds. The impact of using PGO, LTO and -fno-semantic-interposition is so big, it doesn't make sense to care about a 5% performance cost if you *aren't* using them.

The cost on developers of CPython and of third-party libraries is a different matter, and something we can surely debate. We should agree on whether the change is of benefit to CPython first, though.
-- Thomas Wouters <thomas@python.org> Hi! I'm an email virus! Think twice before sending your email to help me spread!
On Wed, Jul 1, 2020 at 4:09 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
How does this help third-party extensions?
If the cost is high enough, exposing the guts of a function to allow the compiler to inline it is not unreasonable; all of the major compilers have ways to inline things that are technically across a dynamic boundary, if you declare them properly. The trade-off is accepting that it may have to be rebuilt at some point, but an ABI change is less painful than an API change. Relying on this kind of "optimized interface" would have to be opt-in, since most won't need it and it would be actively harmful for portability. Actually, that's probably an attractive nuisance; better to just make extension writers copy-paste the function into their own codebase if they need performance that badly for a specific call.

Better to identify an actual existing case that's degraded by the change and can be tested against than a hypothetical, though. The macOS build is already getting fixed thanks to that!
On 7/1/2020 3:43 PM, Stefan Behnel wrote:
Petr Viktorin wrote on 30.06.20 at 14:51:
For example, could we only deprecate the bad parts, but not remove them until the experiments actually show that they are preventing a beneficial change?

Big nod on this one.
At one of the core sprints (maybe at Microsoft?) there was talk of adding a new API without changing the existing one. Eric
There is the https://github.com/pyhandle/hpy project which is implemented on top of the existing C API. But this project doesn't solve the problems listed in PEP 620, since CPython must continue to support existing C extensions.

Victor

On Wed, Jul 1, 2020 at 23:43, Eric V. Smith <eric@trueblade.com> wrote:
On 7/1/2020 3:43 PM, Stefan Behnel wrote:
Petr Viktorin wrote on 30.06.20 at 14:51:
For example, could we only deprecate the bad parts, but not remove them until the experiments actually show that they are preventing a beneficial change?

Big nod on this one.
At one of the core sprints (maybe at Microsoft?) there was talk of adding a new API without changing the existing one.
Eric
-- Night gathers, and now my watch begins. It shall not end until my death.
Victor Stinner wrote on 02.07.20 at 00:07:

On Wed, Jul 1, 2020 at 23:43, Eric V. Smith wrote:
On 7/1/2020 3:43 PM, Stefan Behnel wrote:
Petr Viktorin wrote on 30.06.20 at 14:51:
For example, could we only deprecate the bad parts, but not remove them until the experiments actually show that they are preventing a beneficial change?
Big nod on this one.
At one of the core sprints (maybe at Microsoft?) there was talk of adding a new API without changing the existing one.

There is the https://github.com/pyhandle/hpy project which is implemented on top of the existing C API.
But this project doesn't solve the problems listed in PEP 620, since CPython must continue to support existing C extensions.

Maybe I'm missing something here, but how is "removing parts of the C-API" the same as "supporting existing C extensions"? It seems to me that both are straight opposites.

Stefan
On 7/2/2020 11:36 AM, Stefan Behnel wrote:
Victor Stinner wrote on 02.07.20 at 00:07:
On Wed, Jul 1, 2020 at 23:43, Eric V. Smith wrote:
On 7/1/2020 3:43 PM, Stefan Behnel wrote:
Petr Viktorin wrote on 30.06.20 at 14:51:
For example, could we only deprecate the bad parts, but not remove them until the experiments actually show that they are preventing a beneficial change?
Big nod on this one.
At one of the core sprints (maybe at Microsoft?) there was talk of adding a new API without changing the existing one.
There is the https://github.com/pyhandle/hpy project which is implemented on top of the existing C API.
But this project doesn't solve the problems listed in PEP 620, since CPython must continue to support existing C extensions.
Maybe I'm missing something here, but how is "removing parts of the C-API" the same as "supporting existing C extensions"? It seems to me that both are straight opposites.
Agreed. I thought the discussion was "in CPython, leave the existing C-API alone, but experiment with new APIs, and then maybe someday deprecate the existing C-API". I could see conditionally disabling the existing C-API if doing so was needed to do something like experimenting with removing reference counting. But for the foreseeable future, we'd ship with the existing C-API until we'd determined a significant benefit to dropping it.

Eric
Participants (13): Antoine Pitrou, Carl Shapiro, Dong-hee Na, Emily Bowman, Eric V. Smith, Gustavo Carneiro, Inada Naoki, Neil Schemenauer, Petr Viktorin, Raymond Hettinger, Stefan Behnel, Thomas Wouters, Victor Stinner