Thank you very much for putting this PEP together.

It would be very helpful to broaden the objective of avoiding functions returning PyObject** to other types of pointers.  I have in mind several functions in the C-API that return a char* pointer to the contents of an object.  While these functions are easy to implement on top of the CPython object model, they are challenging for alternative Python implementations.

Consider PyBytes_AsString: it returns a mutable char* pointing to the contents of a bytes instance.  This presents several obvious problems.  For starters, it forces a relocating garbage collector to pin objects or to create a temporary copy of an object's contents in non-moving memory.  It also has implications for treating PyObject* as a handle, using tagged pointers (and tagged immediates), and multi-threading.

To eliminate C-API functions such as PyBytes_AsString, PyUnicode_AsUTF8, etc., new functions should be added to the C-API that either copy the contents of an object out into a caller-supplied buffer, similar to PyUnicode_AsUCS4, or return the contents in a dynamically allocated buffer, like PyUnicode_AsUCS4Copy.
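
To make the two styles concrete, here is a minimal sketch on a toy object rather than on the real CPython types. The names `toy_bytes`, `toy_bytes_copy_into` and `toy_bytes_copy_new` are hypothetical and exist only for this illustration; they are not part of any C-API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a bytes object; a moving GC would be free to relocate it. */
typedef struct {
    size_t size;
    const char *data;
} toy_bytes;

/* Style 1 (in the spirit of PyUnicode_AsUCS4): copy into a buffer owned by
   the caller. Returns the number of bytes copied, or -1 if the buffer is
   too small. */
static long toy_bytes_copy_into(const toy_bytes *b, char *buf, size_t buflen)
{
    assert(b != NULL && buf != NULL);
    if (buflen < b->size) {
        return -1;
    }
    memcpy(buf, b->data, b->size);
    return (long)b->size;
}

/* Style 2 (in the spirit of PyUnicode_AsUCS4Copy): return a freshly
   allocated, NUL-terminated copy that the caller releases with free(). */
static char *toy_bytes_copy_new(const toy_bytes *b)
{
    assert(b != NULL);
    char *copy = malloc(b->size + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, b->data, b->size);
    copy[b->size] = '\0';
    return copy;
}
```

In both styles the caller never holds a pointer into the object itself, so the runtime would not need to pin the object or keep a shadow copy alive.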

On Mon, Jun 22, 2020 at 5:12 AM Victor Stinner <> wrote:

PEP available at:

This PEP is the result of 4 years of research work on the C API:

It's the third version. The first version (2017) proposed to add a
"new C API" and advised C extensions maintainers to opt-in for it: it
was basically the same idea as PEP 384 limited C API but in a
different color. Well, I had no idea of what I was doing :-) The
second version (April 2020) proposed to add a new Python runtime built
from the same code base as the regular Python runtime but in a
different build mode, the regular Python would continue to be fully

I wrote the third version, the PEP 620, from scratch. It now gives an
explicit and concrete list of incompatible C API changes, and has
better motivation and rationale sections. The main PEP novelty is the
new pythoncapi_compat.h header file distributed with Python to provide
new C API functions to old Python versions, the second novelty is the
process to reduce the number of broken C extensions.

Whereas PEPs are usually implemented in a single Python version, the
implementation of this PEP is expected to be done carefully over
multiple Python versions. The PEP lists many changes which are already
implemented in Python 3.7, 3.8 and 3.9. It defines a process to reduce
the number of broken C extensions when introducing the incompatible C
API changes listed in the PEP. The process dictates the rhythm of
these changes.

PEP: 620
Title: Hide implementation details from the C API
Author: Victor Stinner <>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-June-2020
Python-Version: 3.10


Introduce C API incompatible changes to hide implementation details.

Once most implementation details are hidden, the evolution of CPython
internals will be less limited by C API backward compatibility issues.
It will be much easier to add new features.

It becomes possible to experiment with more advanced optimizations in CPython
than just micro-optimizations, like tagged pointers.

Define a process to reduce the number of broken C extensions.

The implementation of this PEP is expected to be done carefully over
multiple Python versions. It already started in Python 3.7 and most
changes are already completed. The `Process to reduce the number of
broken C extensions`_ dictates the rhythm.


The C API blocks CPython evolutions

Adding or removing members of C structures is causing multiple backward
compatibility issues.

Adding a new member breaks the stable ABI (PEP 384), especially for
types declared statically (e.g. ``static PyTypeObject MyType =
{...};``). In Python 3.4, the PEP 442 "Safe object finalization" added
the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure.
For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type
flag was required to announce if the type structure contains the
``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388

The ``PyTypeObject.tp_print`` member, deprecated since Python 3.0
released in 2008, was removed in the Python 3.8 development cycle.
But the change broke too many C extensions and had to be reverted before
the 3.8 final release. Finally, the member was removed again in Python 3.9.

C extensions rely on the ability to access structure members,
indirectly through the C API, or even directly. Modifying structures
like ``PyListObject`` cannot even be considered.

The ``PyTypeObject`` structure is the one which evolved the most, simply
because there was no other way to evolve CPython than modifying it.

In the C API, all Python objects are passed as ``PyObject*``: a pointer
to a ``PyObject`` structure. Experimenting with tagged pointers in
CPython is blocked by the fact that a C extension can technically
dereference a ``PyObject*`` pointer and access ``PyObject`` members.
Small "objects" can be stored as a tagged pointer with no concrete
``PyObject`` structure.
Replacing the Python garbage collector with a tracing garbage collector
would also require removing the ``PyObject.ob_refcnt`` reference counter,
whereas currently the ``Py_INCREF()`` and ``Py_DECREF()`` macros directly
access ``PyObject.ob_refcnt``.

Same CPython design since 1990: structures and reference counting

When the CPython project was created, it was written with one principle:
keep the implementation simple enough so it can be maintained by a
single developer. CPython complexity grew a lot and many
micro-optimizations have been implemented, but CPython core design has
not changed.

Members of ``PyObject`` and ``PyTupleObject`` structures have not
changed since the "Initial revision" commit (1990)::

    #define OB_HEAD \
        unsigned int ob_refcnt; \
        struct _typeobject *ob_type;

    typedef struct _object {
        OB_HEAD
    } object;

    typedef struct {
        OB_VARHEAD
        object *ob_item[1];
    } tupleobject;

Only names changed: ``object`` was renamed to ``PyObject`` and
``tupleobject`` was renamed to ``PyTupleObject``.

CPython still tracks Python objects lifetime using reference counting
internally and for third party C extensions (through the Python C API).

All Python objects must be allocated on the heap and cannot be moved.

Why is PyPy more efficient than CPython?

The PyPy project is a Python implementation which is 4.2x faster than
CPython on average. PyPy developers chose to not fork CPython, but start
from scratch to have more freedom in terms of optimization choices.

PyPy does not use reference counting, but a tracing garbage collector
which moves objects. Objects can be allocated on the stack (or even not
at all), rather than always having to be allocated on the heap.

Objects layouts are designed with performance in mind. For example, a
list strategy stores integers directly as integers, rather than objects.

Moreover, PyPy also has a JIT compiler which emits fast code thanks to
the efficient PyPy design.

PyPy bottleneck: the Python C API

While PyPy is way more efficient than CPython to run pure Python code,
it is as efficient or slower than CPython to run C extensions.

Since the C API requires ``PyObject*`` and allows direct access to
structure members, PyPy has to associate a CPython object with each PyPy
object and keep both consistent. Converting a PyPy object to a
CPython object is inefficient. Moreover, reference counting also has to
be implemented on top of the PyPy tracing garbage collector.

These conversions are required because the Python C API is too close to
the CPython implementation: there is no high-level abstraction.
For example, structure members are part of the public C API and nothing
prevents a C extension from getting or setting
``PyTupleObject.ob_item[0]`` (the first item of a tuple) directly.

See "Inside cpyext: Why emulating CPython C API is so Hard"
(Sept 2018) by Antonio Cuni for more details.


Hide implementation details

Hiding implementation details from the C API has multiple advantages:

* It becomes possible to experiment with more advanced optimizations in
  CPython than just micro-optimizations. For example, tagged pointers,
  and replace the garbage collector with a tracing garbage collector
  which can move objects.
* Adding new features in CPython becomes easier.
* PyPy should be able to avoid conversions to CPython objects in more
  cases: keep efficient PyPy objects.
* It becomes easier to implement the C API for a new Python
  implementation.
* More C extensions will be compatible with Python implementations other
  than CPython.

Relationship with the limited C API

The PEP 384 "Defining a Stable ABI" is implemented in Python 3.2. It introduces the
"limited C API": a subset of the C API. When the limited C API is used,
it becomes possible to build a C extension only once and use it on
multiple Python versions: that's the stable ABI.

The main limitation of the PEP 384 is that C extensions have to opt-in
for the limited C API. Only very few projects made this choice,
usually to ease distribution of binaries, especially on Windows.

This PEP moves the C API towards the limited C API.

Ideally, the C API will become the limited C API and all C extensions
will use the stable ABI, but this is out of this PEP scope.



* (**Completed**) Reorganize the C API header files: create
  ``Include/cpython/`` and ``Include/internal/`` subdirectories.
* (**Completed**) Move private functions exposing implementation
  details to the internal C API.
* (**Completed**) Convert macros to static inline functions.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and
  ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and
  ``Py_SIZE()`` macros become functions which cannot be used as l-value.
* (**Completed**) New C API functions must not return borrowed
  references.
* (**In Progress**) Provide ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter
  functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert ``PyTuple_GET_ITEM()`` and
  ``PyList_GET_ITEM()`` macros to static inline functions.

Reorganize the C API header files

The first consumer of the C API was Python itself. There is no clear
separation between APIs which must not be used outside Python, and APIs
which are public on purpose.

Header files must be reorganized into 3 APIs:

* ``Include/`` directory is the limited C API: no implementation
  details, structures are opaque. C extensions using it get a stable
  ABI.
* ``Include/cpython/`` directory is the CPython C API: less "portable"
  API, depends more on the Python version, exposes some implementation
  details, few incompatible changes can happen.
* ``Include/internal/`` directory is the internal C API: implementation
  details, incompatible changes are likely at each Python release.

The creation of the ``Include/cpython/`` directory is fully backward
compatible. ``Include/cpython/`` header files cannot be included
directly and are included automatically by ``Include/`` header files
when the ``Py_LIMITED_API`` macro is not defined.

The internal C API is installed and can be used for specific purposes like
debuggers and profilers which must access structure members without
executing code. C extensions using the internal C API are tightly
coupled to a Python version and must be recompiled at each Python
release.
**STATUS**: Completed (in Python 3.8)

The reorganization of header files started in Python 3.7 and was
completed in Python 3.8:

* `bpo-35134 <>`_: Add a new
  Include/cpython/ subdirectory for the "CPython API" with
  implementation details.
* `bpo-35081 <>`_: Move internal
  headers to ``Include/internal/``

Move private functions to the internal C API

Private functions which expose implementation details must be moved to
the internal C API.

If a C extension relies on a CPython private function which exposes
CPython implementation details, other Python implementations have to
re-implement this private function to support this C extension.

**STATUS**: Completed (in Python 3.9)

Private functions moved to the internal C API in Python 3.8:

* ``_PyObject_GC_TRACK()``, ``_PyObject_GC_UNTRACK()``

Macros and functions excluded from the limited C API in Python 3.9:

* ``_PyObject_SIZE()``, ``_PyObject_VAR_SIZE()``
* ``PyThreadState_DeleteCurrent()``
* ``_Py_NewReference()``, ``_Py_ForgetReference()``
* ``_PyTraceMalloc_NewReference()``
* ``_Py_GetRefTotal()``

Private functions moved to the internal C API in Python 3.9:

* GC functions like ``_Py_AS_GC()``, ``_PyObject_GC_IS_TRACKED()``
  and ``_PyGCHead_NEXT()``
* ``_Py_AddToAllObjects()`` (not exported)
* ``_PyDebug_PrintTotalRefs()``, ``_Py_PrintReferences()``,
  ``_Py_PrintReferenceAddresses()`` (not exported)

Public "clear free list" functions moved to the internal C API and
renamed to private functions in Python 3.9:

* ``PyAsyncGen_ClearFreeLists()``
* ``PyContext_ClearFreeList()``
* ``PyDict_ClearFreeList()``
* ``PyFloat_ClearFreeList()``
* ``PyFrame_ClearFreeList()``
* ``PyList_ClearFreeList()``
* ``PyTuple_ClearFreeList()``
* Functions simply removed:

  * ``PyMethod_ClearFreeList()`` and ``PyCFunction_ClearFreeList()``:
    bound method free list removed in Python 3.9.
  * ``PySet_ClearFreeList()``: set free list removed in Python 3.4.
  * ``PyUnicode_ClearFreeList()``: Unicode free list removed
    in Python 3.3.

Convert macros to static inline functions

Converting macros to static inline functions has multiple advantages:

* Functions have well defined parameter types and return type.
* Functions can use variables with a well defined scope (the function).
* Debuggers can put breakpoints on functions and profilers can display
  the function name in the call stacks. In most cases, it works even
  when a static inline function is inlined.
* Functions don't have macro pitfalls.

Converting macros to static inline functions should only impact very few
C extensions which use macros in unusual ways.

For backward compatibility, functions must continue to accept any type,
not only ``PyObject*``, to avoid compiler warnings, since most macros
cast their parameters to ``PyObject*``.
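
One classic macro pitfall is double evaluation of an argument. The sketch below is a generic illustration, not code taken from CPython; ``TOY_MAX_MACRO``, ``toy_max_inline()`` and ``toy_next()`` are hypothetical names.

```c
/* Macro version: each argument is pasted textually, so an argument with a
   side effect may be evaluated more than once. */
#define TOY_MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Static inline version: each argument is evaluated exactly once, and the
   parameter and return types are explicit. */
static inline int toy_max_inline(int a, int b)
{
    return a > b ? a : b;
}

/* Counts how many times toy_next() has been evaluated. */
static int toy_calls = 0;

/* Helper with a visible side effect: increments the counter and returns it. */
static int toy_next(void)
{
    toy_calls++;
    return toy_calls;
}
```

With these definitions, ``TOY_MAX_MACRO(toy_next(), 0)`` calls ``toy_next()`` twice (once for the comparison and once for the result), whereas ``toy_max_inline(toy_next(), 0)`` calls it once.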

Python 3.6 requires C compilers to support static inline functions: the
PEP 7 requires a subset of C99.

**STATUS**: Completed (in Python 3.9)

Macros converted to static inline functions in Python 3.8:

* ``Py_INCREF()``, ``Py_DECREF()``
* ``Py_XINCREF()``, ``Py_XDECREF()``
* ``PyObject_INIT()``, ``PyObject_INIT_VAR()``
* ``_PyObject_GC_TRACK()``, ``_PyObject_GC_UNTRACK()``, ``_Py_Dealloc()``

Macros converted to regular functions in Python 3.9:

* ``Py_EnterRecursiveCall()``, ``Py_LeaveRecursiveCall()``
  (added to the limited C API)
* ``PyObject_INIT()``, ``PyObject_INIT_VAR()``
* ``PyObject_CheckBuffer()``
* ``PyIndex_Check()``
* ``PyObject_IS_GC()``
* ``PyObject_NEW()`` (alias to ``PyObject_New()``),
  ``PyObject_NEW_VAR()`` (alias to ``PyObject_NewVar()``)
* ``PyType_HasFeature()`` (always call ``PyType_GetFlags()``)
  now call functions which hide implementation details, rather than
  accessing directly members of the ``PyThreadState`` structure.

Make structures opaque

All structures of the C API should become opaque: C extensions must
use getter or setter functions to get or set structure members. For
example, ``tuple->ob_item[0]`` must be replaced with
``PyTuple_GET_ITEM(tuple, 0)``.
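
The general pattern of an opaque structure with getter and setter functions can be sketched in plain C as follows. This is a toy model under stated assumptions, not the CPython implementation: ``toy_tuple`` and its functions are hypothetical, and in a real project the structure body would live in a private source file rather than next to the declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public part: only a forward declaration, so callers cannot touch members. */
typedef struct toy_tuple toy_tuple;

toy_tuple *toy_tuple_new(size_t size);
void toy_tuple_free(toy_tuple *t);
int toy_tuple_get_item(const toy_tuple *t, size_t index, long *out);
int toy_tuple_set_item(toy_tuple *t, size_t index, long value);

/* Private part: the layout can change without breaking callers. */
struct toy_tuple {
    size_t size;
    long *items;
};

toy_tuple *toy_tuple_new(size_t size)
{
    toy_tuple *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    /* calloc zero-fills; allocate at least one slot so size 0 is valid */
    t->items = calloc(size > 0 ? size : 1, sizeof *t->items);
    if (t->items == NULL) {
        free(t);
        return NULL;
    }
    t->size = size;
    return t;
}

void toy_tuple_free(toy_tuple *t)
{
    if (t != NULL) {
        free(t->items);
        free(t);
    }
}

/* Getter: stores the item in *out and returns 0, or returns -1 if the
   index is out of range. */
int toy_tuple_get_item(const toy_tuple *t, size_t index, long *out)
{
    assert(t != NULL && out != NULL);
    if (index >= t->size) {
        return -1;
    }
    *out = t->items[index];
    return 0;
}

/* Setter: returns 0 on success, or -1 if the index is out of range. */
int toy_tuple_set_item(toy_tuple *t, size_t index, long value)
{
    assert(t != NULL);
    if (index >= t->size) {
        return -1;
    }
    t->items[index] = value;
    return 0;
}
```

Because callers only ever see the accessor functions, the implementation could later switch to a different storage layout without requiring any change in calling code.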

To be able to move away from reference counting, ``PyObject`` must
become opaque. Currently, the reference counter ``PyObject.ob_refcnt``
is exposed in the C API. All structures must become opaque, since they
"inherit" from PyObject. For example, ``PyFloatObject`` inherits from
``PyObject``::

    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;

Making ``PyObject`` fully opaque requires converting ``Py_INCREF()`` and
``Py_DECREF()`` macros to function calls. This change has an impact on
performance. It is likely to be one of the very last changes when making
structures opaque.

Making ``PyTypeObject`` structure opaque breaks C extensions declaring
types statically (e.g. ``static PyTypeObject MyType = {...};``). C
extensions must use ``PyType_FromSpec()`` to allocate types on the heap
instead. Using heap types has other advantages like being compatible
with subinterpreters. Combined with PEP 489 "Multi-phase extension
module initialization", it makes a C extension behavior closer to a
Python module, like allowing to create more than one module instance.

Making ``PyThreadState`` structure opaque requires adding getter and
setter functions for members used by C extensions.

**STATUS**: In Progress (started in Python 3.8)

The ``PyInterpreterState`` structure was made opaque in Python 3.8
(`bpo-35886 <>`_) and the
``PyGC_Head`` structure (`bpo-40241
<>`_) was made opaque in Python 3.9.

Issues tracking the work to prepare the C API to make following
structures opaque:

* ``PyObject``: `bpo-39573 <>`_
* ``PyTypeObject``: `bpo-40170 <>`_
* ``PyFrameObject``: `bpo-40421 <>`_

  * Python 3.9 adds ``PyFrame_GetCode()`` and ``PyFrame_GetBack()``
    getter functions, and moves ``PyFrame_GetLineNumber`` to the limited
    C API.

* ``PyThreadState``: `bpo-39947 <>`_

  * Python 3.9 adds 3 getter functions: ``PyThreadState_GetFrame()``,
    ``PyThreadState_GetID()``, ``PyThreadState_GetInterpreter()``.

Disallow using Py_TYPE() as l-value

The ``Py_TYPE()`` function gets an object type, its ``PyObject.ob_type``
member. It is implemented as a macro which can be used as an l-value to
set the type: ``Py_TYPE(obj) = new_type``. This code relies on the
assumption that ``PyObject.ob_type`` can be modified directly. It
prevents making the ``PyObject`` structure opaque.

New setter functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and
``Py_SET_SIZE()`` are added and must be used instead.

The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros must be
converted to static inline functions which can not be used as l-value.

For example, the ``Py_TYPE()`` macro::

    #define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)

becomes::

    #define _PyObject_CAST_CONST(op) ((const PyObject*)(op))

    static inline PyTypeObject* _Py_TYPE(const PyObject *ob) {
        return ob->ob_type;
    }

    #define Py_TYPE(ob) _Py_TYPE(_PyObject_CAST_CONST(ob))
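
The effect of this conversion can be reproduced on a toy object, independent of the CPython headers. In the sketch below ``toy_object``, ``toy_type``, ``toy_get_type()`` and ``toy_set_type()`` are hypothetical names: the getter returns the pointer by value, so writing ``toy_get_type(&o) = t`` is rejected by the compiler and a dedicated setter is required.

```c
#include <assert.h>
#include <stddef.h>

/* Toy equivalents of a type object and an object header. */
typedef struct toy_type {
    const char *name;
} toy_type;

typedef struct toy_object {
    toy_type *ob_type;
} toy_object;

/* Getter: the result is an rvalue, so it cannot appear on the left-hand
   side of an assignment. */
static inline toy_type *toy_get_type(const toy_object *ob)
{
    assert(ob != NULL);
    return ob->ob_type;
}

/* Setter: the only supported way to change the type of an object. */
static inline void toy_set_type(toy_object *ob, toy_type *type)
{
    assert(ob != NULL);
    ob->ob_type = type;
}
```

Once all callers go through the getter and setter, the ``ob_type`` member could be renamed, relocated or hidden without touching them.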

**STATUS**: Completed (in Python 3.10)

New functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and
``Py_SET_SIZE()`` were added to Python 3.9.

In Python 3.10, ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` can no
longer be used as l-value and the new setter functions must be used
instead.
New C API functions must not return borrowed references

When a function returns a borrowed reference, Python cannot track when
the caller stops using this reference.

For example, if the Python ``list`` type is specialized for small
integers and stores "raw" numbers directly rather than Python objects,
``PyList_GetItem()`` has to create a temporary Python object. The
problem is to decide when it is safe to delete the temporary object.
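
A minimal sketch of how a strong reference avoids this problem, using a toy list that stores raw integers. The types and functions here (``toy_int``, ``toy_int_list``, ``toy_list_get_item()``, ``toy_int_decref()``) are hypothetical and only model the ownership rule: the getter creates a box that the caller owns and releases explicitly, so the container never has to guess when the temporary can be deleted.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy boxed integer with its own reference count. */
typedef struct {
    int refcnt;
    long value;
} toy_int;

/* Toy list specialized for integers: it stores raw numbers, not objects. */
typedef struct {
    size_t size;
    const long *raw;
} toy_int_list;

/* Number of boxes currently alive, kept only to make lifetimes observable. */
static int toy_live_boxes = 0;

/* Returns a NEW strong reference owned by the caller, or NULL if the index
   is out of range or allocation fails. */
static toy_int *toy_list_get_item(const toy_int_list *list, size_t index)
{
    assert(list != NULL);
    if (index >= list->size) {
        return NULL;
    }
    toy_int *box = malloc(sizeof *box);
    if (box == NULL) {
        return NULL;
    }
    box->refcnt = 1;
    box->value = list->raw[index];
    toy_live_boxes++;
    return box;
}

/* Drops one reference; the box is freed when the last reference goes away. */
static void toy_int_decref(toy_int *box)
{
    assert(box != NULL && box->refcnt > 0);
    if (--box->refcnt == 0) {
        toy_live_boxes--;
        free(box);
    }
}
```

With a borrowed reference there would be no ``toy_int_decref()`` call to mark the end of use, and the temporary box would either leak or risk being freed while still in use.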

The general guideline is to avoid returning borrowed references for new
C API functions.

No function returning borrowed references is scheduled for removal by
this PEP.

**STATUS**: Completed (in Python 3.9)

In Python 3.9, new C API functions returning Python objects only return
strong references:

* ``PyFrame_GetBack()``
* ``PyFrame_GetCode()``
* ``PyObject_CallNoArgs()``
* ``PyObject_CallOneArg()``
* ``PyThreadState_GetFrame()``

Avoid functions returning PyObject**

The ``PySequence_Fast_ITEMS()`` function gives direct access to an
array of ``PyObject*`` objects. The function is deprecated in favor of
``PyTuple_GetItem()`` and ``PyList_GetItem()``.

``PyTuple_GET_ITEM()`` can be abused to access directly the
``PyTupleObject.ob_item`` member::

    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);

The ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros are
converted to static inline functions to disallow that.

**STATUS**: Not Started

New pythoncapi_compat.h header file

Making structures opaque requires modifying C extensions to
use getter and setter functions. The practical issue is how to keep
support for old Python versions which don't have these functions.

For example, in Python 3.10, it is no longer possible to use
``Py_TYPE()`` as an l-value. The new ``Py_SET_TYPE()`` function must be
used instead::

    #if PY_VERSION_HEX >= 0x030900A4
        Py_SET_TYPE(&MyType, &PyType_Type);
    #else
        Py_TYPE(&MyType) = &PyType_Type;
    #endif

This code may ring a bell to developers who ported their Python code
base from Python 2 to Python 3.

Python will distribute a new ``pythoncapi_compat.h`` header file which
provides new C API functions to old Python versions. Example::

    #if PY_VERSION_HEX < 0x030900A4
    static inline void
    _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4

Using this header file, ``Py_SET_TYPE()`` can be used on old Python
versions as well.

Developers can copy this file into their project, or even copy/paste
only the few functions needed by their C extension.

**STATUS**: In Progress (implemented but not distributed by CPython yet)

The ``pythoncapi_compat.h`` header file is currently developed at:

Process to reduce the number of broken C extensions

Process to reduce the number of broken C extensions when introducing C
API incompatible changes listed in this PEP:

* Estimate how many popular C extensions are affected by the
  incompatible change.
* Coordinate with maintainers of broken C extensions to prepare their
  code for the future incompatible change.
* Introduce the incompatible changes in Python. The documentation must
  explain how to port existing code. It is recommended to merge such
  changes at the beginning of a development cycle to have more time for
* Changes which are the most likely to break a large number of C
  extensions should be announced on the capi-sig mailing list to notify
  C extensions maintainers to prepare their project for the next Python.
* If the change breaks too many projects, reverting the change should be
  discussed, taking into account the number of broken packages, their
  importance in the Python community, and the importance of the change.

The coordination usually means reporting issues to the projects, or even
proposing changes. It does not require waiting for a new release including
fixes for every broken project.

Since more and more C extensions are written using Cython, rather
than directly using the C API, it is important to ensure that Cython is
prepared in advance for incompatible changes. It gives more time for C
extension maintainers to release a new version with code generated with
the updated Cython (for C extensions distributing the code generated by
Cython).

Future incompatible changes can be announced by deprecating a function
in the documentation and by annotating the function with
``Py_DEPRECATED()``. But making a structure opaque and preventing the
usage of a macro as l-value cannot be deprecated with
``Py_DEPRECATED()``.

The important part is coordination and finding a balance between CPython
evolutions and backward compatibility. For example, breaking a random,
old, obscure and unmaintained C extension on PyPI is less severe than
breaking numpy.

If a change is reverted, we move back to the coordination step to better
prepare the change. Once more C extensions are ready, the incompatible
change can be reconsidered.

Version History

* Version 3, June 2020: PEP rewritten from scratch. Python now
  distributes a new ``pythoncapi_compat.h`` header and a process is
  defined to reduce the number of broken C extensions when introducing C
  API incompatible changes listed in this PEP.
* Version 2, April 2020:
  "PEP: Modify the C API to hide implementation details"
* Version 1, July 2017:
  "PEP: Hide implementation details in the C API"
  sent to python-ideas


This document has been placed in the public domain.

Night gathers, and now my watch begins. It shall not end until my death.