PEP: Modify the C API to hide implementation details
Hi,

Here is a first draft of a PEP which summarizes the research work I have been doing on the CPython C API since 2017 and the changes that I and others have already made since Python 3.7 towards an "opaque" C API. The PEP is also a collaboration with developers of PyPy, HPy, Rust-CPython and many others! Thanks to everyone who helped me to write it down!

Maybe this big document should be reorganized into multiple smaller, better defined goals: multiple PEPs. The PEP is quite long and talks about things which are not all directly related. It's a complex topic and I chose to put everything in a single document to have a good starting point for the discussion.

I already proposed some of these ideas in 2017: see the Prior Art section ;-)

The PEP can be read on GitHub where it's better formatted:
https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst

If someone wants to work on the PEP itself, the document on GitHub is the current reference.

Victor

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PEP xxx: Modify the C API to hide implementation details
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Abstract
========

* Hide implementation details from the C API to be able to `optimize CPython`_ and make PyPy more efficient.
* The expectation is that `most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
* Continue to support old unmodified C extensions by continuing to provide the fully compatible "regular" CPython runtime.
* Provide a `new optimized CPython runtime`_ using the same CPython code base: faster, but can only import C extensions which don't use implementation details. Since both CPython runtimes share the same code base, features implemented in CPython will be available in both runtimes.
* `Stable ABI`_: only build a C extension once and use it on multiple Python runtimes and different versions of the same runtime.
* Better advertise alternative Python runtimes and better communicate the differences between the Python language and the Python implementation (especially CPython).

Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.

Rationale
=========

To remain competitive in terms of performance with other programming languages like Go or Rust, Python has to become more efficient.

Make Python (at least) two times faster
---------------------------------------

The C API leaks too many implementation details which prevent optimizing CPython. See `Optimize CPython`_.

PyPy's support for the Python C API (cpyext) is slow because it has to emulate CPython internals like memory layout and reference counting. The emulation causes memory overhead, memory copies, conversions, etc. See `Inside cpyext: Why emulating CPython C API is so Hard <https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html>`_ (Sept 2018) by Antonio Cuni.

While this PEP may make CPython a little bit slower in the short term, the long-term goal is to make "Python" at least two times faster. This goal is not hypothetical: PyPy is already 4.2x faster than CPython and is fully compatible. C extensions are the bottleneck of PyPy. This PEP proposes a migration plan to move towards an opaque C API, which would make PyPy faster.

Separate the Python language and the CPython runtime (promote alternative runtimes)
------------------------------------------------------------------------------------

The Python language should be better separated from its runtime.
It's common to say "Python" when referring to "CPython", even in this PEP :-) Because the CPython runtime remains the reference implementation, many people believe that the Python language itself has design flaws which prevent it from being efficient. PyPy proved that this is a false assumption: on average, PyPy runs Python code 4.2 times faster than CPython.

One solution to separate the language from the implementation is to promote the usage of alternative runtimes: not only the regular CPython, but also PyPy, an optimized CPython which is only compatible with C extensions using the limited C API, CPython compiled in debug mode to ease debugging issues in C extensions, RustPython, etc.

To make alternative runtimes viable, they should be competitive in terms of features and performance. Currently, C extension modules remain the bottleneck for PyPy.

Most C extensions don't rely directly on CPython internals
----------------------------------------------------------

While the C API is still tightly coupled to CPython internals, in practice most C extensions don't rely on CPython internals directly. The expectation is that these C extensions will remain compatible with an "opaque" C API and that only a minority of C extensions will have to be modified.

Moreover, more and more C extensions are implemented in Cython or cffi. Updating Cython and cffi to be compatible with the opaque C API will make all these C extensions compatible without having to modify the source code of each extension.

Stable ABI
----------

The idea is to build a C extension only once: the built binary will be usable on multiple Python runtimes and different versions of the same runtime (stable ABI).

The idea is not new: it is an extension of the "limited C API" of `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`__, implemented in CPython 3.2. The limited C API is not used by default and is not widely used: PyQt is one of the few known users. The idea here is that the default C API becomes the limited C API, so that all C extensions benefit from the advantages of a stable ABI.

Flaws of the C API
==================

Borrowed references
-------------------

A borrowed reference is a pointer which doesn't "hold" a reference. If the object is destroyed, the borrowed reference becomes a dangling pointer, pointing to freed memory which might be reused by a new object. Borrowed references can lead to bugs and crashes when misused; an example of a CPython bug caused by this is `bpo-25750: crash in type_getattro() <https://bugs.python.org/issue25750>`_.

Borrowed references are a problem whenever there is no reference to borrow: they assume that a referenced object already exists (and thus has a positive reference count).

Tagged pointers are an example of this problem: since there is no concrete ``PyObject*`` to represent the integer, it cannot easily be manipulated.

This issue complicates optimizations like PyPy's list strategies: if a list contains only small integers, it is stored as a compact C array of longs. The equivalent of ``PyObject`` is only created when an item is accessed. (Most of the time the object is optimized away by the JIT, but this is another story.) This makes it hard to support the C API function ``PyList_GetItem()``, which should return a reference borrowed from the list, but the list contains no concrete ``PyObject`` that it could lend a reference to!
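For illustration (a sketch, not text from the PEP), the borrowed-reference contract is what forces the list to keep a concrete ``PyObject`` alive for each item, and what makes misuse dangerous (error handling omitted)::

    #include <Python.h>

    static void
    borrowed_reference_hazard(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);     /* borrowed reference */
        /* The caller may use "item" without Py_INCREF(): it relies entirely
           on the list keeping the object alive. */
        PyList_SetItem(list, 0, PyLong_FromLong(7));  /* may destroy the old item */
        /* "item" is now dangling unless something else still holds a reference. */

        /* A strong-reference API avoids the problem, at the cost of an
           explicit Py_DECREF() by the caller: */
        PyObject *item2 = PySequence_GetItem(list, 0);  /* new reference */
        /* ... use item2 ... */
        Py_DECREF(item2);
    }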
PyPy's current solution is very bad: the first time ``PyList_GetItem()`` is called, the whole list is de-optimized (converted to a list of ``PyObject*``). See ``cpyext`` ``get_list_storage()``.

See also the Specialized list use case, which is the same optimization applied to CPython. Like in PyPy, this optimization is incompatible with borrowed references, since the runtime cannot guess when the temporary object should be destroyed.

If ``PyList_GetItem()`` returned a strong reference, the ``PyObject*`` could just be allocated on the fly and destroyed when the user decrements its reference count. Basically, by putting borrowed references in the API, we make it impossible to change the underlying data structure.

Functions stealing strong references
------------------------------------

There are functions which steal strong references, for example ``PyModule_AddObject()`` and ``PySet_Discard()``. Stealing references is an issue similar to borrowed references.

PyObject**
----------

Some functions of the C API return a pointer to an array of ``PyObject*``:

* ``PySequence_Fast_ITEMS()``
* ``PyTuple_GET_ITEM()`` is sometimes abused to get an array of all of the tuple's contents: ``PyObject **items = &PyTuple_GET_ITEM(tuple, 0);``

In effect, these functions return an array of borrowed references: like with ``PyList_GetItem()``, all callers of ``PySequence_Fast_ITEMS()`` assume the sequence holds references to its elements.

Leaking structure members
-------------------------

``PyObject``, ``PyTypeObject``, ``PyThreadState``, etc. structures are currently public: C extensions can directly read and modify the structure members. For example, the ``Py_INCREF()`` macro directly increases ``PyObject.ob_refcnt``, without any abstraction. Fortunately, the ``Py_INCREF()`` implementation can be modified without affecting the API.

Change the C API
================

This PEP doesn't define an exhaustive list of all C API changes, but defines some guidelines about bad patterns which should be avoided in the C API to prevent leaking implementation details.

Separate header files of limited and internal C API
----------------------------------------------------

In Python 3.6, all headers (.h files) were directly in the ``Include/`` directory. In Python 3.7, work started to move the internal C API into a new subdirectory, ``Include/internal/``. The work continued in Python 3.8 and 3.9. The internal C API is only partially exported: some functions are only declared with ``extern`` and so cannot be used outside CPython (with compilers supporting ``-fvisibility=hidden``, see below), whereas some functions are exported with ``PyAPI_FUNC()`` to make them usable in C extensions. Debuggers and profilers are typical users of the internal C API, to inspect Python internals without calling functions (to inspect a coredump for example).

Python 3.9 is now built with ``-fvisibility=hidden`` (supported by GCC and clang): symbols which are not declared with ``PyAPI_FUNC()`` or ``PyAPI_DATA()`` are no longer exported by the dynamic library (libpython).

Another change is to separate the limited C API from the "CPython" C API: Python 3.8 has a new ``Include/cpython/`` sub-directory. It should not be used directly, but it is used automatically from the public headers when the ``Py_LIMITED_API`` macro is not defined.

**Backward compatibility:** fully backward compatible.

**Status:** basically completed in Python 3.9.
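For illustration (not part of the PEP's proposed changes), the header separation is what makes the limited C API an opt-in today: an extension selects it by defining ``Py_LIMITED_API`` before including ``Python.h``; otherwise the full CPython C API, including the ``Include/cpython/`` headers, is exposed::

    /* Sketch of an extension opting into the limited C API and stable ABI
       (PEP 384). 0x03060000 means "Python 3.6 is the oldest supported
       version". */
    #define Py_LIMITED_API 0x03060000
    #include <Python.h>

    /* Only stable-ABI functions are available from here on; macros that peek
       into structures, such as PyList_GET_ITEM(), are not. */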
Changes without API changes and with minor performance overhead
----------------------------------------------------------------

* Replace macros with static inline functions. Work started in Python 3.8 and made good progress in Python 3.9.
* Modify macros to avoid directly accessing structure fields. For example, the `Hide implementation detail of trashcan macros <https://github.com/python/cpython/commit/38965ec5411da60d312b59be281f3510d58e0cf1>`_ commit modifies the ``Py_TRASHCAN_BEGIN_CONDITION()`` macro to call a new ``_PyTrash_begin()`` function rather than accessing the ``PyThreadState.trash_delete_nesting`` field directly.

**Backward compatibility:** fully backward compatible.

**Status:** good progress in Python 3.9.

Changes without API changes but with performance overhead
----------------------------------------------------------

Replace macros or inline functions with regular functions. Work started in Python 3.9 on a limited set of functions. Converting macros to function calls can have a small performance overhead.

For example, the ``Py_INCREF()`` macro modifies ``PyObject.ob_refcnt`` directly: this macro would become an alias to the opaque ``Py_IncRef()`` function.

It is possible that the regular CPython runtime keeps the ``Py_INCREF()`` macro which modifies ``PyObject.ob_refcnt`` directly, to avoid any performance overhead. A tradeoff should be defined to limit differences between the regular and the new optimized CPython runtimes, without hurting the performance of the regular CPython runtime too much.

**Backward compatibility:** fully backward compatible.

**Status:** not started. The performance overhead must be measured with benchmarks and this PEP should be accepted.

API and ABI incompatible changes
--------------------------------

* Make structures opaque: move them to the internal C API.
* Remove functions from the public C API which are tied to CPython internals. Maybe begin by marking these functions as private (rename ``PyXXX`` to ``_PyXXX``) or move them to the internal C API.
* Ban statically allocated types (by making ``PyTypeObject`` opaque): enforce usage of ``PyType_FromSpec()`` (see the sketch below).
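A minimal sketch of such a heap type created with ``PyType_FromSpec()`` instead of a statically allocated ``PyTypeObject`` (hypothetical names, error handling trimmed)::

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        double x;
        double y;
    } PointObject;                      /* hypothetical extension type */

    static PyType_Slot point_slots[] = {
        {Py_tp_doc, (void *)"Point objects"},
        {0, NULL},
    };

    static PyType_Spec point_spec = {
        .name = "example.Point",
        .basicsize = sizeof(PointObject),
        .flags = Py_TPFLAGS_DEFAULT,
        .slots = point_slots,
    };

    /* Called from the module initialization function: the type object is
       allocated on the heap by CPython, and the extension never reads or
       writes PyTypeObject fields directly. */
    static PyObject *
    make_point_type(void)
    {
        return PyType_FromSpec(&point_spec);
    }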
Examples of issues to make structures opaque:

* ``PyGC_Head``: https://bugs.python.org/issue40241
* ``PyObject``: https://bugs.python.org/issue39573
* ``PyTypeObject``: https://bugs.python.org/issue40170
* ``PyThreadState``: https://bugs.python.org/issue39573

Another example is the ``Py_REFCNT()`` and ``Py_TYPE()`` macros, which can currently be used as l-values to modify an object's reference count or type. Python 3.9 has new ``Py_SET_REFCNT()`` and ``Py_SET_TYPE()`` macros which should be used instead. The ``Py_REFCNT()`` and ``Py_TYPE()`` macros should be converted to static inline functions to prevent their usage as l-values.

**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.

CPython specific behavior
=========================

Some C functions and some Python functions have a behavior which is closely tied to the current CPython implementation.

is operator
-----------

The "x is y" operator is closely tied to how CPython allocates objects and to ``PyObject*``. For example, CPython uses singletons for numbers in the [-5; 256] range::

    >>> x=1; (x + 1) is 2
    True
    >>> x=1000; (x + 1) is 1001
    False

The Python 3.8 compiler now emits a ``SyntaxWarning`` when the right operand of the ``is`` and ``is not`` operators is a literal (ex: integer or string), but doesn't warn if it is the ``None``, ``True``, ``False`` or ``Ellipsis`` singleton (`bpo-34850 <https://bugs.python.org/issue34850>`_). Example::

    >>> x=1
    >>> x is 1
    <stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
    True

CPython PyObject_RichCompareBool
--------------------------------

CPython considers that two objects are identical if their memory addresses are equal: ``x is y`` in Python (``IS_OP`` opcode) is implemented internally in C as ``left == right``, where ``left`` and ``right`` are ``PyObject*`` pointers.

The main function implementing comparison in CPython is ``PyObject_RichCompareBool()``. This function considers that two objects are equal if the two ``PyObject*`` pointers are equal (if the two objects are "identical"). For example, ``PyObject_RichCompareBool(obj1, obj2, Py_EQ)`` doesn't call ``obj1.__eq__(obj2)`` if ``obj1 == obj2``, where ``obj1`` and ``obj2`` are ``PyObject*`` pointers.

This behavior is an optimization to make Python more efficient. For example, the ``dict`` lookup avoids ``__eq__()`` if two pointers are equal.

Another example is Not-a-Number (NaN) floating-point numbers, which are not equal to themselves::

    >>> nan = float("nan")
    >>> nan is nan
    True
    >>> nan == nan
    False

The ``list.__contains__(obj)`` and ``list.index(obj)`` methods are implemented with ``PyObject_RichCompareBool()`` and so rely on object identity::

    >>> lst = [9, 7, nan]
    >>> nan in lst
    True
    >>> lst.index(nan)
    2
    >>> lst[2] == nan
    False

In CPython, ``x == y`` is implemented with ``PyObject_RichCompare()``, which doesn't make the assumption that identical objects are equal. That's why ``nan == nan`` or ``lst[2] == nan`` return ``False``.

Issues for other Python implementations
---------------------------------------

The Python language doesn't require implementations to use a ``PyObject`` structure and ``PyObject*`` pointers: PyPy uses neither ``PyObject`` nor ``PyObject*``. If CPython were modified to use `Tagged pointers`_, it would have the same issue.

Alternative Python implementations have to mimic CPython to reduce incompatibilities. For example, PyPy mimics the CPython behavior of the ``is`` operator with CPython's small integer singletons::

    >>>> x=1; (x + 1) is 2
    True

It also mimics CPython's ``PyObject_RichCompareBool()``. Example with the Not-a-Number (NaN) float::

    >>>> nan=float("nan")
    >>>> nan == nan
    False
    >>>> lst = [9, 7, nan]
    >>>> nan in lst
    True
    >>>> lst.index(nan)
    2
    >>>> lst[2] == nan
    False
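The behaviour being mimicked is essentially the pointer-equality shortcut at the top of ``PyObject_RichCompareBool()``. A simplified sketch of CPython's implementation (details and error handling shortened)::

    int
    PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
    {
        /* Quick result when the two PyObject* pointers are equal:
           identity is assumed to imply equality. */
        if (v == w) {
            if (op == Py_EQ)
                return 1;
            if (op == Py_NE)
                return 0;
        }

        /* Otherwise fall back to the full protocol, which may call __eq__(). */
        PyObject *res = PyObject_RichCompare(v, w, op);
        if (res == NULL)
            return -1;
        int ok = PyObject_IsTrue(res);
        Py_DECREF(res);
        return ok;
    }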
Better advertise alternative Python runtimes
============================================

Currently, PyPy and other "alternative" Python runtimes are not well advertised on the `Python website <https://www.python.org/>`_. They are only listed as the last choice in the Download menu. Once enough C extensions are compatible with the limited C API, PyPy and other Python runtimes should be better advertised on the Python website and in the Python documentation, to present them as first-class citizens.

Obviously, CPython is likely to remain the most feature-complete implementation in the mid-term, since new PEPs are first implemented in CPython. Limitations can simply be documented, and users should be free to make their own choice, depending on their use cases.

HPy project
===========

The `HPy project <https://github.com/pyhandle/hpy>`__ is a brand new C API written from scratch. It is designed to ease migration from the current C API and to be efficient on PyPy.

HPy hides all implementation details: it is based on "handles", so objects cannot be inspected with direct memory access; only opaque function calls are allowed. This abstraction has many benefits:

* No more ``PyObject`` emulation needed: smaller memory footprint in PyPy cpyext, no more expensive conversions.
* It is possible to have multiple handles pointing to the same object. This helps to better track object lifetimes and makes the PyPy implementation easier. PyPy doesn't use reference counting but a tracing garbage collector. When the PyPy GC moves objects in memory, handles don't change! HPy uses an array mapping handles to objects: only this array has to be updated. It is way more efficient.
* Compared to CPython, the Python runtime is free to modify deep internals. Many optimizations become possible: see the `Optimize CPython`_ section.
* It is easy to add a debug wrapper which adds checks before and after function calls, for example to ensure that the GIL is held when calling CPython.

HPy is developed outside CPython, is implemented on top of the existing Python C API, and so can support old Python versions. By default, binaries compiled in the "universal" HPy ABI mode can be used on CPython and PyPy. HPy can also target the CPython ABI, which has the same performance as native C extensions. See the HPy `Target ABIs documentation <https://github.com/pyhandle/hpy/blob/feature/improve-docs/docs/overview.rst#target-abis>`_.

The PEP moves the C API towards the HPy design and API.

New optimized CPython runtime
=============================

Backward incompatible changes are such a pain for the whole Python community. To ease the migration (accelerate adoption of the new C API), one option is to provide not only one but two CPython runtimes:

* Regular CPython: fully backward compatible, supports direct access to structures like ``PyObject``, etc.
* New optimized CPython: incompatible, cannot import C extensions which don't use the limited C API, has new optimizations, limited to the C API.

Technically, both runtimes would share the same code base, to ease maintenance: CPython. The new optimized CPython would be a ``./configure`` flag to build a different Python. On Windows, it would be a different project in the Visual Studio solution, reusing the pythoncore project but defining a macro to enable the optimizations and the C API changes.

The new optimized CPython runtime remains compatible with the CPython 3.8 `stable ABI`_.

The CPython code base is now 30 years old. Many technical choices made 30 years ago are no longer relevant today. This PEP should ease the development of new Python implementations which could be even more efficient, like PyPy!

Cython and cffi
===============

Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.

Cython may be modified to add a new build mode where only the "limited C API" is used.

Use Cases
=========

Optimize CPython
----------------

The new optimized runtime can implement new optimizations since it only supports C extension modules which don't access Python internals.

Tagged pointers
...............

See `Tagged pointer <https://en.wikipedia.org/wiki/Tagged_pointer>`_ (Wikipedia).

Avoid ``PyObject`` for small objects (ex: small integers, short Latin-1 strings, None and True/False singletons): store the content directly in the pointer, with a tag for the object type.
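An illustrative sketch of the technique (not CPython code): with object pointers aligned to at least 2 bytes, the low bit of a word can tag a value as an inline small integer rather than a pointer to a ``PyObject``::

    #include <stdint.h>

    /* Hypothetical tagged reference: either a real object pointer (low bit 0)
       or a small integer stored directly in the word (low bit 1). */
    typedef uintptr_t py_ref;

    #define TAG_SMALL_INT 0x1

    static inline py_ref
    small_int_ref(intptr_t value)
    {
        return ((uintptr_t)value << 1) | TAG_SMALL_INT;
    }

    static inline int
    ref_is_small_int(py_ref ref)
    {
        return (ref & TAG_SMALL_INT) != 0;
    }

    static inline intptr_t
    small_int_value(py_ref ref)
    {
        /* Assumes an arithmetic right shift for negative values, as provided
           by mainstream compilers. */
        return (intptr_t)ref >> 1;
    }

Such a scheme is incompatible with code that dereferences every ``PyObject*`` directly, which is exactly why an opaque C API is a prerequisite.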
Tracing garbage collector
.........................

Experiment with a tracing garbage collector inside CPython, keeping reference counting for the C API. Rewriting CPython with a tracing garbage collector is a large project which is out of the scope of this PEP; this PEP fixes some blocker issues which prevent starting such a project today.

One of the issues is the C API functions which return a raw pointer, like ``PyBytes_AsString()``: Python doesn't know when the caller stops using the pointer, and so cannot move the object in memory (for a moving garbage collector). An API like ``Py_buffer`` is better, since it requires the caller to call ``PyBuffer_Release()`` when it is done.

Specialized list
................

Specialize lists of small integers: if a list only contains numbers which fit into a C ``int32_t``, a Python list object could use a more efficient ``int32_t`` array to reduce the memory footprint (avoid the ``PyObject`` overhead for these numbers). Temporary ``PyObject`` objects would be created on demand for backward compatibility.

This optimization is less interesting if tagged pointers are implemented. PyPy already implements this optimization.

O(1) bytearray to bytes conversion
..................................

Convert bytearray to bytes without memory copy.

Currently, bytearray is used to build a bytes string, but it's usually converted into a bytes object to respect an API. This conversion requires allocating a new memory block and copying the data (O(n) complexity).

O(1) conversion would be possible if the ownership of the bytearray buffer could be passed to the bytes object. That requires modifying the ``PyBytesObject`` structure to support multiple storages (support storing the content in a separate memory block).

Fork and "Copy-on-Read" problem
...............................

Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.

Currently, when a Python object is accessed, its ``ob_refcnt`` member is incremented temporarily to hold a "strong reference" to it (to ensure that it cannot be destroyed while we use it). Many operating systems implement fork() using copy-on-write ("CoW"): a memory page (ex: 4 KB) is only copied when a process (parent or child) modifies it. After Python is forked, modifying ``ob_refcnt`` copies the memory page, even if the object is only accessed in "read only" mode.

See `Dismissing Python Garbage Collection at Instagram <https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172>`_ (Jan 2017) by Instagram Engineering. Instagram contributed `gc.freeze() <https://docs.python.org/dev/library/gc.html#gc.freeze>`_ to Python 3.7, which works around the issue.

One solution would be to store reference counters outside ``PyObject``, for example in a separate hash table (pointer to reference counter). Changing ``PyObject`` structures requires that C extensions don't access them directly.

Debug runtime and remove debug checks in release mode
.....................................................

If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
If using such a debug runtime becomes easier, it indirectly means that runtime debug checks can be removed from the release build. The CPython code base is still full of runtime checks calling ``PyErr_BadInternalCall()`` on failure. Removing such checks in release mode can make Python more efficient.

PyPy
----

ujson is 3x faster on PyPy when using HPy instead of the Python C API. See the `HPy kick-off sprint report <https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html>`_ (December 2019).

This PEP should help to make PyPy's cpyext more efficient, or at least ease the migration of C extensions to HPy.

GraalPython
-----------

`GraalPython <https://github.com/graalvm/graalpython>`_ is a Python 3 implementation built on `GraalVM <https://www.graalvm.org/>`_ ("Universal VM for a polyglot world"). It is interested in supporting HPy: see the `Leysin 2020 Sprint Report <https://morepypy.blogspot.com/2020/03/leysin-2020-sprint-report.html>`_. It would also benefit from this PEP.

RustPython, Rust-CPython and PyO3
---------------------------------

Rust-CPython is interested in supporting HPy: see the `Leysin 2020 Sprint Report <https://morepypy.blogspot.com/2020/03/leysin-2020-sprint-report.html>`_. RustPython and PyO3 would also benefit from this PEP.

Links:

* `PyO3 <https://github.com/PyO3/pyo3>`_: Rust bindings for the Python (CPython) interpreter
* `rust-cpython <https://github.com/dgrunwald/rust-cpython>`_: Rust <-> Python (CPython) bindings
* `RustPython <https://github.com/RustPython/RustPython>`_: A Python interpreter written in Rust

Rejected Ideas
==============

Drop the C API
--------------

One proposed alternative to a new, better C API is to drop the C API entirely, the reasoning being that complete and reliable solutions like Cython and cffi are already available. But what about the long tail of C extensions on PyPI which still use the C API? Would a Python without these C extensions remain relevant? Lots of projects do not use those solutions, and the C API is part of Python's success: for example, there would be no numpy without the C API. Dropping the C API doesn't sound like a workable solution.

Bet on HPy, leave the C API unchanged
-------------------------------------

The HPy project is developed outside CPython and so doesn't cause any backward incompatibility in CPython. The HPy API was designed with efficiency in mind.

The problem is the long tail of C extensions on PyPI which are written with the C API and will not be converted to HPy soon, or will never be converted. The transition from Python 2 to Python 3 showed that migrations are very slow and never fully complete. The PEP also relies on the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible with the new opaque C API.

The concept of HPy is not new: CPython has had a limited C API providing a stable ABI since Python 3.2, see `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`_. Since it is an opt-in option, most users simply use the **default** C API.
Prior Art
=========

* `pythoncapi.readthedocs.io <https://pythoncapi.readthedocs.io/>`_: research project behind this PEP
* July 2019: keynote `Python Performance: Past, Present, Future <https://github.com/vstinner/talks/raw/master/2019-EuroPython/python_performance.pdf>`_ (slides) by Victor Stinner at EuroPython 2019
* [python-dev] `Make the stable API-ABI usable <https://mail.python.org/pipermail/python-dev/2017-November/150607.html>`_ (November 2017) by Victor Stinner
* [python-ideas] `PEP: Hide implementation details in the C API <https://mail.python.org/pipermail/python-ideas/2017-July/046399.html>`_ (July 2017) by Victor Stinner: old PEP draft which proposed to add an option to build C extensions.
* `A New C API for CPython <https://vstinner.github.io/new-python-c-api.html>`_ (Sept 2017), article by Victor Stinner
* `Python Performance <https://github.com/vstinner/conf/raw/master/2017-PyconUS/summit.pdf>`_ (May 2017, at the Language Summit) by Victor Stinner: early discussions on reorganizing header files, promoting PyPy, fixing the C API, etc. The discussion is summarized in the `Keeping Python competitive <https://lwn.net/Articles/723949/>`_ article.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

--
Night gathers, and now my watch begins. It shall not end until my death.
On Fri, 10 Apr 2020 19:20:00 +0200 Victor Stinner <vstinner@python.org> wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Using Cython does not make the C API irrelevant. In some applications, the C API has to be low-level enough for performance. Whether the application is written in Cython or not.
**Status:** not started. The performance overhead must be measured with benchmarks and this PEP should be accepted.
Surely you mean "before this PEP should be accepted"?
Examples of issues to make structures opaque:
* ``PyGC_Head``: https://bugs.python.org/issue40241 * ``PyObject``: https://bugs.python.org/issue39573 * ``PyTypeObject``: https://bugs.python.org/issue40170
How do you keep fast type checking such as PyTuple_Check() if extension code doesn't have access e.g. to tp_flags?

I notice you did:

"""
Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC() for object.c and typeobject.c.
"""

So you understand there is a need.
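For reference, ``PyTuple_Check()`` is today just a ``tp_flags`` test; simplified from the CPython headers, it is roughly:

    /* Simplified sketch of what the public headers expand to today. */
    #define PyType_HasFeature(type, feature) \
        (((type)->tp_flags & (feature)) != 0)

    #define PyTuple_Check(op) \
        PyType_HasFeature(Py_TYPE(op), Py_TPFLAGS_TUPLE_SUBCLASS)

With an opaque PyTypeObject, that read of tp_flags would have to go through a function call or some other accessor instead of a direct field access.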
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
The problem here is not only compatibility but potential performance regressions in C extensions.
New optimized CPython runtime ==============================
Backward incompatible changes are such a pain for the whole Python community. To ease the migration (accelerate adoption of the new C API), one option is to provide not only one but two CPython runtimes:
* Regular CPython: fully backward compatible, support direct access to structures like ``PyObject``, etc. * New optimized CPython: incompatible, cannot import C extensions which don't use the limited C API, has new optimizations, limited to the C API.
Well, this sounds like a distribution nightmare. Some packages will only be available for one runtime and not the other. It will confuse non-expert users.
O(1) bytearray to bytes conversion ..................................
Convert bytearray to bytes without memory copy.
Currently, bytearray is used to build a bytes string, but it's usually converted into a bytes object to respect an API. This conversion requires to allocate a new memory block and copy data (O(n) complexity).
It is possible to implement O(1) conversion if it would be possible to pass the ownership of the bytearray object to bytes.
That requires modifying the ``PyBytesObject`` structure to support multiple storages (support storing content into a separate memory block).
If that's desirable (I'm not sure it is), there is a simpler solution: instead of allocating a raw memory area, bytearray could allocate... a private bytes object that you can detach without copying it. But really, this is why we have BytesIO. Which already uses that exact strategy: allocate a private bytes object.
Fork and "Copy-on-Read" problem ...............................
Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.
Nowadays it is strongly recommended to use multiprocessing with the "forkserver" start method: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me... With "forkserver", the forked process is extremely lightweight and there are little savings to be made in the child.
`Dismissing Python Garbage Collection at Instagram <https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172>`_ (Jan 2017) by Instagram Engineering.
Instagram contributed `gc.freeze() <https://docs.python.org/dev/library/gc.html#gc.freeze>`_ to Python 3.7 which works around the issue.
One solution for that would be to store reference counters outside ``PyObject``. For example, in a separated hash table (pointer to reference counter). Changing ``PyObject`` structures requires that C extensions don't access them directly.
You're planning to introduce a large overhead for each reference count lookup just to satisfy a rather niche use case? CPython probably does millions of reference counts per second.
Debug runtime and remove debug checks in release mode .....................................................
If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
That's the one convincing feature in this PEP, as far as I'm concerned.

Regards

Antoine.
On 10Apr2020 2055, Antoine Pitrou wrote:
On Fri, 10 Apr 2020 19:20:00 +0200 Victor Stinner <vstinner@python.org> wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Using Cython does not make the C API irrelevant. In some applications, the C API has to be low-level enough for performance. Whether the application is written in Cython or not.
It does to the code author. The point here is that we want authors who insist on coding against the C API to be aware that they have fewer compatibility guarantees - maybe even to the point of needing to rebuild for each minor version if you want to insist on using macros (i.e. anything starting with "_Py").
Examples of issues to make structures opaque:
* ``PyGC_Head``: https://bugs.python.org/issue40241 * ``PyObject``: https://bugs.python.org/issue39573 * ``PyTypeObject``: https://bugs.python.org/issue40170
How do you keep fast type checking such as PyTuple_Check() if extension code doesn't have access e.g. to tp_flags?
Measured in isolation, sure. But what task are you doing that is being held up by builtin type checks? If the type check is the bottleneck, you need to work on more interesting algorithms ;)
I notice you did: """ Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC() for object.c and typeobject.c. """
So you understand there is a need.
These are private APIs.
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
The problem here is not only compatibility but potential performance regressions in C extensions.
I don't think we've ever guaranteed performance between releases. Correctness, sure, but not performance.
New optimized CPython runtime ==============================
Backward incompatible changes are such a pain for the whole Python community. To ease the migration (accelerate adoption of the new C API), one option is to provide not only one but two CPython runtimes:
* Regular CPython: fully backward compatible, support direct access to structures like ``PyObject``, etc. * New optimized CPython: incompatible, cannot import C extensions which don't use the limited C API, has new optimizations, limited to the C API.
Well, this sounds like a distribution nightmare. Some packages will only be available for one runtime and not the other. It will confuse non-expert users.
Agreed (except that it will also confuse expert users). Doing "Python 4"-by-stealth like this is a terrible idea. If it's incompatible, give it a new version number. If you don't want a new version number, maintain compatibility. There are no alternatives.
O(1) bytearray to bytes conversion ..................................
Convert bytearray to bytes without memory copy.
Currently, bytearray is used to build a bytes string, but it's usually converted into a bytes object to respect an API. This conversion requires to allocate a new memory block and copy data (O(n) complexity).
It is possible to implement O(1) conversion if it would be possible to pass the ownership of the bytearray object to bytes.
That requires modifying the ``PyBytesObject`` structure to support multiple storages (support storing content into a separate memory block).
If that's desirable (I'm not sure it is), there is a simpler solution: instead of allocating a raw memory area, bytearray could allocate... a private bytes object that you can detach without copying it.
Yeah, I don't see the point in this one, unless you mean a purely internal change. Is this a major bottleneck? Having a broader concept of "freezable" objects may be a valuable thing to enable in a new runtime, but retrofitting it to CPython doesn't seem likely to have a big impact.
Fork and "Copy-on-Read" problem ...............................
Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.
Nowadays it is strongly recommended to use multiprocessing with the "forkserver" start method: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me...
With "forkserver", the forked process is extremely lightweight and there are little savings to be made in the child.
Unfortunately, a recommendation that only applies to a minority of Python users. Oh well.

Separating refcounts theoretically improves cache locality, specifically the case where cache invalidation impacts multiple CPUs (and even the case where a single thread moves between CPUs). But I don't think there's been a convincing real benchmark of this yet.
Debug runtime and remove debug checks in release mode .....................................................
If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
That's the one convincing feature in this PEP, as far as I'm concerned.
Eh, this assumes that someone is fully capable of rebuilding CPython and their own extension, but not one of their dependencies, and that the code they're using doesn't have any system dependencies that differ in debug builds (spoiler: they do). That seems like an oddly specific scenario that we don't really have to support.

I'd love to hear what Victor's working on that makes him so keen on this :)

All up, moving the C API away from macros and direct structure member access to *real* (not "static inline") functions is a very good thing for compatibility. Personally I'd like to go to a function table model to support future extensibility and back-compat shims, but first we have to deal with the macros.

Cheers,
Steve
On Fri, 10 Apr 2020 23:33:28 +0100 Steve Dower <steve.dower@python.org> wrote:
On 10Apr2020 2055, Antoine Pitrou wrote:
On Fri, 10 Apr 2020 19:20:00 +0200 Victor Stinner <vstinner@python.org> wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Using Cython does not make the C API irrelevant. In some applications, the C API has to be low-level enough for performance. Whether the application is written in Cython or not.
It does to the code author.
The point here is that we want authors who insist on coding against the C API to be aware that they have fewer compatibility guarantees [...]
Yeah, you missed the point of my comment here. Cython *does* call into the C API, and it's quite insistent on performance optimizations too. Saying "just use Cython" doesn't make the C API unimportant - it just hides it from your own sight.
- maybe even to the point of needing to rebuild for each minor version if you want to insist on using macros (i.e. anything starting with "_Py").
If there's still a way for C extensions to get at now-private APIs, then the PEP fails to convey that, IMHO.
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
The problem here is not only compatibility but potential performance regressions in C extensions.
I don't think we've ever guaranteed performance between releases. Correctness, sure, but not performance.
That's a rather weird argument. Just because you don't guarantee performance doesn't mean it's ok to introduce performance regressions. It's especially a weird argument to make when discussing a PEP where most of the arguments are distant promises of improved performance.
Fork and "Copy-on-Read" problem ...............................
Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.
Nowadays it is strongly recommended to use multiprocessing with the "forkserver" start method: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me...
With "forkserver", the forked process is extremely lightweight and there are little savings to be made in the child.
Unfortunately, a recommendation that only applies to a minority of Python users. Oh well.
Which "minority" are you talking about? Neither of us has numbers, but I'm quite sure that the population of Python users calling into multiprocessing (or a third-party library relying on multiprocessing, such as Dask) is much larger than the population of Python users calling fork() directly and relying on copy-on-write for optimization purposes. But if you have a different experience to share, please do so.
Separating refcounts theoretically improves cache locality, specifically the case where cache invalidation impacts multiple CPUs (and even the case where a single thread moves between CPUs).
I'm a bit curious why it would improve, rather than degrade, cache locality. If you take the typical example of the eval loop, an object is incref'ed and decref'ed just about the same time that it gets used. I'll also note that the PEP proposes to remove APIs which return borrowed references... yet increasing the number of cases where accessing an object implies updating its refcount. Therefore I'm unconvinced that stashing refcounts in a separate memory area would provide any CPU efficiency benefit.
Debug runtime and remove debug checks in release mode .....................................................
If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
That's the one convincing feature in this PEP, as far as I'm concerned.
Eh, this assumes that someone is fully capable of rebuilding CPython and their own extension, but not one of their dependencies, [...]
You don't need to rebuild CPython if someone provides a binary debug build (which would probably happen if such a build were compatible with regular packages). You also don't need to rebuild your own extension to take advantage of the interpreter's internal correctness checks, if the interpreter's ABI hasn't changed. This is the whole point: being able to load an unmodified extension (and unmodified dependencies) on a debug-checks-enabled interpreter.
and this code that they're using doesn't have any system dependencies that differ in debug builds (spoiler: they do).
Are you talking about Windows? On non-Windows systems, I don't think there are "system dependencies that differ in debug builds".

Regards

Antoine.
On 11Apr2020 0025, Antoine Pitrou wrote:
On Fri, 10 Apr 2020 23:33:28 +0100 Steve Dower <steve.dower@python.org> wrote:
On 10Apr2020 2055, Antoine Pitrou wrote:
On Fri, 10 Apr 2020 19:20:00 +0200 Victor Stinner <vstinner@python.org> wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Using Cython does not make the C API irrelevant. In some applications, the C API has to be low-level enough for performance. Whether the application is written in Cython or not.
It does to the code author.
The point here is that we want authors who insist on coding against the C API to be aware that they have fewer compatibility guarantees [...]
Yeah, you missed the point of my comment here. Cython *does* call into the C API, and it's quite insistent on performance optimizations too. Saying "just use Cython" doesn't make the C API unimportant - it just hides it from your own sight.
It centralises the change. I have no problem giving Cython access to things that we discourage every developer from using, provided they remain responsive to change and use the special access responsibly (e.g. by not touching reserved fields at all). We could do a better job of helping them here.
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
The problem here is not only compatibility but potential performance regressions in C extensions.
I don't think we've ever guaranteed performance between releases. Correctness, sure, but not performance.
That's a rather weird argument. Just because you don't guarantee performance doesn't mean it's ok to introduce performance regressions.
It's especially a weird argument to make when discussing a PEP where most of the arguments are distant promises of improved performance.
If you've guaranteed compatibility but not performance, it means you can make changes that prioritise compatibility over performance. If you promise to keep everything the same, you can never change anything. Arguing that everything is an implied contract between major version releases is the weird argument.
Fork and "Copy-on-Read" problem ...............................
Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.
Nowadays it is strongly recommended to use multiprocessing with the "forkserver" start method: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me...
With "forkserver", the forked process is extremely lightweight and there are little savings to be made in the child.
Unfortunately, a recommendation that only applies to a minority of Python users. Oh well.
Which "minority" are you talking about? Neither of us has numbers, but I'm quite sure that the population of Python users calling into multiprocessing (or a third-party library relying on multiprocessing, such as Dask) is much larger than the population of Python users calling fork() directly and relying on copy-on-write for optimization purposes.
But if you have a different experience to share, please do so.
Neither Windows nor macOS support fork (macOS only recently). Break that down however you like, but by number of *developers* (as opposed to number of machines), and factoring in those who care about cross-platform compatibility, fork is not a viable thing to rely on.
Separating refcounts theoretically improves cache locality, specifically the case where cache invalidation impacts multiple CPUs (and even the case where a single thread moves between CPUs).
I'm a bit curious why it would improve, rather than degrade, cache locality. If you take the typical example of the eval loop, an object is incref'ed and decref'ed just about the same time that it gets used.
Two CPUs can read the contents of a string from their own cache. As soon as one touches the refcount, the cache line containing both the refcount and the string data in the other CPU is invalidated, and now it has to wait for synchronisation before reading the data. If the refcounts are in a separate cache line, this synchronization doesn't have to happen.
I'll also note that the PEP proposes to remove APIs which return borrowed references... yet increasing the number of cases where accessing an object implies updating its refcount.
Yeah, I'm more okay with keeping borrowed references in some cases, but it does make things more complicated. Apparently some developers get it wrong consistently enough that we have to fix it? (ALL developers get it wrong during development ;) )
and this code that they're using doesn't have any system dependencies that differ in debug builds (spoiler: they do).
Are you talking about Windows? On non-Windows systems, I don't think there are "system dependencies that differ in debug builds".
Of course I'm talking about Windows. I'm about the only person here who does, and I'm having to represent at least half of our overall userbase (look up my PyCon 2019 talk for the charts). Just because I'm the minority in this group doesn't mean I'm holding a minority opinion.

Cheers,
Steve
On Mon, 13 Apr 2020 11:35:34 +0100 Steve Dower <steve.dower@python.org> wrote:
Neither Windows nor macOS support fork (macOS only recently).
Victor's argument: "fork() is not terrific with inline reference counts". My argument: people shouldn't generally use fork() anyway, because it has other issues. My statement that people should prefer "forkserver" was in that context (if you are trying to build parallel applications using fork() calls, think twice). Obviously on Windows you'll use the "spawn" method, because it's the only available one ;-) And on macOS, you'll probably do whatever the latest recommended thing to do is ("spawn", I suppose).
Separating refcounts theoretically improves cache locality, specifically the case where cache invalidation impacts multiple CPUs (and even the case where a single thread moves between CPUs).
I'm a bit curious why it would improve, rather than degrade, cache locality. If you take the typical example of the eval loop, an object is incref'ed and decref'ed just about the same time that it gets used.
Two CPUs can read the contents of a string from their own cache. As soon as one touches the refcount, the cache line containing both the refcount and the string data in the other CPU is invalidated, and now it has to wait for synchronisation before reading the data.
Ah, you're right. However, the GIL should make such events less frequent than in a language like C++. Compared to the overhead of looking up reference counts in a different memory area (probably using a non-trivial algorithm to determine the exact address), I'm not sure which factor would dominate.
and this code that they're using doesn't have any system dependencies that differ in debug builds (spoiler: they do).
Are you talking about Windows? On non-Windows systems, I don't think there are "system dependencies that differ in debug builds".
Of course I'm talking about Windows. I'm about the only person here who does, and I'm having to represent at least half of our overall userbase (look up my PyCon 2019 talk for the charts).
Ok :-) However, Victor's point holds for non-Windows platforms, which are *also* half of our userbase.

Regards

Antoine.
On 13Apr2020 1157, Antoine Pitrou wrote:
On Mon, 13 Apr 2020 11:35:34 +0100 Steve Dower <steve.dower@python.org> wrote:
and this code that they're using doesn't have any system dependencies that differ in debug builds (spoiler: they do).
Are you talking about Windows? On non-Windows systems, I don't think there are "system dependencies that differ in debug builds".
Of course I'm talking about Windows. I'm about the only person here who does, and I'm having to represent at least half of our overall userbase (look up my PyCon 2019 talk for the charts).
Ok :-) However, Victor's point holds for non-Windows platforms, which is *also* half of our userbase.
True, though probably not the half sending him binary extension modules that nobody can rebuild ;)

Cheers,
Steve
Steve Dower wrote:
On 11Apr2020 0025, Antoine Pitrou wrote:
On Fri, 10 Apr 2020 23:33:28 +0100
Steve Dower <steve.dower@python.org> wrote:
On 10Apr2020 2055, Antoine Pitrou wrote:
On Fri, 10 Apr 2020 19:20:00 +0200
Victor Stinner <vstinner@python.org> wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Using Cython does not make the C API irrelevant. In some applications, the C API has to be low-level enough for performance. Whether the application is written in Cython or not.
It does to the code author.
The point here is that we want authors who insist on coding against the C API to be aware that they have fewer compatibility guarantees [...]
Yeah, you missed the point of my comment here. Cython *does* call into the C API, and it's quite insistent on performance optimizations too. Saying "just use Cython" doesn't make the C API unimportant - it just hides it from your own sight.
It centralises the change. I have no problem giving Cython access to things that we discourage every developer from using, provided they remain responsive to change and use the special access responsibly (e.g. by not touching reserved fields at all).
It appears to me that this whole line of argument is contradicting the purpose of the whole idea. What am I missing?

For one thing, if you open up APIs for Cython, they're open for everybody (Cython being "just" another C extension). More to the point: the ABIs have the same problem as they have now, regardless of how responsive the Cython developers are. Once you've compiled the extension, you're using the ABI and are supposedly not required to recompile to stay compatible.

So, what I'm getting at is: either you open up to everybody or to nobody. In C there's not really an in-between.

Cheers,
nd
On 13Apr2020 2308, André Malo wrote:
For one thing, if you open up APIs for Cython, they're open for everybody (Cython being "just" another C extension). More to the point: the ABIs have the same problem as they have now, regardless of how responsive the Cython developers are. Once you've compiled the extension, you're using the ABI and are supposedly not required to recompile to stay compatible.
So, what I'm getting at is: either you open up to everybody or to nobody. In C there's not really an in-between. Cheers, nd
On a technical level, you are correct.

On a policy level, we don't make changes that would break users of the C API. Because we can't track everyone who's using it, we have to assume that everything is used and any change will cause breakage. To make sure it's possible to keep developing CPython, we declare parts of the API off limits (typically by prepending them with an underscore). If you use these, and you break, we're sorry but we aren't going to fix it.

This line of discussion is basically saying that we would designate a broader section of the API that is off limits, most likely the parts that are only useful for increased performance (rather than increased functionality). We would then specifically include the Cython team/volunteers in discussions about how to manage changes to these parts of the API to avoid breaking them, and possibly do simultaneous releases to account for changes so that their users have more time to rebuild.

Effectively, when we change our APIs, we would break everyone except Cython because we've worked with them to avoid the breakage. Anyone else using it has to make their own effort to follow CPython development and detect any breakage themselves (just like today).

So probably the part you're missing is where we would give ourselves permission to break more APIs in a release, while simultaneously encouraging people to use Cython as an isolation layer from those breaks.

(Cython is still just a placeholder name here, btw. There are 1-2 other projects that could be considered instead, though I think Cython is the only one that also provides a usability improvement as well as API stability.)

Cheers,
Steve
Steve Dower wrote:
On a policy level, we don't make changes that would break users of the C API. Because we can't track everyone who's using it, we have to assume that everything is used and any change will cause breakage.
To make sure it's possible to keep developing CPython, we declare parts of the API off limits (typically by prepending them with an underscore). If you use these, and you break, we're sorry but we aren't going to fix it.
This line of discussion is basically saying that we would designate a broader section of the API that is off limits, most likely the parts that are only useful for increased performance (rather than increased functionality). We would then specifically include the Cython team/volunteers in discussions about how to manage changes to these parts of the API to avoid breaking them, and possibly do simultaneous releases to account for changes so that their users have more time to rebuild.
Effectively, when we change our APIs, we would break everyone except Cython because we've worked with them to avoid the breakage. Anyone else using it has to make their own effort to follow CPython development and detect any breakage themselves (just like today).
So probably the part you're missing is where we would give ourselves permission to break more APIs in a release, while simultaneously encouraging people to use Cython as an isolation layer from those breaks.
The encouraging part is not working for me :-) And seriously, my gut tells me, we're split at 50/50 here. People usually write C for a reason, and Cython is not C. For, let's say, half of the cases that's fine, speeding up inner loops and all that, which doesn't touch the C level at all. The other half wants to solve different issues.

I think, it does not serve well as a policy for CPython. Since we're talking hypotheticals right now, if Cython vanishes tomorrow, we're kind of left empty handed. Such kind of a runtime, if considered part of the compatibility "promise", should be provided by the core itself, no?

A good way to test that promise (or other implications like performance) might also be to rewrite the standard library extensions in Cython and see where it leads.

I personally see myself using the python-provided runtime (types, methods, GC), out of convenience (it's there, so why not use it). The vision of the future outlined here can easily lead to backing off from that and rebuilding all those things and really only keep touchpoints with python when it comes to interfacing with python itself. It's probably even desirable that way. But definitely more work (for an extension author).

As a closing word, I don't mind either way. IOW I'm not complaining. I'm just putting more opinion from the "outside" into the ring. Thanks for listening :-)

Cheers, nd
On 14.04.20 13:39, André Malo wrote:
I think, it does not serve well as a policy for CPython. Since we're talking hypotheticals right now, if Cython vanishes tomorrow, we're kind of left empty handed. Such kind of a runtime, if considered part of the compatibility "promise", should be provided by the core itself, no?
There was some discussion a while ago about integrating a stripped-down variant of Cython into CPython's stdlib. I was arguing against that because the selling point of Cython is really what it is, and stripping that down wouldn't lead to something equally helpful for users. I think it's good to have separate projects (and, in fact, it's more than one) deal with this need. In the end, it's an external tool, like your editor, your C compiler, your debugger and whatever else you need for developing Python extensions. It spits out C code and lets you do with it what you want. There's no reason it should be part of the CPython project, core or stdlib. It's even written in Python. If it doesn't work for you, you can fix it.
A good way to test that promise (or other implications like performance) might also be to rewrite the standard library extensions in Cython and see where it leads.
Not sure I understand what you're saying here. stdlib extension modules are currently written in C, with a bit of code generation. How is that different?
I personally see myself using the python-provided runtime (types, methods, GC), out of convenience (it's there, so why not use it). The vision of the future outlined here can easily lead to backing off from that and rebuilding all those things and really only keep touchpoints with python when it comes to interfacing with python itself. It's probably even desirable that way
That's actually not an uncommon thing to do. Some packages really only use Cython or pybind11 to wrap their otherwise native C or C++ code. It's a choice given specific organisational/project/developer constraints, and choices are good. Stefan
Stefan Behnel wrote:
On 14.04.20 13:39, André Malo wrote:
I think, it does not serve well as a policy for CPython. Since we're talking hypotheticals right now, if Cython vanishes tomorrow, we're kind of left empty handed. Such kind of a runtime, if considered part of the compatibility "promise", should be provided by the core itself, no?
There was some discussion a while ago about integrating a stripped-down variant of Cython into CPython's stdlib. I was arguing against that because the selling point of Cython is really what it is, and stripping that down wouldn't lead to something equally helpful for users.
I think it's good to have separate projects (and, in fact, it's more than one) deal with this need.
In the end, it's an external tool, [...]
Thank you, that is my point exactly. It's the same "external" as everything else. I'm still trying to understand where to separate the different sets of "external".
A good way to test that promise (or other implications like performance) might
also be to rewrite the standard library extensions in Cython and
see where it leads.
Not sure I understand what you're saying here. stdlib extension modules are currently written in C, with a bit of code generation. How is that different?
They are C extensions like the ones everybody could write. They should use the same APIs. What I'm saying is, that it would be a good test if the APIs are good enough (for everybody else). If, say, Cython is recommended, some attempt should be made to achieve the same results with Cython. Or some other sets of APIs which are considered for "the public". I don't think, the current stdlib modules restrict themselves to a limited API. The distinction between "inside" and "outside" bothers me.
I personally see myself using the python-provided runtime (types, methods,
GC), out of convenience (it's there, so why not use it). The vision of
the future outlined here can easily lead to backing off from that and rebuilding all those things and really only keep touchpoints with python when it comes to interfacing with python itself. It's probably even desirable that way
That's actually not an uncommon thing to do. Some packages really only use Cython or pybind11 to wrap their otherwise native C or C++ code. It's a choice given specific organisational/project/developer constraints, and choices are good.
Agreed. Nevertheless, the choices are going to be limited by extra constraints. Cheers, nd
On 14Apr2020 1557, André Malo wrote:
Stefan Behnel wrote:
On 14.04.20 13:39, André Malo wrote:
A good way to test that promise (or other implications like performance) might also be to rewrite the standard library extensions in Cython and see where it leads.
Not sure I understand what you're saying here. stdlib extension modules are currently written in C, with a bit of code generation. How is that different?
They are C extensions like the ones everybody could write. They should use the same APIs. What I'm saying is, that it would be a good test if the APIs are good enough (for everybody else). If, say, Cython is recommended, some attempt should be made to achieve the same results with Cython. Or some other sets of APIs which are considered for "the public".
I don't think, the current stdlib modules restrict themselves to a limited API. The distinction between "inside" and "outside" bothers me.
It should not bother you. The standard library is not a testing ground for the public API - it's a layer to make those APIs available to users in a reliable, compatible format. Think of it like your C runtime, which uses a lot of system calls that have changed far more often than libc.

We can change the interface between the runtime and the included modules as frequently as we like, because it's private. And we do change them, and the changes go unnoticed because we adapt both sides of the contract at once. For example, we recently changed the calling conventions for certain functions, which didn't break anyone because we updated the callers as well. And we completely reimplemented stat() emulation on Windows recently, which wasn't incompatible because the public part of the API didn't change (except to have fewer false errors).

Modules that are part of the core runtime deliberately use private APIs so that other extension modules don't have to. It's not any sort of unfair advantage - it's a deliberate aspect of the software's design.

Cheers, Steve
Steve Dower wrote:
On 14Apr2020 1557, André Malo wrote:
Stefan Behnel wrote:
On 14.04.20 13:39, André Malo wrote:
A good way to test that promise (or other implications like performance) might
also be to rewrite the standard library extensions in Cython and
see where it leads.
Not sure I understand what you're saying here. stdlib extension modules are currently written in C, with a bit of code generation. How is that different?
They are C extensions like the ones everybody could write. They should use the same APIs. What I'm saying is, that it would be a good test if the APIs are good enough (for everybody else). If, say, Cython is recommended, some attempt should be made to achieve the same results with Cython. Or some other sets of APIs which are considered for "the public".
I don't think, the current stdlib modules restrict themselves to a limited API. The distinction between "inside" and "outside" bothers me.
It should not bother you. The standard library is not a testing ground for the public API - it's a layer to make those APIs available to users in a reliable, compatible format. Think of it like your C runtime, which uses a lot of system calls that have changed far more often than libc.
I can agree up to a certain level. There are extensions and there are extensions, see below.
We can change the interface between the runtime and the included modules as frequently as we like, because it's private. And we do change them, and the changes go unnoticed because we adapt both sides of the contract at once. For example, we recently changed the calling conventions for certain functions, which didn't break anyone because we updated the callers as well. And we completely reimplemented stat() emulation on Windows recently, which wasn't incompatible because the public part of the API didn't change (except to have fewer false errors).
Modules that are part of the core runtime deliberately use private APIs so that other extension modules don't have to. It's not any sort of unfair advantage - it's a deliberate aspect of the software's design.
Ah, hmm, maybe I was not clear enough. I was talking about extensions like itertools or datetime, not core builtins like sys or the type system. I think there's a difference. People do use especially the former ones as a template for how things are done "correctly". I agree, it's easy enough to change everything at once, assuming a good test suite :-) Cheers, nd
On 2020-04-14, 12:35 GMT, Stefan Behnel wrote:
A good way to test that promise (or other implications like performance) might also be to rewrite the standard library extensions in Cython and see where it leads.
Not sure I understand what you're saying here. stdlib extension modules are currently written in C, with a bit of code generation. How is that different?
When you are saying that writing C extensions is unnecessary because everything can be easily written in Cython, start persuading me by rewriting all C extensions included in CPython into Cython. If you are not willing to do it, why should I start rewriting my 7k lines of SWIG code to Cython, just because you hope that somebody finally, finally (please!) notices the existence of PyPy and hopefully starts to care about it? No, they won't.

Matěj
--
https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8

Never, never, never believe any war will be smooth and easy, or that anyone who embarks on the strange voyage can measure the tides and hurricanes he will encounter. The statesman who yields to war fever must realise that once the signal is given, he is no longer the master of policy but the slave of unforeseeable and uncontrollable events.
-- Winston Churchill, 1930
On 14.04.20 00:27, Steve Dower wrote:
On 13Apr2020 2308, André Malo wrote:
For one thing, if you open up APIs for Cython, they're open for everybody (Cython being "just" another C extension). More to the point: the ABIs have the same problem as they have now, regardless of how responsive the Cython developers are. Once you've compiled the extension, you're using the ABI and are supposedly not required to recompile to stay compatible.
So, what I'm getting at is: either you open up to everybody or to nobody. In C there's not really an in-between.
On a technical level, you are correct.
On a policy level, we don't make changes that would break users of the C API. Because we can't track everyone who's using it, we have to assume that everything is used and any change will cause breakage.
To make sure it's possible to keep developing CPython, we declare parts of the API off limits (typically by prepending them with an underscore). If you use these, and you break, we're sorry but we aren't going to fix it.
This line of discussion is basically saying that we would designate a broader section of the API that is off limits, most likely the parts that are only useful for increased performance (rather than increased functionality). We would then specifically include the Cython team/volunteers in discussions about how to manage changes to these parts of the API to avoid breaking them, and possibly do simultaneous releases to account for changes so that their users have more time to rebuild.
Effectively, when we change our APIs, we would break everyone except Cython because we've worked with them to avoid the breakage. Anyone else using it has to make their own effort to follow CPython development and detect any breakage themselves (just like today).
So probably the part you're missing is where we would give ourselves permission to break more APIs in a release, while simultaneously encouraging people to use Cython as an isolation layer from those breaks.
To add to that, the main difference for users here is a choice:

1) I want to use whatever is in the C-API and will fix my broken code myself whenever there's a new CPython release.

2) I write my code against the stable ABI, accept the performance limitations, and hope that it'll "never" break and my code just keeps working (even through future compatibility layers, if necessary).

3) I use Cython and rerun it on my code at least once for each new CPython release series, because I want to get the best performance for each target version.

4) I use Cython and activate its (yet to be completed) stable ABI mode, so that I don't have to target separate (C)Python releases but can release a single wheel, at the cost of reduced performance.

And then there are a couple of grey areas, e.g. people using Cython plus a bit of the C-API directly, for which they are then responsible themselves again. But it's still way easier to adapt 3% of your code every couple of CPython releases than all of your modules for each new release. That's just the normal price that you pay for manual optimisations.

A nice feature of Cython here is that 3) and 4) are actually not mutually exclusive, at least as it looks so far. You should eventually be able to generate both from your same sources (we are trying hard to keep them in the same C file), and even mix them on PyPI, e.g. distribute a generic stable ABI wheel for all Pythons that support it, plus accelerated wheels for CPython 3.9 and 3.10. You may even be able to release a pure Python wheel as well, as we currently do for Cython itself to better support PyPy.

And to drive the point home, if CPython starts changing its C-API more radically, or comes up with a new one, we can add the support for it to Cython and then, in the best case, users will still only have to rerun it on their code to target that new API. Compare that to case 1).
(Cython is still just a placeholder name here, btw. There are 1-2 other projects that could be considered instead, though I think Cython is the only one that also provides a usability improvement as well as API stability.)
pybind11 and mypyc could probably make a similar offer to users. The important point is just that we centralise the abstraction and adaptation work. Stefan
On Fri, 10 Apr 2020 at 22:00, Antoine Pitrou <solipsis@pitrou.net> wrote:
Examples of issues to make structures opaque:
* ``PyGC_Head``: https://bugs.python.org/issue40241
* ``PyObject``: https://bugs.python.org/issue39573
* ``PyTypeObject``: https://bugs.python.org/issue40170
How do you keep fast type checking such as PyTuple_Check() if extension code doesn't have access e.g. to tp_flags?
Hum. I should clarify that we have the choice of having no performance impact on the regular runtime: only use opaque functions for the "new" runtime. It's exactly what is already done with Py_LIMITED_API. Concrete example:

    static inline int
    PyType_HasFeature(PyTypeObject *type, unsigned long feature)
    {
    #ifdef Py_LIMITED_API
        return ((PyType_GetFlags(type) & feature) != 0);
    #else
        return ((type->tp_flags & feature) != 0);
    #endif
    }

With Py_LIMITED_API, the check goes through a PyType_GetFlags() function call; otherwise the PyTypeObject.tp_flags field is accessed directly. I recently modified this function to:

    static inline int
    PyType_HasFeature(PyTypeObject *type, unsigned long feature)
    {
        return ((PyType_GetFlags(type) & feature) != 0);
    }

I consider that checking a type is not performance critical, so I chose to have the same implementation for everyone. If someone shows that it's a major performance overhead, we can revisit this choice and reintroduce an #ifdef.

It's more a practical issue about the maintenance of two flavors of Python in the same code base. Do you want to have two implementations of each function? Or is it possible to have a single implementation for some functions? I suggest reducing the code duplication and accepting a performance overhead when it's small enough.
O(1) bytearray to bytes conversion ..................................
Convert bytearray to bytes without memory copy. (...)
If that's desirable (I'm not sure it is), (...)
Hum, maybe I should clarify the whole "New optimized CPython runtime" section. The optimizations listed there are not optimizations that must be implemented. They are examples of optimizations which become possible to implement, or at least easier to implement, once the C API is fixed. I'm not sure that "bytearray to bytes conversion" is a performance bottleneck. It's just that such an optimization is easier to explain than other, more complex optimizations ;-) The intent of this PEP is not to design a faster CPython, but to show that reworking the C API makes it possible to implement such a faster CPython.
Fork and "Copy-on-Read" problem ...............................
Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.
Nowadays it is strongly recommended to use multiprocessing with the "forkserver" start method: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me...
I understood that the Instagram workload is to load heavy data only once and fork later. I'm not sure that forkserver fits such a workload.
One solution for that would be to store reference counters outside ``PyObject``. For example, in a separated hash table (pointer to reference counter). Changing ``PyObject`` structures requires that C extensions don't access them directly.
You're planning to introduce a large overhead for each reference count lookup just to satisfy a rather niche use case? CPython probably does millions of reference counts per second.
Sorry, again, I'm not proposing to move ob_refcnt outside PyObject for everyone. The intent is to show that it becomes possible to do so if you have a very specific use case where it would be more efficient.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
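[Editor's note] To make the cost Antoine points out concrete, here is a minimal sketch (not CPython code, and not part of the PEP) of a reference count stored outside the object: every increment becomes a hash-table lookup keyed by the object's address instead of a single in-object increment.

    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SIZE 65536          /* fixed size, power of two: sketch only */

    typedef struct {
        void *obj;                    /* object address, NULL if the slot is free */
        int64_t refcnt;               /* reference count stored outside the object */
    } refcnt_slot;

    static refcnt_slot refcnt_table[TABLE_SIZE];

    /* Find (or claim) the slot for an object using linear probing.
       A real implementation would need resizing, deletion and thread safety. */
    static refcnt_slot *refcnt_lookup(void *obj)
    {
        size_t i = ((uintptr_t)obj >> 4) & (TABLE_SIZE - 1);
        while (refcnt_table[i].obj != NULL && refcnt_table[i].obj != obj) {
            i = (i + 1) & (TABLE_SIZE - 1);
        }
        refcnt_table[i].obj = obj;
        return &refcnt_table[i];
    }

    /* What an "external" incref would have to do: one lookup per call. */
    static void external_incref(void *obj)
    {
        refcnt_lookup(obj)->refcnt++;
    }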
On Fri, 10 Apr 2020 at 22:00, Antoine Pitrou <solipsis@pitrou.net> wrote:
How do you keep fast type checking such as PyTuple_Check() if extension code doesn't have access e.g. to tp_flags?
I notice you did: """ Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC() for object.c and typeobject.c. """
So you understand there is a need.
By the way, CPython currently uses statically allocated types for builtin types like str or list. This may have to change to run multiple subinterpreters efficiently in parallel: each subinterpreter should have its own heap-allocated type with its own reference counter. Using heap-allocated types means that the PyUnicode_Check() implementation has to change. It's just another good reason to better hide the PyUnicode_Check() implementation right now ;-) Victor
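[Editor's note] For reference, a simplified sketch (not the exact CPython source) of how these checks look today: the subclass check goes through tp_flags, while the exact-type check compares the type pointer against the address of the statically allocated PyUnicode_Type, which is the part that per-interpreter heap-allocated types would invalidate.

    #include <Python.h>

    static int my_unicode_check(PyObject *op)
    {
        /* flag-based check: independent of where the type object lives */
        return PyType_HasFeature(Py_TYPE(op), Py_TPFLAGS_UNICODE_SUBCLASS);
    }

    static int my_unicode_check_exact(PyObject *op)
    {
        /* address comparison: assumes a single, static PyUnicode_Type */
        return Py_TYPE(op) == &PyUnicode_Type;
    }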
On Sat, 11 Apr 2020 01:52:13 +0200 Victor Stinner <vstinner@python.org> wrote:
By the way, CPython currently uses statically allocated types for builtin types like str or list. This may have to change to run multiple subinterpreters efficiently in parallel: each subinterpreter should have its own heap-allocated type with its own reference counter.
Using heap-allocated types means that the PyUnicode_Check() implementation has to change. It's just another good reason to better hide the PyUnicode_Check() implementation right now ;-)
I'm not sure I understand. If PyUnicode_Check() uses tp_flags, it doesn't have to change, precisely. Regards Antoine.
On Fri, 10 Apr 2020 at 22:00, Antoine Pitrou <solipsis@pitrou.net> wrote:
Debug runtime and remove debug checks in release mode .....................................................
If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
That's the one convincing feature in this PEP, as far as I'm concerned.
In fact, I already implemented this feature in Python 3.8: https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-...

The feature is implemented on most platforms, except on Android, Cygwin and Windows sadly. You can now switch between a release build of Python and a debug build of Python without having to rebuild your C extensions which were compiled in release mode. If you want, you can use a debug build of some C extensions. You now have many options: the debug ABI is now compatible with the release ABI.

This PEP section is mostly a call to remove debug checks in release mode :-)

In my latest attempt, I failed to explain that the debug build is now easy enough to be used by developers in practice (I never finished my article explaining how to use it): https://bugs.python.org/issue37406

Steve: the use case is to debug very rare Python crashes (ex: once every two months) of customers who fail to provide a reproducer. My *expectation* is that a debug build should help to reproduce the bug and/or provide more information when the bug happens. My motivation for this feature is also to show that the bug is not on Python but in third-party C extensions ;-)

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
On Sat, 11 Apr 2020 02:11:41 +0200 Victor Stinner <vstinner@python.org> wrote:
On Fri, 10 Apr 2020 at 22:00, Antoine Pitrou <solipsis@pitrou.net> wrote:
Debug runtime and remove debug checks in release mode .....................................................
If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
That's the one convincing feature in this PEP, as far as I'm concerned.
In fact, I already implemented this feature in Python 3.8: https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-...
I had missed that. Great! :-) Regards Antoine.
On 11Apr2020 0111, Victor Stinner wrote:
Steve: the use case is to debug very rare Python crashes (ex: once every two months) of customers who fail to provide a reproducer. My *expectation* is that a debug build should help to reproduce the bug and/or provide more information when the bug happens. My motivation for this feature is also to show that the bug is not on Python but in third-party C extensions ;-)
I think your expectation is wrong. If a stack trace of the crash doesn't show that it belongs to the third party module (which most of the ones that are sent back on Windows indeed show), then you need more invasive tracing to show that the issue came from the module. Until we actually have opaque, non-static objects, that doesn't seem to be possible. All you've done right now is enable new inconsistencies and potential issues when mixing debug and release builds. That just makes things harder to diagnose. Cheers, Steve
On 13Apr2020 1122, Steve Dower wrote:
On 11Apr2020 0111, Victor Stinner wrote:
Steve: the use case is to debug very rare Python crashes (ex: once every two months) of customers who fail to provide a reproducer. My *expectation* is that a debug build should help to reproduce the bug and/or provide more information when the bug happens. My motivation for this feature is also to show that the bug is not on Python but in third-party C extensions ;-)
I think your expectation is wrong. If a stack trace of the crash doesn't show that it belongs to the third party module (which most of the ones that are sent back on Windows indeed show), then you need more invasive tracing to show that the issue came from the module. Until we actually have opaque, non-static objects, that doesn't seem to be possible.
All you've done right now is enable new inconsistencies and potential issues when mixing debug and release builds. That just makes things harder to diagnose.
I think what you really wanted to do here was have a build option _other_ than the debug flag to turn on additional checks. Like you did with tracemalloc. The debug flag turns on additional runtime checks in the underlying C compiler and runtime on Windows (and I presume elsewhere? Is this such a crazy idea?), such as buffer overrun detection and memory misuse. The only way to make a debug build properly compatible with a release build is to disable these checks, which leaves us completely unable to take advantage of them. It also significantly speeds up compile time, which is very useful as a developer. But if your goal is to have a release build that includes additional ABI-transparent checks, then I don't see why you wouldn't just build with those options? It's not like CPython takes that long to build from a clean working directory. Cheers, Steve
On 10/04/2020 18:20, Victor Stinner wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
If this is true, the documentation on python.org needs a serious rewrite. I am in the throes of writing a C extension, and using Cython or cffi never even crossed my mind. -- Rhodri James *-* Kynesim Ltd
On 11Apr2020 1156, Rhodri James wrote:
On 10/04/2020 18:20, Victor Stinner wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
If this is true, the documentation on python.org needs a serious rewrite. I am in the throes of writing a C extension, and using Cython or cffi never even crossed my mind.
Sorry you missed the first two sections: "Recommended third party tools" and "Creating extensions without third party tools". https://docs.python.org/3/extending/index.html If you have any suggestions on how to make this recommendation more obvious, please open an issue and describe what would have helped. Cheers, Steve
On 13/04/2020 11:17, Steve Dower wrote:
On 11Apr2020 1156, Rhodri James wrote:
On 10/04/2020 18:20, Victor Stinner wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
If this is true, the documentation on python.org needs a serious rewrite. I am in the throes of writing a C extension, and using Cython or cffi never even crossed my mind.
Sorry you missed the first two sections: "Recommended third party tools" and "Creating extensions without third party tools".
"Creating extensions without third party tools" is what I read, because the preceding sections suggested to me that was what I was supposed to do. The opening paragraph of the document more or less reads as "This is how you write C extensions." It's an intro. That's fair enough. The next section, "Recommended third party tools", basically says "Third party tools exist." It notably does not say "Use them in preference to what follows," so I didn't even look at them. There is in fact a mild statement at the end of the first paragraph of the next section that should have clued me in, but I missed it because I'd already skipped to the table of contents.
If you have any suggestions on how to make this recommendation more obvious, please open an issue and describe what would have helped.
I'll give it some thought, but fundamentally if you want people to use the third party tools, you need a much stronger statement to that effect. I'm sure I'm not the only one whose reaction to "third party" is "not official then". -- Rhodri James *-* Kynesim Ltd
On Mon, 13 Apr 2020 at 11:20, Steve Dower <steve.dower@python.org> wrote:
On 11Apr2020 1156, Rhodri James wrote:
On 10/04/2020 18:20, Victor Stinner wrote:
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
If this is true, the documentation on python.org needs a serious rewrite. I am in the throes of writing a C extension, and using Cython or cffi never even crossed my mind.
Sorry you missed the first two sections: "Recommended third party tools" and "Creating extensions without third party tools".
https://docs.python.org/3/extending/index.html
If you have any suggestions on how to make this recommendation more obvious, please open an issue and describe what would have helped.
Personally, I'd say that "recommended 3rd party tools" reads as saying "if you want a 3rd party tool to build extensions, these are good (and are a lot easier than using the raw C API)". That's a lot different than saying "we recommend that people writing C extensions do not use the raw C API, but use one of these tools instead".

Also, if we *are* going to push people away from the raw C API, then I think we should be recommending a particular tool (likely Cython) as what people writing their first extension (or wanting to switch from the raw C API for the first time) should use. Faced with the API docs, and a list of 3rd party options, I know that *I* am likely to say "yeah, leave that research for another day, I'll use what's in the docs in front of me for now". Also, if we are expecting to push people towards 3rd party tools, that seems to me to be a relatively significant shift in emphasis, and one we should be publicising more directly (via What's New, and blog postings / release announcements, etc.) In the absence of anything like that, I think it's quite reasonable for people to gravitate towards the traditional C API.

Having said all this, I *do* think that promoting some 3rd party tool (as I say, I suspect this would be Cython) as the recommended means of writing C extensions, is a reasonable approach to take. I just object to it happening "quietly" via changes like this which make it harder to use the raw C API, justifying themselves by saying "you shouldn't do that anyway".

On a related but different note, what is the recommended policy (assuming it's not to use the C API) for embedding Python, and for exposing the embedding app to Python as a C extension? My standard example of this is the Vim interface to Python - see https://github.com/vim/vim/blob/master/src/if_python3.c. I originally wrote this back in the Python 1.5 days, so it's *very* old, and quite likely not how I'd write it now, even using the C API. But what's the recommendation for code like that in the face of these changes, and the suggestion that using 3rd party tools is the normal way to write C extensions?

Paul
On Apr 13, 2020, at 5:25 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On a related but different note, what is the recommended policy (assuming it's not to use the C API) for embedding Python, and for exposing the embedding app to Python as a C extension? My standard example of this is the Vim interface to Python - see https://github.com/vim/vim/blob/master/src/if_python3.c. I originally wrote this back in the Python 1.5 days, so it's *very* old, and quite likely not how I'd write it now, even using the C API. But what's the recommendation for code like that in the face of these changes, and the suggestion that using 3rd party tools is the normal way to write C extensions?
I’d like to +1 this request for a standard for embedding Python while at the same time exposing the embedding app to Python as a C extension. We do something remarkably similar to Vim here (along with other files in the same directory): https://github.com/nion-software/nionui-tool/blob/master/launcher/PythonStubs.cpp I’ve looked into cffi but it seems to only solve a fraction of the problem. Our Qt-based application embeds Python and provides callbacks from Python to our application. It runs on macOS, Linux, and Windows and runs unchanged on Python 3.6, 3.7, and 3.8 since it dynamically links to Python.
On 13Apr2020 1325, Paul Moore wrote:
Personally, I'd say that "recommended 3rd party tools" reads as saying "if you want a 3rd party tool to build extensions, these are good (and are a lot easier than using the raw C API)". That's a lot different than saying "we recommend that people writing C extensions do not use the raw C API, but use one of these tools instead".
Yeah, that's fair. But at the same time, saying anything stronger is an endorsement that we might have to withdraw at some point in the future (if the project we recommend implodes, for example).
Also, if we *are* going to push people away from the raw C API, then I think we should be recommending a particular tool (likely Cython) as what people writing their first extension (or wanting to switch from the raw C API for the first time) should use. Faced with the API docs, and a list of 3rd party options, I know that *I* am likely to say "yeah, leave that research for another day, I'll use what's in the docs in front of me for now". Also, if we are expecting to push people towards 3rd party tools, that seems to me to be a relatively significant shift in emphasis, and one we should be publicising more directly (via What's New, and blog postings / release announcements, etc.) In the absence of anything like that, I think it's quite reasonable for people to gravitate towards the traditional C API.
Right, except we haven't decided to do it yet. There's still a debate about whether the current third party tools are even sufficient (not to mention what "sufficient" means).
Having said all this, I *do* think that promoting some 3rd party tool (as I say, I suspect this would be Cython) as the recommended means of writing C extensions, is a reasonable approach to take. I just object to it happening "quietly" via changes like this which make it harder to use the raw C API, justifying themselves by saying "you shouldn't do that anyway".
Agreed, I'd rather be up front about it.
On a related but different note, what is the recommended policy (assuming it's not to use the C API) for embedding Python, and for exposing the embedding app to Python as a C extension? My standard example of this is the Vim interface to Python - see https://github.com/vim/vim/blob/master/src/if_python3.c. I originally wrote this back in the Python 1.5 days, so it's *very* old, and quite likely not how I'd write it now, even using the C API. But what's the recommendation for code like that in the face of these changes, and the suggestion that using 3rd party tools is the normal way to write C extensions?
I don't think any current 3rd party tools really help with embedding (I say that as a regular embedder, not as someone who skim-read their docs). In this case, you really do need low-level access to Python's thread and memory management, and the ability to interact directly with the rest of your application's data structures. PyBind11 is the best I've used here - Cython insists on including all its boilerplate to make a complete module, which often is not what you want. But there's a lot of core things that need to be improved if embedding is going to get any better, as I've posted often enough. We can't rely on third-party tools here, yet. Cheers, Steve
On Mon, Apr 13, 2020 at 9:00 AM Steve Dower <steve.dower@python.org> wrote:
On 13Apr2020 1325, Paul Moore wrote:
Personally, I'd say that "recommended 3rd party tools" reads as saying "if you want a 3rd party tool to build extensions, these are good (and are a lot easier than using the raw C API)". That's a lot different than saying "we recommend that people writing C extensions do not use the raw C API, but use one of these tools instead".
Yeah, that's fair. But at the same time, saying anything more strong is an endorsement that we might have to withdraw at some point in the future (if the project we recommend implodes, for example).
Ok, so put that in a Pros/Cons list that provides guidance as to what interface and tools to choose when writing a new extension module. Personally, I'd put Cython (and other "big" packages, numpy, requests and such) on par with CPython itself with respect to "likely to implode and become unusable."
Was it regular cffi or cffi's embedding API, which is used a bit differently than regular cffi, that "seems to only solve a fraction of the problem"? Was just playing around with the embedding API and was impressed.

In Python:

    @ffi.def_extern()
    def uwsgi_pyexample_init():
        print("init called")
        return 0

In C (embedded in the same plugin):

    CFFI_DLLEXPORT struct uwsgi_plugin pyexample_plugin = {
        .init = uwsgi_pyexample_init
    };

Seems to be happily importing and exporting APIs. Interpreter starts the first time a @ffi.def_extern() function is called.

https://cffi.readthedocs.io/en/latest/embedding.html
https://github.com/unbit/uwsgi/blob/f6ad0c6dfe431d91ffe365bed3105ed052bef6e4...
On Apr 13, 2020, at 11:26 AM, Daniel Holth <dholth@gmail.com> wrote:
Was it regular cffi or cffi's embedding API, which is used a bit differently than regular cffi, that "seems to only solve a fraction of the problem"? Was just playing around with the embedding API and was impressed.
In Python:
@ffi.def_extern()
def uwsgi_pyexample_init():
    print("init called")
    return 0
In C (embedded in the same plugin):
CFFI_DLLEXPORT struct uwsgi_plugin pyexample_plugin = { .init = uwsgi_pyexample_init };
Seems to be happily importing and exporting APIs. Interpreter starts the first time a @ffi.def_extern() function is called.
https://cffi.readthedocs.io/en/latest/embedding.html
https://github.com/unbit/uwsgi/blob/f6ad0c6dfe431d91ffe365bed3105ed052bef6e4/plugins/pyexample/pyexample_plugin.py

I might need to understand cffi embedding more to really answer your question - and it’s entirely possible cffi can do this - but as a simple example:
How would I call a Python function from the C++ application that returns a Python object to C++ and then call a method on that Python object from C++? My specific example is that I create Python handlers for Qt windows and then from the Qt/C++ I call methods on those Python objects from C++ such as “handle mouse event”.
It can be done exactly the way you pass a void* when registering a C callback and get it passed back to your callback function. https://cffi.readthedocs.io/en/latest/ref.html#ffi-new-handle-ffi-from-handl... https://bitbucket.org/dholth/kivyjoy/src/aaeab79b2891782209a1219cd65a4d9716c... https://bitbucket.org/dholth/kivyjoy/src/aaeab79b2891782209a1219cd65a4d9716c...
Sorry that this is a bit off-topic. cffi would be a user of any new C API. I've tried to make sure ABI3 is supported in setuptools and wheel, with varying success. Apparently virtualenvs and Windows have problems. I'm excited about the possibility of a better C API and possibly ABI.
On 13Apr2020 2105, Chris Meyer wrote:
How would I call a Python function from the C++ application that returns a Python object to C++ and then call a method on that Python object from C++?
My specific example is that I create Python handlers for Qt windows and then from the Qt/C++ I call methods on those Python objects from C++ such as “handle mouse event”.
You're in a bit of trouble here regardless, depending on how robust you need to be. If you've only got synchronous, single-threaded event handlers then you'll be okay. Anything more complex and you'll have some fun debugging sessions to look forward to. I would definitely say look at PyBind11. A while ago I posted a sample using this to embed Python in a game engine at https://devblogs.microsoft.com/python/embedding-python-in-a-cpp-project-with... (VS is not required, it just happened to be the hook to do the post/video ;) ) To jump straight to the code, go to https://github.com/zooba/ogre3d-python-embed/blob/master/src/PythonCharacter... and search for "py::", and also https://github.com/zooba/ogre3d-python-embed/blob/master/src/ogre_module.h PyBind11 is nice for avoiding the boilerplate and ref-counting, but has its own set of obscure error cases. It's also not as easy to debug as Cython or going straight to the Python C API, depending on what the issue is, as there's no straightforward generated code. Even stepping through the templated code interactively in VS doesn't help make it any easier to follow. Cheers, Steve
On 2020-04-13, 17:39 GMT, Eric Fahlgren wrote:
Ok, so put that in a Pros/Cons list that provides guidance as to what interface and tools to choose when writing a new extension module. Personally, I'd put Cython (and other "big" packages, numpy, requests and such) on par with CPython itself with respect to "likely to implode and become unusable."
Time for the unpleasant question: what is the bus factor of Cython?

Best, Matěj
--
https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8

To love another person Is to see the face of God.
-- yes, incredibly cheesy verse from the screenplay of the movie Les Miserables (2012)
On 13.04.20 14:25, Paul Moore wrote:
On a related but different note, what is the recommended policy (assuming it's not to use the C API) for embedding Python, and for exposing the embedding app to Python as a C extension? My standard example of this is the Vim interface to Python - see https://github.com/vim/vim/blob/master/src/if_python3.c. I originally wrote this back in the Python 1.5 days, so it's *very* old, and quite likely not how I'd write it now, even using the C API. But what's the recommendation for code like that in the face of these changes, and the suggestion that using 3rd party tools is the normal way to write C extensions?
Embedding is not very well documented overall. I recently looked through the docs to collect what a user would need to know in this case, and ended up creating at least a little link collection, because I failed to find a good place to refer users to. The things people need to know from the CPython docs are scattered across different places, and lack a complete real-world-like example that "most people" could start from. (I don't think many users will pass strings into Python to execute code there.)

https://cython.readthedocs.io/en/latest/src/tutorial/embedding.html

From Cython's PoV, the main thing that future embedders need to understand is that it's not really different from extending – you just have to start the Python runtime before doing anything else. I think there should be some help for getting that done, and then it's just executing your Python code in some module. Cython then has its ways to go back and forth from there, e.g. by writing cdef (C) functions as entry points for your application.

Cython currently doesn't really have "direct" support for embedding. You can let it generate a C main function for you to start your program, but that's not what you want in the case of vim. There's a "cython_freeze" script that generates an inittab list in addition, but it's a bit simplistic and not integrated. We have a beginners ticket for integrating it better: https://github.com/cython/cython/issues/2849

What I would like to see eventually is to let users pass a list of modules into Cython's frontend (maybe cythonize(), maybe not), and then it would just generate a single distutils Extension from them that links everything together and registers all modules on import, optionally with a generated exported C function that starts up the whole thing. That seems simple enough to do and use, and you end up with a shared library that your application can load. PRs welcome. :)

Stefan
On 10.04.2020 20:20, Victor Stinner wrote:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ PEP xxx: Modify the C API to hide implementation details ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Abstract ========
* Hide implementation details from the C API to be able to `optimize CPython`_ and make PyPy more efficient. * The expectation is that `most C extensions don't rely directly on CPython internals`_ and so will remain compatible. * Continue to support old unmodified C extensions by continuing to provide the fully compatible "regular" CPython runtime. * Provide a `new optimized CPython runtime`_ using the same CPython code base: faster but can only import C extensions which don't use implementation details. Since both CPython runtimes share the same code base, features implemented in CPython will be available in both runtimes. * `Stable ABI`_: Only build a C extension once and use it on multiple Python runtimes and different versions of the same runtime. * Better advertise alternative Python runtimes and better communicate on the differences between the Python language and the Python implementation (especially CPython).
Note: Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Rationale =========
To remain competitive in term of performance with other programming languages like Go or Rust, Python has to become more efficient.
Make Python (at least) two times faster ---------------------------------------
The C API leaks too many implementation details which prevent optimizing CPython. See `Optimize CPython`_.
PyPy's support for Python's C API (pycext) is slow because it has to emulate CPython internals like memory layout and reference counting. The emulation causes memory overhead, memory copies, conversions, etc. See `Inside cpyext: Why emulating CPython C API is so Hard <https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html>`_ (Sept 2018) by Antonio Cuni.
While this PEP may make CPython a little bit slower in the short term, the long-term goal is to make "Python" at least two times faster. This goal is not hypothetical: PyPy is already 4.2x faster than CPython and is fully compatible. C extensions are the bottleneck of PyPy. This PEP proposes a migration plan to move towards opaque C API which would make PyPy faster.
Separated the Python language and the CPython runtime (promote alternative runtimes) ------------------------------------------------------------------------------------
The Python language should be better separated from its runtime. It's common to say "Python" when referring to "CPython". Even in this PEP :-)
Because the CPython runtime remains the reference implementation, many people believe that the Python language itself has design flaws which prevent it from being efficient. PyPy proved that this is a false assumption: on average, PyPy runs Python code 4.2 times faster than CPython.
One solution for separating the language from the implementation is to promote the usage of alternative runtimes: not only the regular CPython, but also PyPy, an optimized CPython which is only compatible with C extensions using the limited C API, a CPython compiled in debug mode to ease debugging issues in C extensions, RustPython, etc.
To make alternative runtimes viable, they should be competitive in terms of features and performance. Currently, C extension modules remain the bottleneck for PyPy.
Most C extensions don't rely directly on CPython internals ----------------------------------------------------------
While the C API is still tightly coupled to CPython internals, in practice most C extensions don't rely directly on CPython internals.
The expectation is that these C extensions will remain compatible with an "opaque" C API and only a minority of C extensions will have to be modified.
Moreover, more and more C extensions are implemented in Cython or cffi. Updating Cython and cffi to be compatible with the opaque C API will make all these C extensions compatible without having to modify the source code of each extension.
Stable ABI ----------
The idea is to build a C extension only once: the built binary will be usable on multiple Python runtimes and different versions of the same runtime (stable ABI).
The idea is not new but is an extension of the `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`__ implemented in CPython 3.4 with its "limited C API". The limited C API is not used by default and is not widely used: PyQt is one of the few known users.
The idea here is that the default C API becomes the limited C API, so that all C extensions benefit from the advantages of a stable ABI.
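As an illustration (sketch added for this discussion, not part of the original PEP text), a C extension opts into the limited C API, and thus the stable ABI, by defining ``Py_LIMITED_API`` before including ``Python.h``::

    #define Py_LIMITED_API 0x03060000   /* target the stable ABI of Python >= 3.6 */
    #include <Python.h>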
In my practice with helping maintain a C extension module, it's not a problem to build the module separately for every minor release. That's because there are only a few officially supported releases, and they aren't released frequently. Conversely, if you are using a "limited ABI", you are "limited" (pun intended) to what it has and can't take advantage of any new features until the next major Python version -- i.e. for potentially several years! So I don't see any "advantages of a stable ABI" atm that matter in practice while I do see _dis_advantages. So this area can perhaps be excluded from the PEP or at least given low priority. Unless, of course, you have some other, more real upcoming "advantages" in mind.
Flaws of the C API ==================
Borrowed references -------------------
A borrowed reference is a pointer which doesn't “hold” a reference. If the object is destroyed, the borrowed reference becomes a dangling pointer, pointing to freed memory which might be reused by a new object. Borrowed references can lead to bugs and crashes when misused. An example of a CPython bug caused by this is `bpo-25750: crash in type_getattro() <https://bugs.python.org/issue25750>`_.
Borrowed references are a problem whenever there is no reference to borrow: they assume that a referenced object already exists (and thus has a positive reference count).
Tagged pointers are an example of this problem: since there is no concrete ``PyObject*`` to represent the integer, it cannot easily be manipulated.
This issue complicates optimizations like PyPy's list strategies: if a list contains only small integers, it is stored as a compact C array of longs. The equivalent of ``PyObject`` is only created when an item is accessed. (Most of the time the object is optimized away by the JIT, but this is another story.) This makes it hard to support the C API function ``PyList_GetItem()``, which should return a reference borrowed from the list, but the list contains no concrete ``PyObject`` that it could lend a reference to! PyPy's current solution is very bad: the first time ``PyList_GetItem()`` is called, the whole list is de-optimized (converted to a list of ``PyObject*``). See ``cpyext`` ``get_list_storage()``.
See also the Specialized list use case, which is the same optimization applied to CPython. Like in PyPy, this optimization is incompatible with borrowed references since the runtime cannot guess when the temporary object should be destroyed.
If ``PyList_GetItem()`` returned a strong reference, the ``PyObject*`` could just be allocated on the fly and destroyed when the user decrements its reference count. Basically, by putting borrowed references in the API, we are making it impossible to change the underlying data structure.
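To make this concrete, here is a sketch (added for illustration, not from the PEP) of the same loop with today's borrowed-reference API and with a strong-reference variant; ``PyList_GetItemRef()`` is an invented name used only for illustration, and error handling is elided::

    #include <Python.h>

    /* Hypothetical strong-reference variant of PyList_GetItem(): declared here
       only to illustrate the calling convention, it does not exist in CPython. */
    PyObject *PyList_GetItemRef(PyObject *list, Py_ssize_t index);

    /* Today: borrowed reference, so the list must own a concrete PyObject*
       for every item. */
    static long sum_borrowed(PyObject *list)
    {
        long total = 0;
        for (Py_ssize_t i = 0; i < PyList_Size(list); i++) {
            PyObject *item = PyList_GetItem(list, i);    /* borrowed */
            total += PyLong_AsLong(item);
        }
        return total;
    }

    /* With a strong reference, the runtime could create the PyObject on the
       fly (e.g. from a compact array of longs) and let the caller destroy it. */
    static long sum_strong(PyObject *list)
    {
        long total = 0;
        for (Py_ssize_t i = 0; i < PyList_Size(list); i++) {
            PyObject *item = PyList_GetItemRef(list, i); /* strong (hypothetical) */
            total += PyLong_AsLong(item);
            Py_DECREF(item);                             /* caller owns it */
        }
        return total;
    }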
Functions stealing strong references ------------------------------------
There are functions which steal strong references, for example ``PyModule_AddObject()`` and ``PySet_Discard()``. Stealing references is an issue similar to borrowed references.
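As a sketch of why stealing is error-prone (example added for illustration, following the pattern documented for ``PyModule_AddObject()``): the reference is only stolen on success, so the failure path must release it explicitly::

    #include <Python.h>

    static int add_answer(PyObject *module)
    {
        PyObject *value = PyLong_FromLong(42);
        if (value == NULL) {
            return -1;
        }
        if (PyModule_AddObject(module, "answer", value) < 0) {
            Py_DECREF(value);    /* not stolen on failure: we still own it */
            return -1;
        }
        /* stolen on success: do NOT Py_DECREF(value) here */
        return 0;
    }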
PyObject** ----------
Some functions of the C API return a pointer to an array of ``PyObject*``:
* ``PySequence_Fast_ITEMS()``
* ``PyTuple_GET_ITEM()`` is sometimes abused to get an array of all of the tuple's contents: ``PyObject **items = &PyTuple_GET_ITEM(tuple, 0);``
In effect, these functions return an array of borrowed references: like with ``PyList_GetItem()``, all callers of ``PySequence_Fast_ITEMS()`` assume the sequence holds references to its elements.
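For illustration (sketch added for this discussion, error handling partly elided), typical code relying on such an array of borrowed references looks like this, and only works if the sequence stores ``PyObject*`` internally::

    #include <Python.h>

    static long sum_fast(PyObject *seq)
    {
        PyObject *fast = PySequence_Fast(seq, "expected a sequence");
        if (fast == NULL) {
            return -1;
        }
        Py_ssize_t n = PySequence_Fast_GET_SIZE(fast);
        PyObject **items = PySequence_Fast_ITEMS(fast);  /* borrowed array */
        long total = 0;
        for (Py_ssize_t i = 0; i < n; i++) {
            total += PyLong_AsLong(items[i]);
        }
        Py_DECREF(fast);
        return total;
    }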
Leaking structure members -------------------------
``PyObject``, ``PyTypeObject``, ``PyThreadState``, etc. structures are currently public: C extensions can directly read and modify the structure members.
For example, the ``Py_INCREF()`` macro directly increases ``PyObject.ob_refcnt``, without any abstraction. Hopefully, ``Py_INCREF()`` implementation can be modified without affecting the API.
Change the C API ================
This PEP doesn't define an exhaustive list of all C API changes, but defines some guidelines about bad patterns which should be avoided in the C API to prevent leaking implementation details.
Separate header files of limited and internal C API ---------------------------------------------------
In Python 3.6, all headers (.h files) were directly in the ``Include/`` directory.
In Python 3.7, work started to move the internal C API into a new subdirectory, ``Include/internal/``. The work continued in Python 3.8 and 3.9. The internal C API is only partially exported: some functions are only declared with ``extern`` and so cannot be used outside CPython (with compilers supporting ``-fvisibility=hidden``, see above), whereas some functions are exported with ``PyAPI_FUNC()`` to make them usable in C extensions. Debuggers and profilers are typical users of the internal C API to inspect Python internals without calling functions (to inspect a coredump for example).
Python 3.9 is now built with ``-fvisibility=hidden`` (supported by GCC and clang): symbols which are not declared with ``PyAPI_FUNC()`` or ``PyAPI_DATA()`` are no longer exported by the dynamic library (libpython).
Another change is to separate the limited C API from the "CPython" C API: Python 3.8 has a new ``Include/cpython/`` sub-directory. It should not be used directly, but it is used automatically from the public headers when the ``Py_LIMITED_API`` macro is not defined.
**Backward compatibility:** fully backward compatible.
**Status:** basically completed in Python 3.9.
Changes without API changes and with minor performance overhead ---------------------------------------------------------------
* Replace macros with static inline functions. Work started in 3.8 and made good progress in Python 3.9.
* Modify macros to avoid directly accessing structure fields.

For example, the `Hide implementation detail of trashcan macros <https://github.com/python/cpython/commit/38965ec5411da60d312b59be281f3510d58e0cf1>`_ commit modifies the ``Py_TRASHCAN_BEGIN_CONDITION()`` macro to call a new ``_PyTrash_begin()`` function rather than directly accessing the ``PyThreadState.trash_delete_nesting`` field.
**Backward compatibility:** fully backward compatible.
**Status:** good progress in Python 3.9.
Changes without API changes but with performance overhead ---------------------------------------------------------
Replace macros or inline functions with regular functions. Work started in 3.9 on a limited set of functions.
Converting macros to function calls can have a small performance overhead.

For example, the ``Py_INCREF()`` macro directly modifies ``PyObject.ob_refcnt``: this macro would become an alias to the opaque ``Py_IncRef()`` function.

It is possible that the regular CPython runtime keeps the ``Py_INCREF()`` macro which directly modifies ``PyObject.ob_refcnt`` to avoid any performance overhead. A tradeoff should be defined to limit differences between the regular and the new optimized CPython runtimes, without hurting the performance of the regular CPython runtime too much.
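A hedged sketch of how a single header could provide both behaviors; the ``Py_OPAQUE_API`` macro name is purely hypothetical, while ``Py_IncRef()`` is an existing function::

    #ifdef Py_OPAQUE_API
       /* optimized runtime: opaque function call, ob_refcnt is not exposed */
    #  define Py_INCREF(op) Py_IncRef((PyObject *)(op))
    #else
       /* regular runtime: keep the fast direct field access */
    #  define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)
    #endif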
**Backward compatibility:** fully backward compatible.
**Status:** not started. The performance overhead must first be measured with benchmarks, and this PEP should be accepted.
API and ABI incompatible changes --------------------------------
* Make structures opaque: move them to the internal C API.
* Remove functions from the public C API which are tied to CPython internals. Maybe begin by marking these functions as private (rename ``PyXXX`` to ``_PyXXX``) or move them to the internal C API.
* Ban statically allocated types (by making ``PyTypeObject`` opaque): enforce usage of ``PyType_FromSpec()``, as in the sketch below.
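A minimal sketch of a heap type defined with ``PyType_FromSpec()``; the ``MyObject`` structure and the ``mymodule.MyType`` name are hypothetical::

    typedef struct {
        PyObject_HEAD
        int value;
    } MyObject;

    static PyType_Slot mytype_slots[] = {
        {Py_tp_doc, (void *)"Example heap type"},
        {Py_tp_new, PyType_GenericNew},
        {0, NULL},
    };

    static PyType_Spec mytype_spec = {
        "mymodule.MyType",      /* name */
        sizeof(MyObject),       /* basicsize */
        0,                      /* itemsize */
        Py_TPFLAGS_DEFAULT,     /* flags */
        mytype_slots,           /* slots */
    };

    /* In the module initialization function: */
    PyObject *type = PyType_FromSpec(&mytype_spec);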
Examples of issues to make structures opaque:
* ``PyGC_Head``: https://bugs.python.org/issue40241
* ``PyObject``: https://bugs.python.org/issue39573
* ``PyTypeObject``: https://bugs.python.org/issue40170
* ``PyThreadState``: https://bugs.python.org/issue39573
Another example are the ``Py_REFCNT()`` and ``Py_TYPE()`` macros, which can currently be used as l-values to modify an object's reference count or type. Python 3.9 has new ``Py_SET_REFCNT()`` and ``Py_SET_TYPE()`` macros which should be used instead. ``Py_REFCNT()`` and ``Py_TYPE()`` should be converted to static inline functions to prevent their usage as l-values.
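A short sketch of the change in usage; ``obj`` and ``MyType`` are hypothetical::

    /* Old pattern: relies on Py_REFCNT() and Py_TYPE() expanding to l-values. */
    Py_REFCNT(obj) = 1;
    Py_TYPE(obj) = &MyType;

    /* Pattern since Python 3.9, compatible with an opaque PyObject: */
    Py_SET_REFCNT(obj, 1);
    Py_SET_TYPE(obj, &MyType);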
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
CPython specific behavior =========================
Some C functions and some Python functions have a behavior which is closely tied to the current CPython implementation.
is operator -----------
The "x is y" operator is closed tied to how CPython allocates objects and to ``PyObject*``.
For example, CPython uses singletons for numbers in [-5; 256] range::
    >>> x=1; (x + 1) is 2
    True
    >>> x=1000; (x + 1) is 1001
    False
The Python 3.8 compiler now emits a ``SyntaxWarning`` when the right operand of the ``is`` and ``is not`` operators is a literal (ex: integer or string), but doesn't warn if it is the ``None``, ``True``, ``False`` or ``Ellipsis`` singleton (`bpo-34850 <https://bugs.python.org/issue34850>`_). Example::
    >>> x=1
    >>> x is 1
    <stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
    True
CPython PyObject_RichCompareBool --------------------------------
CPython considers that two objects are identical if their memory address are equal: ``x is y`` in Python (``IS_OP`` opcode) is implemented internally in C as ``left == right`` where ``left`` and ``right`` are ``PyObject *`` pointers.
The main function to implement comparison in CPython is ``PyObject_RichCompareBool()``. This function considers that two objects are equal if the two ``PyObject*`` pointers are equal (if the two objects are "identical"). For example, ``PyObject_RichCompareBool(obj1, obj2, Py_EQ)`` doesn't call ``obj1.__eq__(obj2)`` if ``obj1 == obj2`` where ``obj1`` and ``obj2`` are ``PyObject*`` pointers.
This behavior is an optimization to make Python more efficient.
For example, the ``dict`` lookup avoids ``__eq__()`` if two pointers are equal.
Another example are Not-a-Number (NaN) floating point numbers, which are not equal to themselves::
    >>> nan = float("nan")
    >>> nan is nan
    True
    >>> nan == nan
    False
The ``list.__contains__(obj)`` and ``list.index(obj)`` methods are implemented with ``PyObject_RichCompareBool()`` and so rely on object identity::
    >>> lst = [9, 7, nan]
    >>> nan in lst
    True
    >>> lst.index(nan)
    2
    >>> lst[2] == nan
    False
In CPython, ``x == y`` is implemented with ``PyObject_RichCompare()`` which doesn't make the assumption that identical objects are equal. That's why ``nan == nan`` or ``lst[2] == nan`` return ``False``.
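A minimal C-level sketch of the difference, assuming ``nan`` is a ``PyObject*`` holding ``float("nan")`` (error handling omitted)::

    /* Identity shortcut: returns 1 (equal) because the two pointers are equal. */
    int r1 = PyObject_RichCompareBool(nan, nan, Py_EQ);

    /* No identity shortcut: calls float.__eq__() and returns Py_False. */
    PyObject *r2 = PyObject_RichCompare(nan, nan, Py_EQ);
    Py_XDECREF(r2);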
Issues for other Python implementations ---------------------------------------
The Python language doesn't require implementations to use a ``PyObject`` structure and ``PyObject*`` pointers. PyPy uses neither ``PyObject`` nor ``PyObject*``. If CPython were modified to use `Tagged Pointers`_, it would have the same issue.
Alternative Python implementations have to mimic CPython to reduce incompatibilities.

For example, PyPy mimics CPython's behavior for the ``is`` operator with CPython's small integer singletons::

    >>>> x=1; (x + 1) is 2
    True

It also mimics CPython's ``PyObject_RichCompareBool()``. Example with the Not-a-Number (NaN) float::

    >>>> nan=float("nan")
    >>>> nan == nan
    False
    >>>> lst = [9, 7, nan]
    >>>> nan in lst
    True
    >>>> lst.index(nan)
    2
    >>>> lst[2] == nan
    False
Better advertise alternative Python runtimes ============================================
Currently, PyPy and other "alternative" Python runtimes are not well advertised on the `Python website <https://www.python.org/>`_. They are only listed as the last choice in the Download menu.
Once enough C extensions are compatible with the limited C API, PyPy and other Python runtimes should be better advertised on the Python website and in the Python documentation, and introduced as first-class citizens.

Obviously, CPython is likely to remain the most feature-complete implementation in the mid-term, since new PEPs are first implemented in CPython. Limitations can simply be documented, and users should be free to make their own choice, depending on their use cases.
HPy project ===========
The `HPy project <https://github.com/pyhandle/hpy>`__ is a brand new C API written from scratch. It is designed to ease migration from the current C API and to be efficient on PyPy. HPy hides all implementation details: it is based on "handles" so objects cannot be inspected with direct memory access: only opaque function calls are allowed. This abstraction has many benefits:
* No more ``PyObject`` emulation needed: smaller memory footprint in PyPy cpyext, no more expensive conversions.
* It is possible to have multiple handles pointing to the same object. This helps to better track object lifetimes and makes the PyPy implementation easier. PyPy doesn't use reference counting but a tracing garbage collector. When the PyPy GC moves objects in memory, handles don't change! HPy uses an array mapping handles to objects: only this array has to be updated. It is way more efficient.
* The Python runtime is free to modify deep internals compared to CPython. Many optimizations become possible: see the `Optimize CPython`_ section.
* It is easy to add a debug wrapper to add checks before and after the function calls. For example, ensure that the GIL is held when calling CPython.
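A purely illustrative sketch of the handle idea (this is not the actual HPy API and all names are invented): a handle is just an index into a per-context table, so the GC can move objects and only has to update that table::

    #include <stddef.h>

    typedef struct { size_t index; } Handle;   /* opaque to extensions */

    typedef struct {
        void **objects;   /* handle.index -> object; only this table needs
                             updating when the GC moves objects in memory */
        size_t size;
    } HandleContext;

    /* Two handles may refer to the same object; closing one doesn't
       invalidate the other, and extensions never see the object's address. */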
HPy is developed outside CPython, is implemented on top of the existing Python C API, and so can support old Python versions.
By default, binaries compiled in "universal" HPy ABI mode can be used on CPython and PyPy. HPy can also target the CPython ABI, which has the same performance as native C extensions. See the HPy `Target ABIs documentation <https://github.com/pyhandle/hpy/blob/feature/improve-docs/docs/overview.rst#target-abis>`_.

This PEP moves the C API towards the HPy design and API.
New optimized CPython runtime ==============================
Backward incompatible changes are a pain for the whole Python community. To ease the migration (accelerate adoption of the new C API), one option is to provide not one but two CPython runtimes:

* Regular CPython: fully backward compatible, supports direct access to structures like ``PyObject``, etc.
* New optimized CPython: backward incompatible, can only import C extensions which use the limited C API, and has new optimizations.

Technically, both runtimes would share the same code base, to ease maintenance: CPython. The new optimized CPython would be selected by a ``./configure`` flag which builds a different Python. On Windows, it would be a different project of the Visual Studio solution, reusing the pythoncore project but defining a macro to enable the optimizations and change the C API.
The new optimized CPython runtime remains compatible with the CPython 3.8 `stable ABI`_.

The CPython code base is now 30 years old. Many technical choices made 30 years ago are no longer relevant today. This PEP should ease the development of new Python implementations which would be even more efficient, like PyPy!
Cython and cffi ===============
Cython and cffi should be preferred to write new C extensions. This PEP is about existing C extensions which cannot be rewritten with Cython.
Cython may be modified to add a new build mode where only the "limited C API" is used.
Use Cases =========
Optimize CPython ----------------
The new optimized runtime can implement new optimizations since it only supports C extension modules which don't access Python internals.
Tagged pointers ...............
`Tagged pointer <https://en.wikipedia.org/wiki/Tagged_pointer>`_.
Avoid ``PyObject`` for small objects (ex: small integers, short Latin-1 strings, None and True/False singletons): store the content directly in the pointer, with a tag for the object type.
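A hedged sketch of the classic trick, assuming objects are at least 2-byte aligned so the low bit of a real pointer is always zero and can be used as a tag (the encoding is illustrative, not a proposed CPython design)::

    #include <stdint.h>

    #define TAG_SMALL_INT ((uintptr_t)0x1)

    /* Store a small non-negative integer directly in the "pointer" value. */
    static inline void *tag_small_int(uintptr_t value) {
        return (void *)((value << 1) | TAG_SMALL_INT);
    }

    static inline int is_small_int(void *ptr) {
        return ((uintptr_t)ptr & TAG_SMALL_INT) != 0;
    }

    static inline uintptr_t untag_small_int(void *ptr) {
        return (uintptr_t)ptr >> 1;
    }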
Tracing garbage collector .........................
Experiment with a tracing garbage collector inside CPython. Keep reference counting for the C API.
Rewriting CPython with a tracing garbage collector is a large project which is out of the scope of this PEP. This PEP fixes some blocking issues which prevent starting such a project today.

One of the issues is the set of C API functions which return a raw pointer, like ``PyBytes_AsString()``: Python doesn't know when the caller stops using the pointer, and so cannot move the object in memory (for a moving garbage collector). An API like the buffer protocol (``Py_buffer``) is better since it requires the caller to call ``PyBuffer_Release()`` when it is done.
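A minimal sketch of the buffer protocol pattern, where the runtime knows exactly when the caller is done with the raw memory; ``obj`` is assumed to support the buffer protocol and ``process_bytes()`` is a hypothetical helper::

    Py_buffer view;
    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
        return NULL;
    }
    /* view.buf and view.len stay valid until the release call below. */
    process_bytes((const char *)view.buf, view.len);
    PyBuffer_Release(&view);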
Specialized list ................
Specialize lists of small integers: if a list only contains numbers which fit into a C ``int32_t``, a Python list object could use a more efficient ``int32_t`` array to reduce the memory footprint (avoid ``PyObject`` overhead for these numbers).
Temporary ``PyObject`` objects would be created on demand for backward compatibility.
This optimization is less interesting if tagged pointers are implemented.
PyPy already implements this optimization.
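A purely illustrative sketch of such a specialized storage (invented names, not an actual CPython structure); the runtime would create temporary ``PyObject`` integers only when an element is requested through the C API::

    #include <Python.h>
    #include <stdint.h>

    typedef enum { LIST_STORAGE_OBJECTS, LIST_STORAGE_INT32 } ListStorageKind;

    typedef struct {
        ListStorageKind kind;
        Py_ssize_t size;
        union {
            PyObject **objects;   /* generic storage */
            int32_t *ints;        /* compact storage: all items fit in int32_t */
        } data;
    } SpecializedList;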
O(1) bytearray to bytes conversion ..................................
Convert bytearray to bytes without memory copy.
Currently, a bytearray is often used to build a bytes string, but it usually has to be converted into a bytes object to respect an API. This conversion requires allocating a new memory block and copying the data (O(n) complexity).

An O(1) conversion would be possible if the ownership of the bytearray buffer could be passed to the bytes object.

That requires modifying the ``PyBytesObject`` structure to support multiple storage strategies (e.g. storing the content in a separate memory block).
Fork and "Copy-on-Read" problem ...............................
Solve the "Copy on read" problem with fork: store reference counter outside ``PyObject``.
Currently, when a Python object is accessed, its ``ob_refcnt`` member is incremented temporarily to hold a "strong reference" to it (ensuring that it cannot be destroyed while we use it). Many operating systems implement fork() using copy-on-write ("CoW"): a memory page (ex: 4 KB) is only copied when a process (parent or child) modifies it. After Python is forked, modifying ``ob_refcnt`` copies the memory page, even if the object is only accessed in "read only" mode.
`Dismissing Python Garbage Collection at Instagram <https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172>`_ (Jan 2017) by Instagram Engineering.
Instagram contributed `gc.freeze() <https://docs.python.org/dev/library/gc.html#gc.freeze>`_ to Python 3.7 which works around the issue.
One solution would be to store reference counters outside ``PyObject``, for example in a separate hash table (mapping object pointer to reference counter). Changing ``PyObject`` structures requires that C extensions don't access them directly.
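A purely illustrative sketch of the idea (invented names, not a proposed implementation): reference counts live in a side table, so incrementing a count never dirties the memory page holding the object itself::

    #include <stddef.h>

    typedef struct {
        void  *object;   /* key: the object's address */
        size_t refcnt;   /* value: its reference count, stored outside the object */
    } RefcountEntry;

    typedef struct {
        RefcountEntry *entries;
        size_t capacity;
    } RefcountTable;

    /* Py_INCREF() would then update the side table instead of a field inside
       the object, leaving the object's memory page untouched after fork(). */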
Debug runtime and remove debug checks in release mode .....................................................
If the C extensions are no longer tied to CPython internals, it becomes possible to switch to a Python runtime built in debug mode to enable runtime debug checks to ease debugging C extensions.
If using such a debug runtime becomes easier, it indirectly means that runtime debug checks can be removed from the release build. The CPython code base is still full of runtime checks calling ``PyErr_BadInternalCall()`` on failure. Removing such checks in release mode can make Python more efficient.
PyPy ----
ujson is 3x faster on PyPy when using HPy instead of the Python C API. See `HPy kick-off sprint report <https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html>`_ (December 2019).
This PEP should help to make PyPy cpyext more efficient, or at least ease the migration of C extensions to HPy.
GraalPython -----------
`GraalPython <https://github.com/graalvm/graalpython>`_ is a Python 3 implementation built on `GraalVM <https://www.graalvm.org/>`_ ("Universal VM for a polyglot world"). It is interested in supporting HPy. See the `Leysin 2020 Sprint Report <https://morepypy.blogspot.com/2020/03/leysin-2020-sprint-report.html>`_. It would also benefit from this PEP.
RustPython, Rust-CPython and PyO3 ---------------------------------
Rust-CPython is interested in supporting HPy. See `Leysin 2020 Sprint Report <https://morepypy.blogspot.com/2020/03/leysin-2020-sprint-report.html>`_.
RustPython and PyO3 would also benefit from this PEP.
Links:
* `PyO3 <https://github.com/PyO3/pyo3>`_: Rust bindings for the Python (CPython) interpreter
* `rust-cpython <https://github.com/dgrunwald/rust-cpython>`_: Rust <-> Python (CPython) bindings
* `RustPython <https://github.com/RustPython/RustPython>`_: A Python Interpreter written in Rust
Rejected Ideas ==============
Drop the C API --------------
One proposed alternative to a new, better C API is to drop the C API entirely: the reasoning is that complete and reliable solutions, like Cython and cffi, are already available.

What about the long tail of C extensions on PyPI which still use the C API? Would a Python without these C extensions remain relevant?

Lots of projects do not use those solutions, and the C API is part of Python's success. For example, there would be no numpy without the C API.
It doesn't sound like a workable solution.
Bet on HPy, leave the C API unchanged -------------------------------------
The HPy project is developed outside CPython and so doesn't cause any backward incompatibility in CPython. HPy API was designed with efficiency in mind.
The problem is the long tail of C extensions on PyPI which are written with the C API and will not be converted soon or will never be converted to HPy. The transition from Python 2 to Python 3 showed that migrations are very slow and never fully complete.
The PEP also relies on the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible with the new opaque C API.
The concept of HPy is not new: CPython has a limited C API which provides a stable ABI since Python 3.4, see `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`_. Since it is an opt-in option, most users simply use the **default** C API.
Prior Art =========
* `pythoncapi.readthedocs.io <https://pythoncapi.readthedocs.io/>`_: Research project behind this PEP
* July 2019: Keynote `Python Performance: Past, Present, Future <https://github.com/vstinner/talks/raw/master/2019-EuroPython/python_performance.pdf>`_ (slides) by Victor Stinner at EuroPython 2019
* [python-dev] `Make the stable API-ABI usable <https://mail.python.org/pipermail/python-dev/2017-November/150607.html>`_ (November 2017) by Victor Stinner
* [python-ideas] `PEP: Hide implementation details in the C API <https://mail.python.org/pipermail/python-ideas/2017-July/046399.html>`_ (July 2017) by Victor Stinner. Old PEP draft which proposed to add an option to build C extensions.
* `A New C API for CPython <https://vstinner.github.io/new-python-c-api.html>`_ (Sept 2017) article by Victor Stinner
* `Python Performance <https://github.com/vstinner/conf/raw/master/2017-PyconUS/summit.pdf>`_ (May 2017 at the Language Summit) by Victor Stinner: early discussions on reorganizing header files, promoting PyPy, fixing the C API, etc. Discussion summarized in the `Keeping Python competitive <https://lwn.net/Articles/723949/>`_ article.
Copyright =========
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
-- Regards, Ivan
On 11/04/2020 13:08, Ivan Pozdeev via Python-Dev wrote:
On 10.04.2020 20:20, Victor Stinner wrote:
Stable ABI ----------
The idea is to build a C extension only once: the built binary will be usable on multiple Python runtimes and different versions of the same runtime (stable ABI).
The idea is not new but is an extension of `PEP 384: Defining a Stable ABI <https://www.python.org/dev/peps/pep-0384/>`__, implemented in CPython 3.4 with its "limited C API". The limited API is not used by default and is not widely used: PyQt is one of the few known users.

The idea here is that the default C API becomes the limited C API, and so all C extensions will benefit from the advantages of a stable ABI.
In my practice with helping maintain a C extension module, it's not a problem to build the module separately for every minor release.
That's because there are only a few officially supported releases, and they aren't released frequently.
Conversely, if you are using a "limited ABI", you are "limited" (pun intended) to what it has and can't take advantage of any new features until the next major Python version -- i.e. for potentially several years!
So I don't see any "advantages of a stable ABI" atm that matter in practice while I do see _dis_advantages. So this area can perhaps be excluded from the PEP or at least given low priority. Unless, of course, you have some other, more real upcoming "advantages" in mind.
PyQt uses the stable ABI because it dramatically reduces the number of wheels that need to be created for a full release. PyQt consists of 6 different PyPI packages. Wheels are provided for 4 different platforms. Currently Python v3.5 to v3.8 are supported. With the stable ABI that's 24 wheels for a full release. No additional wheels are needed when Python v3.9 is supported. Without the stable ABI it would be 96 wheels. 24 additional wheels would be needed when Python v3.9 is supported. Phil
* Hide implementation details from the C API to be able to `optimize CPython`_ and make PyPy more efficient. * The expectation is that `most C extensions don't rely directly on CPython internals`_ and so will remain compatible. * Continue to support old unmodified C extensions by continuing to provide the fully compatible "regular" CPython runtime. * Provide a `new optimized CPython runtime`_ using the same CPython code base: faster but can only import C extensions which don't use implementation details. Since both CPython runtimes share the same code base, features implemented in CPython will be available in both runtimes.
Adding my 2cents from someone who does use the CPython API (for a debugger).

I must say I'm -1 until alternative APIs needed are available in the optimized CPython runtime (I'd also say that this is a really big incompatible change and would need a Python 4.0 to do)...

I guess that in order for this to work, the first step wouldn't be breaking everyone but talking to extension authors (maybe checking for the users of the APIs which will be deprecated) and seeing alternatives before pushing something which will break CPython extensions which rely on such APIs.

I also don't think that CPython should have 2 runtimes... if the idea is to leverage extensions to other CPython implementations, I think going just for a more limited API is the way to go (but instead of just breaking extensions that use the CPython internal API, try to come up with alternative APIs for the users of the current CPython API -- for my use case, I know the debugger could definitely do with just a few simple additions: it uses the internal API mostly because there aren't real alternatives for a couple of use cases).

i.e.: if numpy/pandas/<fav library> doesn't adopt the optimized runtime because they don't have the support they need, it won't be useful to have it in the first place (you'd just be in the same place where other Python implementations already are).

Also, this should probably follow the usual deprecation cycle: do a major CPython release which warns about using the APIs that'll be deprecated and only in the next CPython release should those APIs be actually removed (and when that's done it probably deserves to be called Python 4).

Cheers,
Fabio
On 10 Apr 2020, at 19:20, Victor Stinner <vstinner@python.org> wrote:
[…]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ PEP xxx: Modify the C API to hide implementation details ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Abstract ========
* Hide implementation details from the C API to be able to `optimize CPython`_ and make PyPy more efficient. * The expectation is that `most C extensions don't rely directly on CPython internals`_ and so will remain compatible. * Continue to support old unmodified C extensions by continuing to provide the fully compatible "regular" CPython runtime. * Provide a `new optimized CPython runtime`_ using the same CPython code base: faster but can only import C extensions which don't use implementation details. Since both CPython runtimes share the same code base, features implemented in CPython will be available in both runtimes. * `Stable ABI`_: Only build a C extension once and use it on multiple Python runtimes and different versions of the same runtime. * Better advertise alternative Python runtimes and better communicate on the differences between the Python language and the Python implementation (especially CPython).
Note: Cython and cffi should be preferred to write new C extensions.
I’m too old… I still prefer the CPython ABI over the other two, mostly because that’s what I know best but also to reduce dependencies.
This PEP is about existing C extensions which cannot be rewritten with Cython.
I’m not sure what this PEP proposes beyond “let’s make the stable ABI the default API” and providing a mechanism to get access to the current API. I guess the proposal also expands the scope of the stable ABI: some internals that are currently exposed in the stable ABI would no longer be so.

I’m not opposed to this as long as it is still possible to use the current API, possibly with clean-ups and correctness fixes. As you write, the CPython API has some features that make writing correct code harder, in particular the concept of borrowed references.

There’s still good reasons to want to be as close to the metal as possible, both to get maximal performance and to accomplish things that aren’t possible using the stable ABI.

[…]
API and ABI incompatible changes --------------------------------
* Make structures opaque: move them to the internal C API.
* Remove functions from the public C API which are tied to CPython internals. Maybe begin by marking these functions as private (rename ``PyXXX`` to ``_PyXXX``) or move them to the internal C API.
* Ban statically allocated types (by making ``PyTypeObject`` opaque): enforce usage of ``PyType_FromSpec()``.
Examples of issues to make structures opaque:
* ``PyGC_Head``: https://bugs.python.org/issue40241
* ``PyObject``: https://bugs.python.org/issue39573
* ``PyTypeObject``: https://bugs.python.org/issue40170
* ``PyThreadState``: https://bugs.python.org/issue39573
Another example are ``Py_REFCNT()`` and ``Py_TYPE()`` macros which can currently be used l-value to modify an object reference count or type. Python 3.9 has new ``Py_SET_REFCNT()`` and ``Py_SET_TYPE()`` macros which should be used instead. ``Py_REFCNT()`` and ``Py_TYPE()`` macros should be converted to static inline functions to prevent their usage as l-value.
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
This is definitely backward incompatible in a way that affects all extensions defining types without using PyTypeSpec, due to having PyObject and PyTypeObject in the list. I wonder how large a percentage of existing extensions is affected by this.

Making “PyObject” opaque will also affect the stable ABI, because even types defined using the PyTypeSpec API embed a “PyObject” value in the structure defining the instance layout. It is easy enough to change this in a way that preserves source-code compatibility, but I’m not sure it is possible to avoid breaking the stable ABI.

BTW. This will require growing the PyTypeSpec ABI a little; there are features you cannot implement using that API, for example the buffer protocol.

[…]
CPython specific behavior =========================
Some C functions and some Python functions have a behavior which is closely tied to the current CPython implementation.
is operator -----------
The "x is y" operator is closed tied to how CPython allocates objects and to ``PyObject*``.
For example, CPython uses singletons for numbers in [-5; 256] range::
    x=1; (x + 1) is 2
    True
    x=1000; (x + 1) is 1001
    False
Python 3.8 compiler now emits a ``SyntaxWarning`` when the right operand of the ``is`` and ``is not`` operators is a literal (ex: integer or string), but don't warn if it is ``None``, ``True``, ``False`` or ``Ellipsis`` singleton (`bpo-34850 <https://bugs.python.org/issue34850>`_). Example::
    x=1
    x is 1
    <stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
    True
That’s not really something for the C API, and code relying on the small integer cache is IMHO buggy as it is (even without considering different python implementations). Is this a problem for alternative implementations? My gut feeling would be that this shouldn’t be a problem, an implementation using tagged pointers for smallish integers could just behave as if all smallish integers are singletons. […]
Use Cases =========
Optimize CPython ----------------
The new optimized runtime can implement new optimizations since it only supports C extension modules which don't access Python internals.
Tagged pointers ...............
`Tagged pointer <https://en.wikipedia.org/wiki/Tagged_pointer>`_.
Avoid ``PyObject`` for small objects (ex: small integers, short Latin-1 strings, None and True/False singletons): store the content directly in the pointer, with a tag for the object type.
Isn’t that already possible to do with the current API contract (when ignoring the stable ABI)? You’re already supposed to use accessor macros to access and modify attributes in the PyObject structure; those can be modified to do something else for tagged pointers. Anyone not using the accessor macros would have to adjust, but that’s something that can happen regardless (even if we’re more and more careful not to introduce unnecessary breaking changes).

[…]

Ronald
—
Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/
Hi Ronald, Le mar. 14 avr. 2020 à 18:25, Ronald Oussoren <ronaldoussoren@mac.com> a écrit :
Making “PyObject” opaque will also affect the stable ABI because even types defined using the PyTypeSpec API embed a “PyObject” value in the structure defining the instance layout. It is easy enough to change this in a way that preserves source-code compatibility, but I’m not sure it is possible to avoid breaking the stable ABI.
Oh, that's a good point. I tracked this issue at: https://bugs.python.org/issue39573#msg366473
BTW. This will require growing the PyTypeSpec ABI a little, there are features you cannot implement using that API for example the buffer protocol.
I tracked this feature request at: https://bugs.python.org/issue40170#msg366474

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
On 15 Apr 2020, at 03:39, Victor Stinner <vstinner@python.org> wrote:
Hi Ronald,
Le mar. 14 avr. 2020 à 18:25, Ronald Oussoren <ronaldoussoren@mac.com> a écrit :
Making “PyObject” opaque will also affect the stable ABI because even types defined using the PyTypeSpec API embed a “PyObject” value in the structure defining the instance layout. It is easy enough to change this in a way that preserves source-code compatibility, but I’m not sure it is possible to avoid breaking the stable ABI.
Oh, that's a good point. I tracked this issue at: https://bugs.python.org/issue39573#msg366473
BTW. This will require growing the PyTypeSpec ABI a little, there are features you cannot implement using that API for example the buffer protocol.
I tracked this feature request at: https://bugs.python.org/issue40170#msg366474
Another issue with making structures opaque is that it makes it at best harder to subclass builtin types in an extension while adding additional data fields to the subclass. This is similar to the fragile base class issue that was fixed in Objective-C 2.0 by adding a level of indirection, and could probably be fixed in a similar way in Python.

Ronald
—
Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/
It seems a little odd to be dictating website updates about other VMs in this PEP. I'm not arguing that we shouldn't update the site, I just think requiring it as part of this PEP seems tangential to what the PEP is focusing on.
participants (16): André Malo, Antoine Pitrou, Brett Cannon, Chris Meyer, Daniel Holth, Eric Fahlgren, Fabio Zadrozny, Ivan Pozdeev, Matěj Cepl, Paul Moore, Phil Thompson, Rhodri James, Ronald Oussoren, Stefan Behnel, Steve Dower, Victor Stinner