On 10 Apr 2020, at 19:20, Victor Stinner <vstinner@python.org> wrote:
[…]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PEP xxx: Modify the C API to hide implementation details
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Abstract
========
* Hide implementation details from the C API to be able to `optimize
  CPython`_ and make PyPy more efficient.
* The expectation is that `most C extensions don't rely directly on
  CPython internals`_ and so will remain compatible.
* Continue to support old unmodified C extensions by continuing to
  provide the fully compatible "regular" CPython runtime.
* Provide a `new optimized CPython runtime`_ using the same CPython code
  base: faster but can only import C extensions which don't use
  implementation details. Since both CPython runtimes share the same
  code base, features implemented in CPython will be available in both
  runtimes.
* `Stable ABI`_: Only build a C extension once and use it on multiple
  Python runtimes and different versions of the same runtime.
* Better advertise alternative Python runtimes and better communicate on
  the differences between the Python language and the Python
  implementation (especially CPython).
Note: Cython and cffi should be preferred to write new C extensions.
I’m too old… I still prefer the CPython ABI over the other two, mostly because that’s what I know best but also to reduce dependencies.
This PEP is about existing C extensions which cannot be rewritten with Cython.
I’m not sure what this PEP proposes beyond “let’s make the stable ABI the default API” and providing a mechanism to get access to the current API. I guess the proposal also expands the scope for the stable ABI: some internals that are currently exposed in the stable ABI would no longer be so. I’m not opposed to this as long as it is still possible to use the current API, possibly with clean-ups and correctness fixes. As you write, the CPython API has some features that make writing correct code harder, in particular the concept of borrowed references. There are still good reasons to want to be as close to the metal as possible, both to get maximal performance and to accomplish things that aren’t possible using the stable ABI. […]
API and ABI incompatible changes
--------------------------------
* Make structures opaque: move them to the internal C API.
* Remove functions from the public C API which are tied to CPython
  internals. Maybe begin by marking these functions as private (rename
  ``PyXXX`` to ``_PyXXX``) or move them to the internal C API.
* Ban statically allocated types (by making ``PyTypeObject`` opaque):
  enforce usage of ``PyType_FromSpec()``.
Examples of issues to make structures opaque:
* ``PyGC_Head``: https://bugs.python.org/issue40241
* ``PyObject``: https://bugs.python.org/issue39573
* ``PyTypeObject``: https://bugs.python.org/issue40170
* ``PyThreadState``: https://bugs.python.org/issue39573
Other examples are the ``Py_REFCNT()`` and ``Py_TYPE()`` macros, which can currently be used as l-values to modify an object's reference count or type. Python 3.9 adds ``Py_SET_REFCNT()`` and ``Py_SET_TYPE()``, which should be used instead. The ``Py_REFCNT()`` and ``Py_TYPE()`` macros should be converted to static inline functions to prevent their use as l-values.
**Backward compatibility:** backward incompatible on purpose. Break the limited C API and the stable ABI, with the assumption that `Most C extensions don't rely directly on CPython internals`_ and so will remain compatible.
This is definitely backward incompatible in a way that affects all extensions defining types without using PyType_Spec, due to having PyObject and PyTypeObject in the list. I wonder how large a percentage of existing extensions is affected by this. Making “PyObject” opaque will also affect the stable ABI, because even types defined using the PyType_Spec API embed a “PyObject” value in the structure defining the instance layout. It is easy enough to change this in a way that preserves source-code compatibility, but I’m not sure it is possible to avoid breaking the stable ABI. BTW, this will require growing the PyType_Spec ABI a little: there are features you cannot implement using that API, for example the buffer protocol. […]
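To make the ABI concern about the embedded “PyObject” concrete, here is a rough sketch that models the layout with ``ctypes``. All structure names and the ``ob_extra`` field are hypothetical, made up for illustration; only the two-field header mirrors what ``PyObject`` looks like in a regular (non-debug) CPython build today. It shows that the offset of the first instance field is fixed by the size of the embedded header, so an already compiled extension would read the wrong memory if the header ever changed size.

```python
import ctypes


class ObjectHeader(ctypes.Structure):
    # Models today's PyObject header: reference count + type pointer.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


class ObjectHeaderGrown(ctypes.Structure):
    # Hypothetical future header with one extra pointer-sized field.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_extra", ctypes.c_void_p),  # assumption, not a real CPython field
    ]


class Point(ctypes.Structure):
    # Instance layout as an extension would declare it: header first.
    _fields_ = [
        ("ob_base", ObjectHeader),
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
    ]


class PointGrown(ctypes.Structure):
    # Same declaration, compiled against the hypothetical grown header.
    _fields_ = [
        ("ob_base", ObjectHeaderGrown),
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
    ]


old_offset = Point.x.offset
new_offset = PointGrown.x.offset
print("offset of x with the current header:", old_offset)
print("offset of x with the grown header:  ", new_offset)
```

Recompiling fixes the offsets, which is why source compatibility is the easy part; a binary built once against the old layout has the old offset baked in.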
CPython specific behavior
=========================
Some C functions and some Python functions have a behavior which is closely tied to the current CPython implementation.
is operator
-----------
The "x is y" operator is closely tied to how CPython allocates objects and to ``PyObject*``.
For example, CPython uses singletons for numbers in [-5; 256] range::
    >>> x=1; (x + 1) is 2
    True
    >>> x=1000; (x + 1) is 1001
    False
The Python 3.8 compiler now emits a ``SyntaxWarning`` when the right operand of the ``is`` and ``is not`` operators is a literal (ex: integer or string), but does not warn if it is the ``None``, ``True``, ``False`` or ``Ellipsis`` singleton (`bpo-34850 <https://bugs.python.org/issue34850>`_). Example::
    >>> x=1
    >>> x is 1
    <stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
    True
That’s not really something for the C API, and code relying on the small integer cache is IMHO buggy as it is (even without considering different Python implementations). Is this a problem for alternative implementations? My gut feeling is that it shouldn’t be: an implementation using tagged pointers for smallish integers could just behave as if all smallish integers are singletons. […]
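A toy model of that gut feeling, with plain Python integers standing in for 64-bit machine words. Everything here (``new_int``, ``is_same_object``, the one-bit tag, the fake heap) is a hypothetical illustration and not how any real runtime is implemented. If ``is`` compares reference words, then two tagged small integers with the same value are the same word and so look like one singleton, while large integers still get separate boxes.

```python
import itertools

WORD_MASK = (1 << 64) - 1
INT_TAG = 1  # low bit set means "this word holds an integer, not an address"
SMALL_MIN, SMALL_MAX = -(1 << 62), (1 << 62) - 1  # fits in the other 63 bits

# Fake heap: hands out fresh, even (so untagged) addresses.
_fresh_address = itertools.count(0x1000, 16)


def new_int(value):
    """Return the word a toy runtime would use to reference `value`."""
    if SMALL_MIN <= value <= SMALL_MAX:
        # The value lives in the word itself; nothing is allocated.
        return ((value << 1) | INT_TAG) & WORD_MASK
    # Too large for a tagged word: pretend to allocate a new boxed object.
    return next(_fresh_address)


def is_same_object(a, b):
    """Model of the `is` operator: compare the reference words."""
    return a == b


small_a, small_b = new_int(1000 + 1), new_int(1001)
big_a, big_b = new_int(2**100), new_int(2**100)
print(is_same_object(small_a, small_b))  # equal words
print(is_same_object(big_a, big_b))      # two distinct fake addresses
```

Code that relies on either result is still depending on an implementation detail; the model only shows that tagged integers do not need special handling to give ``is`` a consistent meaning.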
Use Cases
=========

Optimize CPython
----------------
The new optimized runtime can implement new optimizations since it only supports C extension modules which don't access Python internals.
Tagged pointers
...............
`Tagged pointer <https://en.wikipedia.org/wiki/Tagged_pointer>`_.
Avoid ``PyObject`` for small objects (ex: small integers, short Latin-1 strings, None and True/False singletons): store the content directly in the pointer, with a tag for the object type.
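The encoding described above can be sketched as follows, again using Python integers as stand-ins for 64-bit words. The tag values, the 3-bit tag width, the 7-character string limit and the helper names ``encode``, ``decode`` and ``type_name`` are all assumptions chosen for this illustration; the PEP text does not specify a layout.

```python
WORD_BITS = 64
TAG_BITS = 3  # 8-byte aligned addresses leave the 3 low bits free
TAG_MASK = (1 << TAG_BITS) - 1
WORD_MASK = (1 << WORD_BITS) - 1
PAYLOAD_BITS = WORD_BITS - TAG_BITS

TAG_HEAP, TAG_INT, TAG_SINGLETON, TAG_LATIN1 = 0, 1, 2, 3
SINGLETONS = (None, True, False)
INT_MIN = -(1 << (PAYLOAD_BITS - 1))
INT_MAX = (1 << (PAYLOAD_BITS - 1)) - 1
MAX_LATIN1_LEN = 7  # 7 bytes of text + 3 bits of length fit in the payload


def encode(obj):
    """Return a tagged word for obj, or None if it needs a heap object."""
    # bool is a subclass of int, so check the singletons first, by identity.
    for index, singleton in enumerate(SINGLETONS):
        if obj is singleton:
            return (index << TAG_BITS) | TAG_SINGLETON
    if isinstance(obj, int) and INT_MIN <= obj <= INT_MAX:
        return ((obj << TAG_BITS) | TAG_INT) & WORD_MASK
    if isinstance(obj, str) and len(obj) <= MAX_LATIN1_LEN:
        try:
            data = obj.encode("latin-1")
        except UnicodeEncodeError:
            return None
        payload = (int.from_bytes(data, "little") << 3) | len(data)
        return (payload << TAG_BITS) | TAG_LATIN1
    return None


def decode(word):
    """Recover the Python value stored in a tagged word."""
    tag, payload = word & TAG_MASK, word >> TAG_BITS
    if tag == TAG_SINGLETON:
        return SINGLETONS[payload]
    if tag == TAG_INT:
        # The payload is a PAYLOAD_BITS-wide two's complement number.
        if payload >= 1 << (PAYLOAD_BITS - 1):
            payload -= 1 << PAYLOAD_BITS
        return payload
    if tag == TAG_LATIN1:
        length, data = payload & 0b111, payload >> 3
        return data.to_bytes(length, "little").decode("latin-1")
    raise ValueError("heap pointer: a real runtime would dereference it")


def type_name(word):
    """What a Py_TYPE()-style accessor could do: look at the tag, not memory."""
    return type(decode(word)).__name__


for value in (42, -5, None, True, "abc", "too long for one word", 2**70):
    word = encode(value)
    if word is None:
        print(repr(value), "-> needs a heap object")
    else:
        print(repr(value), "->", hex(word), type_name(word))
```

The relevant property for the C API is that nothing outside ``encode``, ``decode`` and ``type_name`` knows the layout, which is only possible if extensions never look inside the object structure themselves.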
Isn’t that already possible to do with the current API contract (when ignoring the stable ABI)? You’re already supposed to use accessor macros to access and modify attributes in the PyObject structure, and those can be modified to do something else for tagged pointers. Anyone not using the accessor macros would have to adjust, but that’s something that can happen regardless (even if we’re more and more careful not to introduce unnecessary breaking changes). […]

Ronald

--
Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/