On Tue, Jun 23, 2020 at 16:56, Stefan Behnel <stefan_ml@behnel.de> wrote:
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure. For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type flag was required to announce whether the type structure contains the ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388 <https://bugs.python.org/issue32388>`_).
Probably not the best example. I think this is pretty much normal API evolution. Changing the deallocation protocol for objects is going to impact any public API in one way or another. PyTypeObject is also not exposed with its struct fields in the limited API, so your point regarding "tp_print" is also not a strong one.
The PEP 442 doesn't break backward compatibility: C extensions using tp_dealloc continue to work. But adding a new member to PyTypeObject caused practical implementation issues. I'm not sure why you are mentioning the limited C API: most C extensions don't use it, and declare their types statically ("static types"). I'm not trying to describe the Py_TPFLAGS_HAVE_FINALIZE story as a major blocker in CPython's history. It's just one of many examples of the obstacles to evolving CPython internals.
Same CPython design since 1990: structures and reference counting
-----------------------------------------------------------------

Members of the ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990).
While I see an advantage in hiding the details of PyObject (specifically memory management internals), I would argue that there simply isn't much to improve in PyTupleObject, so these two don't fly at the same level for me.
There are different reasons to make PyTupleObject opaque:

* Prevent access to members of its "PyObject ob_base" member (disallow accessing "tuple->ob_base.ob_refcnt" directly).

* Prevent C extensions from making assumptions about how a Python implementation stores a tuple. Currently, C extensions are designed to get the best performance with CPython, but that makes them run slower on PyPy.

* It becomes possible to experiment with a more efficient PyTupleObject layout, in terms of memory footprint or runtime performance, depending on the use case. For example, storing numbers directly as numbers rather than as PyObject. Or maybe use a different layout to make PyList_AsTuple() an O(1) operation.

I had a similar idea about converting a bytearray into a bytes object without having to copy memory; it also requires modifying PyBytesObject to experiment with such an idea. An array of PyObject* is not necessarily the most efficient storage for all use cases.
My feeling is that PyPy specifically is better served with the HPy API, which is different enough to consider it a mostly separate API, or an evolution of the limited API, if you want. Suggesting that extension authors support two different APIs is already asking a lot, but forcing them to support the existing CPython C-API (for legacy reasons) and the changed CPython C-API (for future compatibility), and then asking them to support a separate C-API on top of that (for platform independence, with performance penalties), seems to be stretching it a lot.
The PEP 620 changes the C API to make it converge towards the limited C API, but it also prepares C extensions to ease their migration to HPy. For example, by design, HPy doesn't give direct access to the PyTupleObject.ob_item member; enforcing usage of the PyTuple_GetItem() function or the PyTuple_GET_ITEM() macro should ease migration to HPy_GetItem_i().

I disagree that extension authors have to support two C APIs. Many of the PEP 620 incompatible C changes are already completed, and I was surprised by the very low number of extensions affected by these changes. In practice, most extensions use simple and regular C code; they don't "abuse" the C API. Cython itself is affected by most changes, since Cython basically uses all C API features :-) But in practice, only a minority of extensions written with Cython are affected, since they (indirectly, via Cython) only use a subset of the C API.

Also, once an extension is updated for incompatible changes, it remains compatible with old Python versions. When a new function is used, pythoncapi_compat.h can be used to support old Python versions. It is not as if code has to be duplicated to support two unrelated APIs.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-values.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide the ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert the ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
Most of these have the potential to break code, sometimes needlessly, AFAICT.
The Py_SET_xxx() functions are designed to allow experimenting with tagged pointers in CPython. Do you mean that tagged pointers are not worth experimenting with? Neil's early proof-of-concept was promising: https://mail.python.org/archives/list/capi-sig@python.org/thread/EGAY55ZWMF2... PyPy decided to abandon tagged pointers, since they weren't really worth it in PyPy. But PyPy and CPython have very different designs; IMO the performance gains will be more interesting in CPython than in PyPy.
Especially the efforts to block away the internal data structures annoy me. It's obviously ok if we don't require other implementations to provide this access, but CPython has these data structures and I think it should continue to expose them.
CPython continues to expose structures in its internal C API.
If we remove CPython specific features from the (de-facto) "official public Python C-API", then I think there should be a "public CPython 3.X C-API" that actively exposes the data structures natively, not just an "internal" one. That way, extension authors can take the usual decision between performance, maintenance effort and platform independence.
I would like to promote "portable" C code, rather than promote writing CPython specific code. I mean that the "default" should be the portable API, and writing CPython specific code would be a deliberate opt-in choice.
typedef struct {
    PyObject ob_base;
    double ob_fval;
} PyFloatObject;
Please keep PyFloat_AS_DOUBLE() and friends do what they currently do.
If PyFloatObject becomes opaque, PyFloat_AS_DOUBLE() macro must become a function call.
Making ``PyTypeObject`` structure opaque breaks C extensions declaring types statically (e.g. ``static PyTypeObject MyType = {...};``).
Not necessarily. There was an unimplemented feature proposed in PEP-3121, the PyType_Copy() function.
https://www.python.org/dev/peps/pep-3121/#specification
PyTypeObject does not have to be opaque. But it also doesn't have to be the same thing for defining and for using types. You could still define a type with a PyTypeObject struct and then copy it over into a heap type or other internal type structure from there.
A practical issue is that many C extensions refer directly to a type using something like "&MyType". Example in CPython:

    #define PyUnicode_CheckExact(op) Py_IS_TYPE(op, &PyUnicode_Type)

If PyType_Copy(&PyUnicode_Type) is used to allocate the real unicode type as a heap type, code using &PyUnicode_Type will fail. See https://bugs.python.org/issue40601 "[C API] Hide static types from the limited C API" about this issue. This issue also concerns subinterpreters: each subinterpreter should have its own copy of each type.
Whether that's better than using PyType_FromSpec(), maybe not, but at least it doesn't mean we have to break existing code that uses static extension type definitions.
If we choose the PyType_Copy() way, we must stop referring to types as "PyTypeObject*" internally, but maybe use "PyHeapTypeObject*" or something else. Currently, static types and heap types are interchangeable on purpose.
I haven't come across a use case yet where I had to change a ref-count by more than 1, but allowing users to arbitrarily do that may require way more infrastructure under the hood than allowing them to create or remove a single reference to an object. I think explicit is really better than implicit here.
Py_SET_REFCNT() is not Py_INCREF(). It's used for special cases like free lists, resurrecting an object, saving/restoring the reference counter (during resurrection), etc.
The same does not seem to apply to "Py_SET_TYPE()" and "Py_SET_SIZE()", since any object or (applicable) container implementation would normally have to know its type and size, regardless of any implementation details.
Py_SET_TYPE() is needed to set tp_base on types declared statically: "tp_base = &PyType_Type" doesn't work in Visual Studio, if I recall correctly. See for example the numpy fix: https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9ecc... Py_SET_SIZE() is needed for types which inherit from PyVarObject, like PyListObject.
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
This sounds like a common CI testing infrastructure would help all sides. Currently, we have something like that mostly working by having different projects integrate with each other's master branch, e.g. Pandas, NumPy, Cython, and notifying each other of detected breakages. It's mostly every project setting up its own CI on travis&Co here, so a bit of duplicated work on all sides. Not sure if that's inherently bad, but there's definitely some room for generalisation and improvements.
I wrote https://github.com/vstinner/pythonci to test Cython, numpy and a few other projects on the next Python version (the master branch). At first, I even wrote a PEP section called "please test your project on the next Python version", but I removed it since it doesn't require any change in CPython itself, and we cannot require people to do it.
Again, thanks Victor for pushing these efforts. Even if I and others are giving you a hard time getting your proposals accepted, I appreciate the work that you put into improving the ecosystem(s).
Thanks Stefan for your very useful feedback :-) I'm sure that it will help to enhance the PEP. I'm open to consider removing a bunch of incompatible changes like making the PyObject structure opaque. If you look at my PyObject https://bugs.python.org/issue39573 and PyTypeObject https://bugs.python.org/issue40170 issues: the changes that I already pushed are mostly changes to abstract access to these structures. Victor -- Night gathers, and now my watch begins. It shall not end until my death.