On Tue, Jun 23, 2020 at 16:56, Stefan Behnel <stefan_ml@behnel.de> wrote:
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. ``static PyTypeObject MyType = {...};``). In Python 3.4, the PEP 442 "Safe object finalization" added the ``tp_finalize`` member at the end of the ``PyTypeObject`` structure. For ABI backward compatibility, a new ``Py_TPFLAGS_HAVE_FINALIZE`` type flag was required to announce whether the type structure contains the ``tp_finalize`` member. The flag was removed in Python 3.8 (`bpo-32388 <https://bugs.python.org/issue32388>`_).
Probably not the best example. I think this is pretty much normal API evolution. Changing the deallocation protocol for objects is going to impact any public API in one way or another. PyTypeObject is also not exposed with its struct fields in the limited API, so your point regarding "tp_print" is also not a strong one.
The PEP 442 doesn't break backward compatibility: C extensions using tp_dealloc continue to work. But adding a new member to PyTypeObject caused practical implementation issues. I'm not sure why you are mentioning the limited C API: most C extensions don't use it, and declare their types statically ("static types"). I'm not trying to describe the Py_TPFLAGS_HAVE_FINALIZE story as a major blocker in CPython's history. It's just one of many examples of the obstacles to evolving CPython internals.
Same CPython design since 1990: structures and reference counting
-----------------------------------------------------------------

Members of the ``PyObject`` and ``PyTupleObject`` structures have not changed since the "Initial revision" commit (1990).
While I see an advantage in hiding the details of PyObject (specifically memory management internals), I would argue that there simply isn't much to improve in PyTupleObject, so these two don't fly at the same level for me.
There are different reasons to make PyTupleObject opaque:

* Prevent access to members of its "PyObject ob_base" member (disallow accessing "tuple->ob_base.ob_refcnt" directly).

* Prevent C extensions from making assumptions about how a Python implementation stores a tuple. Currently, C extensions are designed to get the best performance with CPython, but that makes them run slower on PyPy.

* It becomes possible to experiment with a more efficient PyTupleObject layout, in terms of memory footprint or runtime performance, depending on the use case. For example, storing numbers directly as numbers rather than as PyObject. Or maybe use a different layout to make PyList_AsTuple() an O(1) operation.

I had a similar idea about converting a bytearray into a bytes object without having to copy memory; it also requires modifying PyBytesObject to experiment with such an idea. An array of PyObject* is not necessarily the most efficient storage for all use cases.
My feeling is that PyPy specifically is better served with the HPy API, which is different enough to consider it a mostly separate API, or an evolution of the limited API, if you want. Suggesting that extension authors support two different APIs is already asking a lot, but forcing them to support the existing CPython C-API (for legacy reasons) and the changed CPython C-API (for future compatibility), and then asking them to support a separate C-API on top of that (for platform independence, with performance penalties), seems to be stretching it a lot.
The PEP 620 changes the C API to make it converge towards the limited C API, but it also prepares C extensions to ease their migration to HPy. For example, by design, HPy doesn't give direct access to the PyTupleObject.ob_item member; enforcing usage of the PyTuple_GetItem() function or the PyTuple_GET_ITEM() macro should ease migration to HPy_GetItem_i().

I disagree that extension authors have to support two C APIs. Many of the PEP 620 incompatible C changes are already completed, and I was surprised by the very low number of extensions affected by these changes. In practice, most extensions use simple and regular C code; they don't "abuse" the C API. Cython itself is affected by most changes, since Cython basically uses all C API features :-) But in practice, only a minority of extensions written with Cython are affected, since they (indirectly, via Cython) only use a subset of the C API.

Also, once an extension is updated for incompatible changes, it remains compatible with old Python versions. When a new function is used, pythoncapi_compat.h can be used to support old Python versions. It is not as if code has to be duplicated to support two unrelated APIs.
* (**Completed**) Add new functions ``Py_SET_TYPE()``, ``Py_SET_REFCNT()`` and ``Py_SET_SIZE()``. The ``Py_TYPE()``, ``Py_REFCNT()`` and ``Py_SIZE()`` macros become functions which cannot be used as l-values.
* (**Completed**) New C API functions must not return borrowed references.
* (**In Progress**) Provide the ``pythoncapi_compat.h`` header file.
* (**In Progress**) Make structures opaque, add getter and setter functions.
* (**Not Started**) Deprecate ``PySequence_Fast_ITEMS()``.
* (**Not Started**) Convert the ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros to static inline functions.
Most of these have the potential to break code, sometimes needlessly, AFAICT.
The Py_SET_xxx() functions are designed to allow experimenting with tagged pointers in CPython. Do you mean that tagged pointers are not worth experimenting with? Neil's early proof-of-concept was promising: https://mail.python.org/archives/list/capi-sig@python.org/thread/EGAY55ZWMF2... PyPy decided to abandon tagged pointers, since they weren't really worth it in PyPy. But PyPy and CPython have very different designs; IMO the performance gains will be more interesting in CPython than in PyPy.
Especially the efforts to block away the internal data structures annoy me. It's obviously ok if we don't require other implementations to provide this access, but CPython has these data structures and I think it should continue to expose them.
CPython continues to expose structures in its internal C API.
If we remove CPython specific features from the (de-facto) "official public Python C-API", then I think there should be a "public CPython 3.X C-API" that actively exposes the data structures natively, not just an "internal" one. That way, extension authors can take the usual decision between performance, maintenance effort and platform independence.
I would like to promote "portable" C code, rather than promote writing CPython specific code. I mean that the "default" should be the portable API, and writing CPython specific code would be a deliberate opt-in choice.
typedef struct {
    PyObject ob_base;
    double ob_fval;
} PyFloatObject;
Please keep PyFloat_AS_DOUBLE() and friends do what they currently do.
If PyFloatObject becomes opaque, PyFloat_AS_DOUBLE() macro must become a function call.
Making ``PyTypeObject`` structure opaque breaks C extensions declaring types statically (e.g. ``static PyTypeObject MyType = {...};``).
Not necessarily. There was an unimplemented feature proposed in PEP-3121, the PyType_Copy() function.
https://www.python.org/dev/peps/pep-3121/#specification
PyTypeObject does not have to be opaque. But it also doesn't have to be the same thing for defining and for using types. You could still define a type with a PyTypeObject struct and then copy it over into a heap type or other internal type structure from there.
A practical issue is that many C extensions refer directly to a type using something like "&MyType". Example in CPython:

    #define PyUnicode_CheckExact(op) Py_IS_TYPE(op, &PyUnicode_Type)

If PyType_Copy(&PyUnicode_Type) is used to allocate the real unicode type as a heap type, code using &PyUnicode_Type will fail. See https://bugs.python.org/issue40601 "[C API] Hide static types from the limited C API" about this issue. This issue also concerns subinterpreters: each subinterpreter should have its own copy of each type.
Whether that's better than using PyType_FromSpec(), maybe not, but at least it doesn't mean we have to break existing code that uses static extension type definitions.
If we choose the PyType_Copy() way, we must stop referring to types as "PyTypeObject*" internally, but maybe use "PyHeapTypeObject*" or something else. Currently, static types and heap types are interchangeable on purpose.
I haven't come across a use case yet where I had to change a ref-count by more than 1, but allowing users to arbitrarily do that may require way more infrastructure under the hood than allowing them to create or remove a single reference to an object. I think explicit is really better than implicit here.
Py_SET_REFCNT() is not Py_INCREF(). It's used for special cases like free lists, resurrecting an object, saving/restoring the reference counter (during resurrection), etc.
The same does not seem to apply to "Py_SET_TYPE()" and "Py_SET_SIZE()", since any object or (applicable) container implementation would normally have to know its type and size, regardless of any implementation details.
Py_SET_TYPE() is needed to set tp_base on types declared statically: "tp_base = &PyType_Type" doesn't work in Visual Studio, if I recall correctly. See for example the numpy fix: https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9ecc... Py_SET_SIZE() is needed for types which inherit from PyVarObject, like PyListObject.
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
This sounds like a common CI testing infrastructure would help all sides. Currently, we have something like that mostly working by having different projects integrate with each other's master branch, e.g. Pandas, NumPy, Cython, and notifying each other of detected breakages. It's mostly every project setting up its own CI on travis&Co here, so a bit of duplicated work on all sides. Not sure if that's inherently bad, but there's definitely some room for generalisation and improvements.
I wrote https://github.com/vstinner/pythonci to test Cython, numpy and a few other projects on the next Python version (the master branch). At first, I even wrote a PEP section called "please test your project on the next Python version", but I removed it since it doesn't require any change in CPython itself, and we cannot require people to do it.
Again, thanks Victor for pushing these efforts. Even if I and others are giving you a hard time getting your proposals accepted, I appreciate the work that you put into improving the ecosystem(s).
Thanks Stefan for your very useful feedback :-) I'm sure that it will help to enhance the PEP. I'm open to consider removing a bunch of incompatible changes like making the PyObject structure opaque. If you look at my PyObject https://bugs.python.org/issue39573 and PyTypeObject https://bugs.python.org/issue40170 issues: the changes that I already pushed are mostly changes to abstract access to these structures. Victor -- Night gathers, and now my watch begins. It shall not end until my death.