Slowly bend the C API towards the limited API to get a stable ABI for everyone
Hi,

There is a reason why I have been bothering C extension maintainers and Python core developers with my incompatible C API changes since Python 3.8. Let me share my plan with you :-)

In 2009 (Python 3.2), Martin v. Löwis did an amazing job with PEP 384 "Defining a Stable ABI" to provide a "limited C API" and a "stable ABI" for C extensions: build an extension once, use it on multiple Python versions. Some projects like PyQt5 and cryptography use it, but they are just a drop in the PyPI ocean (353,084 projects).

I'm trying to bend the "default" C API towards this "limited C API" to make it possible tomorrow to build *more* C extensions for the stable ABI. My goal is for the stable ABI to become the default, with only a minority of C extensions opting out because they need access to more functions for best performance.

The basic problem is that at the ABI level, C extensions must only call functions, rather than getting and setting structure members directly. Structures change frequently in Python (look at the changes between Python 3.2 and Python 3.11), and any minor structure change breaks the ABI. The limited C API solves this problem by hiding structures and only using function calls.

Since 2020, I have been modifying the C API, one function at a time, to slowly hide implementation details (preparing the API to make structures opaque). I focused on the following structures:

* PyObject and PyVarObject (bpo-39573)
* PyTypeObject (bpo-40170)
* PyFrameObject (bpo-40421)
* PyThreadState (bpo-39947)

The majority of C extensions use functions and macros and don't access structure members directly. But a few members are sometimes accessed directly, which prevents making these structures opaque. For example, some old C extensions use obj->ob_type rather than Py_TYPE(obj). Fixing the minority of C extensions should benefit the majority, which may become compatible with the stable ABI.
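The ABI problem described above can be illustrated with a self-contained C toy (invented "Mini" names, not actual CPython code): once a struct's layout is hidden behind an accessor function, the layout can change without breaking compiled callers, which is exactly what Py_TYPE(obj) enables and obj->ob_type prevents.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MiniType { const char *name; } MiniType;

/* In a real opaque API, extensions would only see
   "typedef struct MiniObject MiniObject;" in the public header.
   The layout below is a private detail of the implementation. */
typedef struct MiniObject {
    long refcnt;          /* members may be reordered or removed ... */
    MiniType *ob_type;    /* ... without breaking callers of Mini_Type() */
} MiniObject;

/* Accessor, analogous to Py_TYPE(obj): compiled callers only embed a
   function call, not a hard-coded member offset. */
MiniType *Mini_Type(MiniObject *obj) { return obj->ob_type; }

MiniObject *Mini_New(MiniType *type) {
    MiniObject *obj = malloc(sizeof(MiniObject));
    obj->refcnt = 1;
    obj->ob_type = type;
    return obj;
}
```

An extension that wrote `obj->ob_type` instead would compile the member's byte offset into its binary, so inserting or reordering any field above it would silently break the extension at the ABI level.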
I am also converting macros to static inline functions to fix their API: define parameter types and the result type, and avoid surprising macro side effects ("macro pitfalls"). I wrote PEP 670 "Convert macros to functions in the Python C API" for these changes.

I wrote the upgrade_pythoncapi.py tool in my pythoncapi_compat project (*) which modifies C code to use Py_TYPE(), Py_SIZE() and Py_REFCNT() rather than accessing PyObject and PyVarObject members directly.

(*) https://github.com/pythoncapi/pythoncapi_compat

In this tool, I also added "Borrow" variants of functions like PyFrame_GetCode(), which returns a strong reference, to replace frame->f_code with _PyFrame_GetCodeBorrow(). In Python 3.11, you cannot use the frame->f_code member anymore, since it has been removed! You must call PyFrame_GetCode() (or the pythoncapi_compat _PyFrame_GetCodeBorrow() variant).

There are also a few macros which can be used as l-values, like Py_TYPE(): "Py_TYPE(type1) = type2" must now be written "Py_SET_TYPE(type1, type2)" to avoid setting the ob_type member directly at the ABI level. I proposed PEP 674 "Disallow using Py_TYPE() and Py_SIZE() macros as l-values" to solve these issues.

Currently, many "functions" are still implemented as macros or static inline functions, so C extensions still access structure members at the ABI level for best Python performance. Converting these to regular functions has an impact on performance, and I would prefer to first write a PEP giving the rationale for that.

Today, it is not yet possible to build numpy for the stable ABI. The gap is just too large for this big C extension. But step by step, the C API gets closer to the limited API, and more and more code is ready to be built for the stable ABI.

Well, these C API changes have other advantages, like preparing Python for further optimizations, easing Python maintenance, clarifying the separation between the limited C API and the default C API, etc. ;-)

Victor
--
Night gathers, and now my watch begins.
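The "macro pitfalls" that PEP 670 addresses can be shown with a toy example (these are invented macros, not actual CPython ones): a macro may evaluate its argument more than once, while a static inline function evaluates it exactly once and has well-defined parameter and result types.

```c
#include <assert.h>

/* Toy macro: the argument appears twice in the expansion, so an argument
   with a side effect (like i++) is evaluated twice. */
#define TOY_ABS_MACRO(x) ((x) < 0 ? -(x) : (x))

/* Static inline function: the argument is evaluated exactly once, and the
   compiler checks the parameter and result types. */
static inline int toy_abs_func(int x) { return x < 0 ? -x : x; }

int macro_double_eval(void) {
    int i = -5;
    int r = TOY_ABS_MACRO(i++);  /* i++ expanded twice: surprising! */
    (void)r;
    return i;  /* i was incremented twice: -5 -> -3 */
}

int func_single_eval(void) {
    int i = -5;
    int r = toy_abs_func(i++);   /* i++ evaluated once, as expected */
    (void)r;
    return i;  /* -5 -> -4 */
}
```

Converting such macros to static inline functions fixes the double evaluation without changing the API for well-behaved callers.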
It shall not end until my death.
Wait, where is the HPy project in that plan? :-) The HPy project (a brand new C API) is a good solution for the long term!

My concern about HPy right now is that, in short, CPython has to continue supporting the C API for a few more years, and we cannot evolve CPython until it becomes reasonable to consider removing the "legacy" C API. I explained this in detail in PEP 674 (Disallow using Py_TYPE() and Py_SIZE() macros as l-values): https://www.python.org/dev/peps/pep-0674/#relationship-with-the-hpy-project

In parallel, we should continue promoting the use of Cython, cffi, pybind11 and HPy, rather than using the C API directly.

Victor
Does HPy have any clear guidance or assistance for their users to keep it up to date? I'm concerned that if we simply substitute "support the C API for everyone" with "support the C API for every version of HPy" we're no better off. I think it can be done with clear communication from the HPy project (and us when we endorse it) that they will *never* break compatibility and it's *always* safe (and indeed, essential) for their users to use the latest version. But that's a big commitment that I can't sign them up for. Cython seems to manage it okay. I can't remember the last compat issue I had there that wasn't on our (C-API) side. Thoughts? Cheers, Steve On 1/28/2022 4:50 PM, Victor Stinner wrote:
Wait, where is the HPy project in that plan? :-) The HPy project (brand new C API) is a good solution for the long term!
My concern about HPy right now is that, in short, CPython has to continue supporting the C API for a few more years, and we cannot evolve CPython until it becomes reasonable to consider removing the "legacy" C API.
I explained this in detail in PEP 674 (Disallow using Py_TYPE() and Py_SIZE() macros as l-values): https://www.python.org/dev/peps/pep-0674/#relationship-with-the-hpy-project
In parallel, we should continue promoting the use of Cython, cffi, pybind11 and HPy, rather than using the C API directly.
Victor
On Jan 28, 2022, at 09:00, Steve Dower <steve.dower@python.org> wrote:
Does HPy have any clear guidance or assistance for their users to keep it up to date?
I'm concerned that if we simply substitute "support the C API for everyone" with "support the C API for every version of HPy" we're no better off.
Will it ever make sense to pull HPy into the CPython repo so that they evolve together? I can see advantages and disadvantages. If there’s a point in the future where we can just start promoting HPy as an official alternative C API, then it will likely get more traction over time. The disadvantage is that HPy would evolve at the same annual pace as CPython. -Barry
I think we will get *one* chance in the next decade to get it right. Whether that's HPy or evolution of the C API I'm not sure. Victor, am I right that the (some) stable ABI will remain important because projects don't have resources to build wheels for every Python release? If a project does R releases per year for P platforms that need to support V versions of Python, they would normally have to build R * P * V wheels. With a stable ABI, they could reduce that to R * P. That's the key point, right? Can HPy do that? On Fri, Jan 28, 2022 at 9:19 AM Barry Warsaw <barry@python.org> wrote:
On Jan 28, 2022, at 09:00, Steve Dower <steve.dower@python.org> wrote:
Does HPy have any clear guidance or assistance for their users to keep
it up to date?
I'm concerned that if we simply substitute "support the C API for
everyone" with "support the C API for every version of HPy" we're no better off.
Will it ever make sense to pull HPy into the CPython repo so that they evolve together? I can see advantages and disadvantages. If there’s a point in the future where we can just start promoting HPy as an official alternative C API, then it will likely get more traction over time. The disadvantage is that HPy would evolve at the same annual pace as CPython.
-Barry
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
If a project does R releases per year for P platforms that need to support V versions of Python, they would normally have to build R * P * V wheels. With a stable ABI, they could reduce that to R * P. That's the key point, right?
Can HPy do that?
actually, it can do even better than that. When you compile an HPy extension you can choose which ABI to target:

- CPython ABI: in this mode, all HPy_* calls are statically translated (using static inline functions) into the corresponding Py_* calls, and it generates modules like foo.cpython-38-x86_64-linux-gnu.so, indistinguishable from a "normal" module

- HPy universal ABI: in this mode, it generates something like foo.hpy-x86_64-linux-gnu.so: all API calls go through the HPyContext (which is basically a giant vtable): this module can be loaded by any implementation which supports the HPy universal ABI, including CPython, PyPy and GraalPython.

The main drawback of the universal ABI is that it's slightly slower, because it goes through the vtable indirection for every call, in particular HPy_Dup/HPy_Close which are mapped to Py_INCREF/Py_DECREF. Some early benchmarks indicate a 5-10% slowdown. We haven't benchmarked it against the stable ABI though.

Of course, in order to be fully usable, the HPy universal ABI will need special support from PyPI/pip/etc, because at the moment it is impossible to package it inside a wheel, AFAIK.

ciao, Antonio
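The "giant vtable" idea behind the universal ABI can be sketched as a self-contained C toy (invented "Toy" names, nothing like the real HPyContext layout): the extension only ever calls through a table of function pointers handed to it at load time, so one compiled binary works with any runtime that fills in the table.

```c
#include <assert.h>

/* Toy context: in the universal ABI every API call goes through a table
   of function pointers instead of being linked against one runtime. */
typedef struct ToyContext {
    long (*add)(long, long);
} ToyContext;

/* One possible runtime's implementation of the "add" slot. */
static long cpython_like_add(long a, long b) { return a + b; }

/* What the extension author writes: only ctx->... calls, no direct
   dependency on any particular runtime's symbols or struct layouts. */
long extension_entry(ToyContext *ctx, long a, long b) {
    return ctx->add(a, b);
}
```

The cost Antonio mentions is visible here: every call pays one extra pointer load through the table, which is where the measured 5-10% slowdown comes from.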
On 1/28/2022 6:17 PM, Antonio Cuni wrote:
Of course, in order to be fully usable, the HPy universal ABI will need special support by PyPI/pip/etc, because at the moment it is impossible to package it inside a wheel, AFAIK.
It's totally possible, it's just that none of the existing tools will automatically generate the tags you need. (These are most critical in the filename itself, and also appear in 1-2 bits of metadata that are currently unused, AFAIK.)

Basically, instead of just "cp310" (or "abi3", etc.), you'll want to use dots to separate each supported version ("cp38.cp39.cp310"). That will match the wheel to any of those versions. You can even do the same with OS platforms if you prefer fewer/bigger wheels over more platform-specific ones.

Python on all platforms since IIRC 3.6 (maybe 3.5?) has also had version- and platform-specific tags in extension modules. These do not support combining tags as wheels do (and unfortunately do not match wheel tags at all), but they do allow you to have version/platform-specific .pyd/.dylib/.so files in a single wheel. Again, it's just that none of the current build backends will help you do it.

Cheers, Steve
On Fri, Jan 28, 2022 at 6:28 PM Guido van Rossum <guido@python.org> wrote:
I think we will get *one* chance in the next decade to get it right. Whether that's HPy or evolution of the C API I'm not sure.
Would you mind elaborating? What risk do you expect from switching to HPy, and from fixing the C API (introducing incompatible C API changes)? For me, promoting HPy and evolving the C API are complementary: they can and must be done in parallel. As I explained in PEP 674, while HPy does help C extension writers, it doesn't solve any problem for CPython right now. CPython is still blocked by implementation details leaked through the C API that we must still maintain for a few more years.
Victor, am I right that the (some) stable ABI will remain important because projects don't have resources to build wheels for every Python release? If a project does R releases per year for P platforms that need to support V versions of Python, they would normally have to build R * P * V wheels. With a stable ABI, they could reduce that to R * P. That's the key point, right?
There are different use cases.

1) First, my main worry is that we put high pressure on the maintainers of the most important Python dependencies before the release of a new Python version, because we want them to handle the flow of incompatible C API changes before the final Python 3.x version is released, so that their projects are available when Python 3.x final is released.

It annoys core developers who cannot change things in Python without getting an increasing number of complaints about a large number of broken packages, sometimes with a request to revert.

It annoys C extension maintainers who have to care about Python alpha and beta releases, which are not convenient to use (ex: not available in Linux distributions). Moreover, it has become common to ask for multiple changes and multiple releases before a Python final release, since more incompatible changes are introduced in Python (before beta1).

2) Second, as you said, the stable ABI reduces the number of binary packages which have to be built. Small projects with a little team (ex: a single person) don't have the resources to set up and maintain a CI to build all these packages. It's doable, but it isn't free.

--

The irony of the situation is that we must break the C API (hiding structures is technically an incompatible change)... to make the C API stable. Breaking it now to make it stable later. We already broke the C API many times in the past. The difference here is that the changes are done with the purpose of bending it towards the limited C API and the stable ABI.

My expectation is that replacing frame->f_code with PyFrame_GetCode() only has to be done exactly once: this API is not going to change. Sadly, the changes are not limited to frame->f_code; more changes are needed. For example, for PyFrameObject, access to every structure member must go through a function call (a getter or setter function). Hopefully, only a small number of members are used by C extensions.
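The strong-vs-borrowed reference distinction behind PyFrame_GetCode() and the pythoncapi_compat "Borrow" variants can be sketched with a self-contained C toy (the Toy* structures and names are invented for illustration, not the real CPython ones):

```c
#include <assert.h>

typedef struct ToyCode  { long refcnt; } ToyCode;
typedef struct ToyFrame { ToyCode *f_code; } ToyFrame;

/* Strong reference, like PyFrame_GetCode(): the refcount is incremented
   and the caller owns the reference, so it must release it later. */
ToyCode *ToyFrame_GetCode(ToyFrame *f) {
    f->f_code->refcnt++;
    return f->f_code;
}

/* Borrowed reference, like a "Borrow" variant: no refcount change; the
   pointer is only valid while the frame keeps the code object alive. */
ToyCode *ToyFrame_GetCodeBorrow(ToyFrame *f) {
    return f->f_code;
}

/* Release a strong reference obtained from ToyFrame_GetCode(). */
void ToyCode_Release(ToyCode *c) { c->refcnt--; }
```

The point of going through a getter at all, rather than reading f_code directly, is that the frame structure can later change its layout (or even stop storing the member) without breaking compiled extensions.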
The tricky part is to think about the high-level API ("use cases") rather than just adding functions doing "return struct->member" and "struct->member = new_value". The PyThreadState_EnterTracing() and PyThreadState_LeaveTracing() functions added to Python 3.11 are a good example: the API is "generic" and the implementation changes 2 structure members, not a single one.

In practice, what I have done since Python 3.8 is to introduce a small number of C API changes per Python version. We tried the "fix all the things at once" approach (!!!) with Python 3, and it... didn't go well. All C extensions suddenly had to write their own compatibility layer for a large number of C API functions (ex: replace PyInt_xxx with PyLong_xxx, without losing Python 2 support!). The changes that I'm introducing in the C API usually impact fewer than 100 extensions in total (usually, I would say between 10 and 25 per Python version, but it's hard to measure exactly).
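The "high-level API instead of raw getters/setters" idea can be sketched with a toy C model (invented Toy* names; the real PyThreadState fields and semantics differ): one function with a clear use case updates both members together, so callers can never leave them inconsistent.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy thread state with two related members. Exposing raw setters for
   each would let callers desynchronize them; a use-case-level API can't. */
typedef struct ToyThreadState {
    int  tracing;        /* nesting depth of tracing suppression */
    bool use_tracing;    /* fast-path flag derived from the state */
} ToyThreadState;

void Toy_EnterTracing(ToyThreadState *ts) {
    ts->tracing++;            /* one call updates both members ... */
    ts->use_tracing = false;  /* ... consistently */
}

void Toy_LeaveTracing(ToyThreadState *ts) {
    ts->tracing--;
    ts->use_tracing = (ts->tracing == 0);
}
```

Because the API names a use case rather than a struct member, the implementation is free to add, remove or rename members later without any change for callers.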
Can HPy do that?
I wish more projects were incrementally rewritten with Cython, cffi, pybind11 and HPy, slowly moving away from using the C API directly. Yes, HPy supports a "universal build" mode which allows building a C extension only once and using it on multiple *CPython* versions *and* (that's the big news!) multiple *PyPy* versions! I even heard that it also brings GraalPython support for free ;-) Victor -- Night gathers, and now my watch begins. It shall not end until my death.
I'm sorry, I was overwhelmed and didn't find the time until now to answer this. A lot was already said about this, so I'll just briefly explain below (inline). On Sat, Jan 29, 2022 at 2:38 AM Victor Stinner <vstinner@python.org> wrote:
On Fri, Jan 28, 2022 at 6:28 PM Guido van Rossum <guido@python.org> wrote:
I think we will get *one* chance in the next decade to get it right. Whether that's HPy or evolution of the C API I'm not sure.
Would you mind elaborating? What risk do you expect from switching to HPy, and from fixing the C API (introducing incompatible C API changes)?
IMO users would benefit if we recommended one solution and started deprecating the rest. We currently have too many choices: the Stable ABI, the limited API (not everybody sees those two as the same thing), the CPython API (C API), Cython (for many this is how they interact with the interpreter), HPy... And I think you have another thing in the works, a library that "backfills" (I think that's the word) APIs for older CPython versions so that users can pretend to use the latest C API but are able to compile/link for older versions.

To me, that's too many choices. At the very least it should be clearer how these relate to each other: e.g. the C API is a superset of the Limited API, the Stable ABI is based on the Limited API (explain how), and HPy is a wrapper around the C API (or is it?). Such an explanation of the relationships would help users understand the consequences of choosing one or the other for their code: how future CPython versions will affect them, and how portable their code is to other Python implementations (PyPy, GraalPython, Jython). Users can't be expected to understand these consequences without a lot of help (honestly, many of these I couldn't explain myself :-( ).
For me, promoting HPy and evolving the C API are complementary: they can and must be done in parallel. As I explained in PEP 674, while HPy does help C extension writers, it doesn't solve any problem for CPython right now. CPython is still blocked by implementation details leaked through the C API that we must still maintain for a few more years.
I understand that CPython is stuck supporting the de-facto standard C API for a long time. But unless we pick a "north star" (as people call it nowadays) of what we want to support in say 5-10 years, the situation will never improve. My point about "getting one chance to get it right in the next decade" is that we have to pick that north star, so we can tell users which horse to bet on. If the north star we pick is HPy, things will be clear. If it is evolving the C API, things will also be clear. But I think we have to pick one and stick to it, so users (i.e., package maintainers/developers) have clarity.

I understand that HPy is currently implemented on top of the C API, but hopefully it's not stuck on that. And it only helps a small group of extension writers: those who don't need the functionality that HPy is still missing (they keep saying they're not ready for prime time), who value portability to other Python implementations, and for whom the existing C API hacks in PyPy aren't sufficient. So it's mostly aspirational. But if it stays that way for too long, it will just die for lack of motivation.
Victor, am I right that the (some) stable ABI will remain important because projects don't have resources to build wheels for every Python release? If a project does R releases per year for P platforms that need to support V versions of Python, they would normally have to build R * P * V wheels. With a stable ABI, they could reduce that to R * P. That's the key point, right?
There are different use cases.
1) First, my main worry is that we put high pressure on the maintainers of the most important Python dependencies before the release of a new Python version, because we want them to handle the flow of incompatible C API changes before the final Python 3.x version is released, so that their projects are available when Python 3.x final is released.
Hm, maybe we should reduce the flow. And e.g. reject PEP 674...
It annoys core developers who cannot change things in Python without getting an increasing number of complaints about a large number of broken packages, sometimes with a request to revert.
You are mostly talking about yourself here, right? Since the revert requests were mostly aimed at you. :-)
It annoys C extensions maintainers who have to care about Python alpha and beta releases which are not convenient to use (ex: not available in Linux distributions).
I don't use Linux much, so I am not familiar with the inconvenience of Python alpha/beta releases being unavailable. I thought that the Linux philosophy was that you could always just build from source?
Moreover, it became common to ask multiple changes and multiple releases before a Python final release, since more incompatible changes are introduced in Python (before the beta1).
Sorry, your grammar confuses me. Who is asking whom to do what here? Is the complaint just that things change between alphas? Maybe we should just give up on alphas and instead do nightlies (fully automated)?
2) Second, as you said, the stable ABI reduces the number of binary packages which have to be built. Small projects with a little team (ex: a single person) don't have resources to set up a CI and maintain it to build all these packages. It's doable, but it isn't free.
Maybe we need to help there. For example IIRC conda-forge will build conda packages -- maybe we should offer a service like that for wheels?
--
The irony of the situation is that we must break the C API (hiding structures is technically an incompatible change)... to make the C API stable. Breaking it now to make it stable later.
The question is whether that will ever be enough. Unless we manage to get rid of the INCREF/DECREF macros completely (from the public C API anyway) we still can't change object layout.
We already broke the C API many times in the past. The difference here is that the changes are done with the purpose of bending it towards the limited C API and the stable ABI.
My expectation is that replacing frame->f_code with PyFrame_GetCode() only has to be done exactly once: this API is not going to change. Sadly, the changes are not limited to frame->f_code; more changes are needed. For example, for PyFrameObject, access to every structure member must go through a function call (a getter or setter function). Hopefully, only a small number of members are used by C extensions.
Is this worth it? Maybe we should just declare those structs and APIs *unstable* and tell people who use them that they can expect to be broken by each alpha release. As you say, hopefully this doesn't affect most people. Likely it'll affect Cython dramatically but Cython is such a special case that trying to evolve the C API will never satisfy them. We'll have to deal with it separately. (Debuggers are a more serious concern. We may need to provide higher-level APIs for debuggers to do the things they need to do. Mark's PEP 669 should help here.)
The tricky part is to think about the high level API ("use cases") rather than just adding functions doing "return struct->member" and "struct->member = new_value". The PyThreadState_EnterTracing() and PyThreadState_LeaveTracing() functions added to Python 3.11 are a good example: the API is "generic" and the implementation changes 2 structure members, not a single one.
Right.
In practice, what I have done since Python 3.8 is to introduce a small number of C API changes per Python version. We tried the "fix all the things at once" approach (!!!) with Python 3, and it... didn't go well. All C extensions suddenly had to write their own compatibility layer for a large number of C API functions (ex: replace PyInt_xxx with PyLong_xxx, without losing Python 2 support!). The changes that I'm introducing in the C API usually impact fewer than 100 extensions in total (usually, I would say between 10 and 25 per Python version, but it's hard to measure exactly).
How *do* you count this? Try to compile the top 5000 PyPI packages? That might severely undercount a long tail of proprietary extensions.
Can HPy do that?
I wish more projects were incrementally rewritten with Cython, cffi, pybind11 and HPy, slowly moving away from using the C API directly.
Yes, HPy supports a "universal build" mode which allows building a C extension only once and using it on multiple *CPython* versions *and* (that's the big news!) multiple *PyPy* versions! I even heard that it also brings GraalPython support for free ;-)
Victor -- Night gathers, and now my watch begins. It shall not end until my death.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Hi Guido,

My "north star", as you say, is the HPy "design" (not the actual HPy API). I would like to convert PyObject* to opaque handles: dereferencing a PyObject* pointer would simply fail with a compiler error.

I'm working bottom-to-top: preparing PyObject and PyVarObject to become opaque, *and* top-to-bottom: preparing subclasses (structures "inheriting" from PyObject and PyVarObject), like PyFrameObject, to become opaque. IMO if PyObject* becomes a handle, the migration to the HPy API should be much easier.

So far, a large part of the work has been identifying which APIs prevent moving the current C API to opaque handles. The second part has been fixing these APIs one by one, starting with changes which don't break the C API. In the last 5 years, I fixed many issues in the C API. Very few projects were impacted, since I avoided incompatible changes on purpose.

Now I have reached the end of my (current) queue with the most controversial changes: incompatible C API changes. I wrote PEP 670 and PEP 674 to share my overall rationale and communicate the reasons why I consider that these changes will make our life easier in the coming years. I'm well aware that in the short term, the benefits are very limited. But we cannot jump immediately to opaque PyObject* handles. This work must be done incrementally.

On Thu, Feb 3, 2022 at 1:40 AM Guido van Rossum <guido@python.org> wrote:
1) First, my main worry is that we put high pressure on the maintainers of the most important Python dependencies before the release of a new Python version, because we want them to handle the flow of incompatible C API changes before the final Python 3.x version is released, so that their projects are available when Python 3.x final is released.
Hm, maybe we should reduce the flow. And e.g. reject PEP 674...
We need to find the right limit when introducing incompatible C API changes, so as not to break too many C extensions per Python release. Yes, PEP 674 is an example of an incompatible C API change, but Python 3.11 already got many others which are unrelated to that PEP. For example, the optimization work done by your Microsoft team broke a bunch of projects using the PyThreadState and PyFrameObject APIs. See the related What's New in Python 3.11 entries: https://docs.python.org/dev/whatsnew/3.11.html#id2 (I didn't keep track of which projects are affected by these changes.) I would like to ensure that it remains possible to optimize Python, but for that, we need to reduce the friction related to C API changes. What the best approach for that is remains an open question. I'm proposing some solutions, and we are discussing advantages and drawbacks ;-)
It annoys core developers who cannot change things in Python without getting an increasing number of complaints about a large number of broken packages, sometimes with a request to revert.
You are mostly talking about yourself here, right? Since the revert requests were mostly aimed at you. :-)
The latest requests for revert are not about C API changes that I made. Cython and the C API exception change: https://mail.python.org/archives/list/python-dev@python.org/thread/RS2C53LDZ... There are other requests for revert in Python 3.11 related to Python changes, not to the C API.

So far, I only had to revert 2 changes of my C API work:

* PyType_HasFeature(): the change caused a performance regression on macOS, where sadly Python cannot be built with LTO. With LTO (all platforms but macOS), my change doesn't affect performance.

* Py_TYPE() / Py_SIZE() change: I reverted my change to have more time to prepare the affected projects. Two years later, I consider that this preparation work is now done (all affected projects are ready), and so I submitted PEP 674. Affected projects (merged and pending fixes): https://www.python.org/dev/peps/pep-0674/#backwards-compatibility
Moreover, it became common to ask multiple changes and multiple releases before a Python final release, since more incompatible changes are introduced in Python (before the beta1).
Sorry, your grammar confuses me. Who is asking whom to do what here?
Cython is the best example. During the development cycle of a new Python version, my Fedora team adapts Cython to the next Python and then requests a release, to get an official release supporting an alpha Python version. A release is not strictly required by us (we can apply downstream patches), but it's more convenient for us and for people who cannot use our work-in-progress "COPR" (a repository of RPM packages specific to Fedora). When we ask a project (using Cython) to merge our pull request, maintainers want to test it on the current Python alpha version, and that's not convenient when Cython is unusable. Between Python 3.x alpha1 and Python 3.x final, there might be multiple Cython releases, one for each batch of incompatible C API changes. On the Python and Cython sides, so far, there has been no coordination to group incompatible changes; they land between alpha1 and beta1 in an irregular fashion.
2) Second, as you said, the stable ABI reduces the number of binary packages which have to be built. Small projects with a little team (ex: a single person) don't have resources to set up a CI and maintain it to build all these packages. It's doable, but it isn't free.
Maybe we need to help there. For example IIRC conda-forge will build conda packages -- maybe we should offer a service like that for wheels?
Tooling to automate the creation of binary wheel packages targeting the stable ABI would help. C extensions likely need to change a few function calls which are not supported by the limited C API. Someone has to do this work. My hope is that the quantity of changes is small (ex: modifying 2 or 3 function calls) for small C extensions. I haven't tried it in practice yet.
The irony of the situation is that we must break the C API (hiding structures is technically an incompatible change)... to make the C API stable. Breaking it now to make it stable later.
The question is whether that will ever be enough. Unless we manage to get rid of the INCREF/DECREF macros completely (from the public C API anyway) we still can't change object layout.
I'm open to ideas if someone has a better plan ;-) Keeping reference counting for consumers of the C API (C extensions) doesn't prevent removing reference counting inside Python (switching to a completely different GC implementation). PyPy's cpyext is a concrete example of that. The nogil fork keeps the Py_INCREF()/Py_DECREF() functions, but changes their implementation. If we have to (ex: if we merge nogil), we can convert the Py_INCREF() / Py_DECREF() static inline functions to regular functions tomorrow (and then change their implementation) without breaking the API. Adding abstractions (getter and setter functions) on top of PyObject gives us more freedom to consider different options for evolving Python tomorrow.
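The point about changing the refcounting implementation behind an unchanged API can be sketched with a self-contained C toy (invented ToyRC names, not nogil's or PyPy's actual scheme): callers only ever see two functions, so the implementation below is free to keep the counts anywhere, here in a side table instead of inside the object.

```c
#include <assert.h>

#define TOYRC_MAX_OBJECTS 16

typedef struct ToyObj { int id; } ToyObj;

/* Hidden implementation detail: refcounts live in a side table, not in
   the object. This could later become biased counting, a GC shim, etc.,
   with no change visible to callers. */
static long toyrc_side_table[TOYRC_MAX_OBJECTS];

/* The only thing callers ever see; the declarations never change. */
void ToyRC_Incref(ToyObj *o) { toyrc_side_table[o->id]++; }
void ToyRC_Decref(ToyObj *o) { toyrc_side_table[o->id]--; }
long ToyRC_Refcnt(ToyObj *o) { return toyrc_side_table[o->id]; }
```

As long as extensions call ToyRC_Incref()/ToyRC_Decref() rather than touching a refcount field directly, the runtime can swap the whole scheme without an ABI break, which is exactly the freedom Victor describes for Py_INCREF()/Py_DECREF() as regular functions.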
Is this worth it? Maybe we should just declare those structs and APIs *unstable* and tell people who use them that they can expect to be broken by each alpha release. As you say, hopefully this doesn't affect most people. Likely it'll affect Cython dramatically but Cython is such a special case that trying to evolve the C API will never satisfy them. We'll have to deal with it separately. (Debuggers are a more serious concern. We may need to provide higher-level APIs for debuggers to do the things they need to do. Mark's PEP 669 should help here.)
IMO we only need to add 5 to 10 functions to cover most use cases involving PyThreadState and PyFrameObject. The remaining, less common usages can continue to require changes at each Python release. The short-term goal is only to reduce the number of required changes per Python release. Yes, I'm talking about Cython, debuggers and profilers. Another example is that Cython currently calls PyCode_New() to create a fake frame object with a filename and line number. IMO it's the wrong abstraction level: Python should provide a function to create a frame with a filename and line number, so the caller doesn't have to bother with the complex PyCode_New() API and frequent PyCodeObject changes. (Correct me if this problem has already been solved in Python.)
In practice, what I did since Python 3.8 is to introduce a small number of C API changes per Python version. (...)
How *do* you count this? Try to compile the top 5000 PyPI packages?
I'm using code search over the source code of the top 5000 PyPI packages, and I'm looking at broken packages in Fedora when we update Python. Also, sometimes people add a comment on an issue to mention that their project is broken by a change.
That might severely undercount a long tail of proprietary extensions.
Right. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Thu, Feb 3, 2022 at 9:27 AM Victor Stinner <vstinner@python.org> wrote:
Hi Guido,
[SNIP]
On Thu, Feb 3, 2022 at 1:40 AM Guido van Rossum <guido@python.org> wrote:
[SNIP]
Maybe we need to help there. For example IIRC conda-forge will build
conda packages -- maybe we should offer a service like that for wheels?
Tooling to automate the creation of binary wheel packages targeting the stable ABI would help. C extensions likely need to change a few function calls which are not supported by the limited C API. Someone has to do this work.
The idea of having a service to build wheels for folks is an old one that I think a ton of people would benefit from. Currently, people typically get pointed to https://pypi.org/project/cibuildwheel/ as the per-project solution. But designing a safe way to build wheels from any sdist on PyPI, keeping such a service up, having the free processing to do it, etc. is unfortunately a big enough project that no one has stepped forward to try and tackle it.
On 2/3/2022 12:15 PM, Victor Stinner wrote:
I'm working bottom-to-top: prepare PyObject and PyVarObject to become opaque, *and* top-to-bottom: prepare subclasses (structures "inheriting" from PyObject and PyVarObject) to become opaque like PyFrameObject.
IMO if PyObject* becomes a handle, the migration to the HPy API should be much easier.
It seems to me that moving PyObject* to be a handle leaves you in a place very similar to HPy. So why not just focus on making HPy suitable for developing C extensions, leave the existing C API alone, and eventually abandon the existing C API? Eric
On Fri, Feb 4, 2022 at 12:55 AM Eric V. Smith <eric@trueblade.com> wrote:
It seems to me that moving PyObject* to be a handle leaves you in a place very similar to HPy. So why not just focus on making HPy suitable for developing C extensions, leave the existing C API alone, and eventually abandon the existing C API?
I agree (but I'm biased :)), but I think there is also an important point which is easy to miss/overlook: it is not enough to declare that now you have handles instead of refcounting, you also need a way to enforce/check that the handles are used correctly. CPython might declare that object references are now handles and that each handle must be closed individually: this would work formally, but as long as handles are internally implemented on top of refcounting, things like closing the same handle twice would just continue to work if by chance the total refcount is still correct. This means that we will have extensions which will be formally incorrect but will work well on CPython, and horribly break as soon as you try to load them on e.g. PyPy. That's the biggest selling point of the HPy debug mode: in debug mode, HPy actively checks that handles are closed properly, and it warns you if you close a handle twice or forget to close a handle, even on CPython. ciao, Antonio
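A tiny self-contained sketch of the idea behind this kind of debug checking (this is not the real HPy implementation; the Handle/handle_open/handle_close names are invented for illustration): when each handle is a distinct slot in a table rather than a bump of a shared refcount, closing the same handle twice becomes detectable immediately, instead of silently "working" as long as the total refcount happens to balance out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 64

/* A handle is an index into a slot table, not a raw object pointer. */
typedef struct { int slot; } Handle;

/* NULL means the slot is free (or was already closed). */
static void *table[MAX_HANDLES];

Handle handle_open(void *obj)
{
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (table[i] == NULL) {
            table[i] = obj;
            return (Handle){i};
        }
    }
    return (Handle){-1};  /* table full */
}

/* Returns 0 on success, -1 on invalid handle or double close.
 * A refcount-based implementation could not distinguish a second
 * close of handle A from a legitimate close of handle B on the
 * same object; the slot table can. */
int handle_close(Handle h)
{
    if (h.slot < 0 || h.slot >= MAX_HANDLES || table[h.slot] == NULL)
        return -1;
    table[h.slot] = NULL;
    return 0;
}
```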
On Thu, Feb 3, 2022 at 3:53 PM Eric V. Smith <eric@trueblade.com> wrote:
On 2/3/2022 12:15 PM, Victor Stinner wrote:
I'm working bottom-to-top: prepare PyObject and PyVarObject to become opaque, *and* top-to-bottom: prepare subclasses (structures "inheriting" from PyObject and PyVarObject) to become opaque like PyFrameObject.
IMO if PyObject* becomes a handle, the migration to the HPy API should be much easier.
It seems to me that moving PyObject* to be a handle leaves you in a place very similar to HPy. So why not just focus on making HPy suitable for developing C extensions, leave the existing C API alone, and eventually abandon the existing C API?
I think that's a possibility. I think it's a question for the team here whether that's the long-term goal that we want. If so we can make all of our work head towards that and help out HPy out as best we can.
On Fri, Feb 4, 2022 at 12:52 AM Eric V. Smith <eric@trueblade.com> wrote:
On 2/3/2022 12:15 PM, Victor Stinner wrote:
IMO if PyObject* becomes a handle, the migration to the HPy API should be much easier.
It seems to me that moving PyObject* to be a handle leaves you in a place very similar to HPy. So why not just focus on making HPy suitable for developing C extensions, leave the existing C API alone, and eventually abandon the existing C API?
I tried to explain the reasons why HPy doesn't solve all problems in the PEP 674: https://www.python.org/dev/peps/pep-0674/#the-c-api-is-here-is-stay-for-a-fe... One problem is to provide a better C API to users: HPy is great for that! Another problem is the inability to evolve Python because the C API leaks implementation details: HPy doesn't solve this problem because Python must continue supporting the C API for a few more years. My approach is to (slowly) bend the C API towards HPy design/API to ease the migration to HPy *and* (slowly) allow changing more Python internals (without affecting the public C API). Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Trying to cut things short, there's one thing I'd like to correct: On Thu, Feb 3, 2022 at 9:15 AM Victor Stinner <vstinner@python.org> wrote:
[...]
Another example is that Cython currently calls PyCode_New() to create a fake frame object with a filename and line number. IMO it's the wrong abstraction level: Python should provide a function to create a frame with a filename and line number, so the caller doesn't have to bother about the complex PyCode_New() API and frequent PyCodeObject changes. (Correct me if this problem has already been solved in Python.)
That was solved quite a while ago, with the PyCode_NewEmpty() API. Sadly Cython doesn't call it (or at least not always), because it takes a C string which is turned into a unicode object, and Cython already has the unicode object in hand. I don't want to proliferate APIs. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Wed, Feb 2, 2022 at 4:46 PM Guido van Rossum <guido@python.org> wrote: A few notes in this:
Maybe we need to help there. For example IIRC conda-forge will build conda packages -- maybe we should offer a service like that for wheels?
Yes, conda-forge uses a complex CI system to build binary conda packages for a variety of Python versions. And it does have some support for development versions. Once a "feedstock" is developed, it's remarkably painless to get all the binaries up and available. I imagine someone could borrow a bunch of that code to make a system to build wheels. In fact, there is the MacPython org: https://github.com/MacPython Which began as a place to share build scripts for Mac binaries, but has expanded to build wheels for multiple platforms for the scipy stack. I don't know how it works these days -- I am no longer involved since I discovered conda, but they seem to have some nice stuff there -- perhaps it could be leveraged for more projects. However: one of the challenges for building C extensions is that they often depend on external C libs -- and that is exactly the problem that conda was built to address. So in a sense, a conda-forge-like auto-build system is inherently easier for conda packages than for binary wheels. Which doesn't mean it couldn't be done -- just that the challenge of third-party libs would need to be addressed. In any case, someone would have to do the work, as usual. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 03. 02. 22 1:40, Guido van Rossum wrote: [...]
I understand that CPython is stuck supporting the de-facto standard C API for a long time. But unless we pick a "north star" (as people call it nowadays) of what we want to support in say 5-10 years, the situation will never improve.
My point about "getting one chance to get it right in the next decade" is that we have to pick that north star, so we can tell users which horse to bet on. If the north star we pick is HPy, things will be clear. If it is evolving the C API things will also be clear. But I think we have to pick one, and stick to it so users (i.e., package maintainers/developers) have clarity.
A few months later, here's a draft of a “north star” document. Is this close to what you had in mind? https://docs.google.com/document/d/1lrvx-ujHOCuiuqH71L1-nBQFHreI8jsXC966AFu9... Please comment (here or there) as appropriate :)
On 4/4/2022 11:19 AM, Petr Viktorin wrote:
On 03. 02. 22 1:40, Guido van Rossum wrote: [...]
I understand that CPython is stuck supporting the de-facto standard C API for a long time. But unless we pick a "north star" (as people call it nowadays) of what we want to support in say 5-10 years, the situation will never improve.
My point about "getting one chance to get it right in the next decade" is that we have to pick that north star, so we can tell users which horse to bet on. If the north star we pick is HPy, things will be clear. If it is evolving the C API things will also be clear. But I think we have to pick one, and stick to it so users (i.e., package maintainers/developers) have clarity.
A few months later, here's a draft of a “north star” document. Is this close to what you had in mind?
https://docs.google.com/document/d/1lrvx-ujHOCuiuqH71L1-nBQFHreI8jsXC966AFu9...
"[There is a proposal for an additional "unstable" ring for even deeper integration -- compilers/debuggers. I'm listing it here even though it's not status quo :)] Private API -- internal use only (with specific exceptions)" The Private API is the unstable ring, subject to change with each bug-fix release. The proposal is for a semi-stable ring, stable within each version. I agree that 'Semi-public' would be a good name. "(with specific exceptions)" seems like a rough edge. Would most (all?) exceptions go in the Semi-public ring? It seems to me that a big driver for this discussion is the current push to get Python-level results much faster, which most people agree is a good thing. That requires eliminating duplication, doing some things faster, and some things not at all. That in turn requires revising internal structures along with code paths. And that in turn requires revising external code deeply integrated with the core. I think it is worth making clear that the resulting pain is needed for something that benefits most Python users. -- Terry Jan Reedy
IMO it would be better to keep the HPy design as the long term goal: * Refer to Python objects with opaque handles * All structures are opaque (with a few exceptions, like PyType_Spec) It will likely take multiple iterations (Python releases) to reach this goal, and incompatible C API changes may need a PEP (like PEP 674), but IMO it's good to keep this goal in mind. Otherwise, it's not easy to understand the rationale for changes like https://peps.python.org/pep-0674/ "PEP 674 – Disallow using macros as l-values". Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Hi, (I briefly commented also on the doc regarding this) I'm probably misinterpreting the exact goals. I read "stable ABI for everyone" and I'm thinking "what needs to happen to stay binary compatible and working for a couple of decades at least". If that were the goal, I think the ideas around HPy's handles and its usage of a context to get access to function pointers are most important, since that will ensure that new APIs can be added and old ones removed without breaking the ABI. (This is very similar to how e.g. libraries like SDL deal with the problem that distributors want to update the SDL library and have it still work with 10 year old proprietary games). In particular, even things like slots in types need to be opaque. An optimizing runtime may want to use varying layouts for both types and objects - allowing direct access into any runtime structures prevents that. Exposed structures should not be used for any runtime objects, only as specs for construction of those runtime objects. Best, Tim
On 06. 04. 22 9:23, Tim Felgentreff wrote:
Hi,
(I briefly commented also on the doc regarding this)
I’m probably misinterpreting the exact goals. I read “stable ABI for everyone” and I’m thinking “what needs to happen to stay binary compatible and working for a couple of decades at least”.
Well, the doc is more down to the ground than this thread can be :)
If that were the goal, I think the ideas around HPy's handles and its usage of a context to get access to function pointers are most important, since that will ensure that new APIs can be added and old ones removed without breaking the ABI. (This is very similar to how e.g. libraries like SDL deal with the problem that distributors want to update the SDL library and have it still work with 10 year old proprietary games).
In particular, even things like slots in types need to be opaque. An optimizing runtime may want to use varying layouts for both types and objects - allowing direct access into any runtime structures prevents that. Exposed structures should not be used for any runtime objects, only as specs for construction of those runtime objects.
With Python's current general API backwards compatibility policy (PEP 387), it wouldn't make much sense to make ABI guarantees that strong. In reality you're likely to get DeprecationWarnings and runtime exceptions, and will need to recompile extensions well before decades-scale ABI incompatibilities hit you. IIUC, adopting HPy would break API: adding a context argument to all functions would be so massive a change, it'd be easier to just call that HPy rather than the Python C-API. And the doc I shared is for the C-API. (Perhaps CPython can even move to HPy and implement the C-API as a veneer on top -- but I don't think the original C-API can go away in our lifetimes.)
On Tue, 5 Apr 2022 22:54:00 +0200 Victor Stinner <vstinner@python.org> wrote:
IMO it would be better to keep the HPy design as the long term goal:
* Refer to Python objects with opaque handles * All structures are opaque (with a few exceptions, like PyType_Spec)
If the HPy design is the long term goal, why not just recommend that people use HPy? And keep the C API for expert users with specific needs that are not accommodated by HPy. To me, it seems that trying to change the C API to be "like HPy" is creating a lot of work, churn and pain for little gain. (and, yes, perhaps HPy needs to be funded or supported by the PSF if it doesn't advance fast enough) Regards Antoine.
It will likely take multiple iterations (Python releases) to reach this goal, and incompatible C API changes may need a PEP (like PEP 674), but IMO it's good to keep this goal in mind.
Otherwise, it's not easy to understand the rationale for changes like https://peps.python.org/pep-0674/ "PEP 674 – Disallow using macros as l-values".
Victor
On Wed, Apr 20, 2022 at 10:03 AM Antoine Pitrou <antoine@python.org> wrote:
If the HPy design is the long term goal, why not just recommend that people use HPy? And keep the C API for expert users with specific needs that are not accommodated by HPy.
To me, it seems that trying to change the C API to be "like HPy" is creating a lot of work, churn and pain for little gain.
If you put HPy aside, "fixing" the C API has multiple advantages for CPython and its (C API) users. For consumers of the C API (C extensions, Cython, pybind11, etc.), once most implementation details are hidden, the C API will become way more stable. The "C API > Porting to Python x.y" section of What's New in Python x.y should become way shorter. Or at least, the number of impacted C extensions should be way smaller. Sadly, fixing the C API to hide implementation details requires adapting (modifying) C extensions. That said, usually only a few lines need to be changed, and the pythoncapi-compat project now automates most of these changes. For CPython, no longer leaking implementation details would allow changing "any implementation detail" without getting heavy and annoying pushback from frustrated users, which would be very comfortable. In the Python 3.9, 3.10 and 3.11 development cycles, we got pushback multiple times, and each time it was really unpleasant both for CPython core devs and for C extension maintainers (both have legitimate concerns and use cases). While these changes should ease the migration to HPy, that's not my goal. HPy requires adding a "ctx" parameter; it's a different API (there are multiple subtle differences).
(and, yes, perhaps HPy needs to be funded or supported by the PSF if it doesn't advance fast enough)
What can be done in practice for that? If I understood correctly, Oracle is sponsoring the project since they want to use HPy for GraalPython (of their GraalVM). Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Wed, 20 Apr 2022 12:52:53 +0200 Victor Stinner <vstinner@python.org> wrote:
On Wed, Apr 20, 2022 at 10:03 AM Antoine Pitrou <antoine@python.org> wrote:
If the HPy design is the long term goal, why not just recommend that people use HPy? And keep the C API for expert users with specific needs that are not accommodated by HPy.
To me, it seems that trying to change the C API to be "like HPy" is creating a lot of work, churn and pain for little gain.
If you put HPy aside, "Fixing" the C API has multiple advantages for CPython and its (C API) users.
With the caveat that the "fixing" probably requires users to fix their packages as well.
For consumers of the C API (C extensions, Cython, pybind11, etc.), once most implementation details will be hidden, the C API will become way more stable.
The *API* is quite stable already if you don't use the private/internal functions. Perhaps you're thinking about the ABI?
(and, yes, perhaps HPy needs to be funded or supported by the PSF if it doesn't advance fast enough)
What can be done in practice for that? If I understood correctly, Oracle is sponsoring the project since they want to use HPy for GraalPython (of their GraalVM).
Also, Anaconda recently hired Antonio Cuni, hopefully giving him sufficient time to work on HPy. So perhaps nothing needs to be done in practice. Regards Antoine.
On Wed, Apr 20, 2022 at 1:44 PM Antoine Pitrou <antoine@python.org> wrote:
For consumers of the C API (C extensions, Cython, pybind11, etc.), once most implementation details will be hidden, the C API will become way more stable.
The *API* is quite stable already if you don't use the private/internal functions. Perhaps you're thinking about the ABI?
In Fedora, we update Python early, during the alpha versions, and sadly it's common that many C extensions are incompatible (need to be modified) at each 3.x release. A single minor incompatible change is enough to require changing a C extension. I believe that once the C API leaks fewer implementation details, changing Python will impact fewer C extensions. The HPy API looks more stable by design: it's way smaller and only exposes the bare minimum. I took notes on (Python and C API) incompatible changes, impacting most Python projects and C extensions, from Python 3.7 to Python 3.11: https://github.com/vstinner/vstinner.github.io/blob/pelican/draft/python-inc... The "C API > Porting to Python 3.11" section is quite long; the PyFrameObject and PyThreadState structures changed a lot (PyFrameObject moved to the internal C API): https://docs.python.org/dev/whatsnew/3.11.html#id6 "C API > Porting to Python 3.10": https://docs.python.org/dev/whatsnew/3.10.html#id2 "C API > Porting to Python 3.9": https://docs.python.org/dev/whatsnew/3.9.html#id2 "Porting to Python 3.8 > Changes in C API": https://docs.python.org/dev/whatsnew/3.8.html#changes-in-the-c-api Victor -- Night gathers, and now my watch begins. It shall not end until my death.
6 months ago, I wrote a different document based on HPy Manifesto: "PEP: Taking the Python C API to the Next Level" https://mail.python.org/archives/list/python-dev@python.org/message/RA7Q4JAU... Victor On Mon, Apr 4, 2022 at 5:20 PM Petr Viktorin <encukou@gmail.com> wrote:
On 03. 02. 22 1:40, Guido van Rossum wrote: [...]
I understand that CPython is stuck supporting the de-facto standard C API for a long time. But unless we pick a "north star" (as people call it nowadays) of what we want to support in say 5-10 years, the situation will never improve.
My point about "getting one chance to get it right in the next decade" is that we have to pick that north star, so we can tell users which horse to bet on. If the north star we pick is HPy, things will be clear. If it is evolving the C API things will also be clear. But I think we have to pick one, and stick to it so users (i.e., package maintainers/developers) have clarity.
A few months later, here's a draft of a “north star” document. Is this close to what you had in mind?
https://docs.google.com/document/d/1lrvx-ujHOCuiuqH71L1-nBQFHreI8jsXC966AFu9...
Please comment (here or there) as appropriate :)
-- Night gathers, and now my watch begins. It shall not end until my death.
On 1/28/2022 5:15 PM, Barry Warsaw wrote:
On Jan 28, 2022, at 09:00, Steve Dower <steve.dower@python.org> wrote:
Does HPy have any clear guidance or assistance for their users to keep it up to date?
I'm concerned that if we simply substitute "support the C API for everyone" with "support the C API for every version of HPy" we're no better off.
Will it ever make sense to pull HPy into the CPython repo so that they evolve together? I can see advantages and disadvantages. If there’s a point in the future where we can just start promoting HPy as an official alternative C API, then it will likely get more traction over time. The disadvantage is that HPy would evolve at the same annual pace as CPython.
Possibly, but we'd have to be really careful to not actually *evolve* HPy. It would essentially be a new stable API, but ideally one that uses all the preprocessor tricks we can (and perhaps runtime tricks) to compile against any CPython version rather than just the one that it comes with. PSF "ownership" is probably enough to make it official (for those people who need everything to be "official"). I don't think that's necessary, but it does smooth the path for some people to be willing to use it. Cheers, Steve
Does HPy have any clear guidance or assistance for their users to keep it up to date?
not right now, because we are still somewhat in alpha mode and sometimes we redesign the API and/or break compatibility. But the plan is of course to stabilize at some point.
I think it can be done with clear communication from the HPy project (and us when we endorse it) that they will *never* break compatibility and it's *always* safe (and indeed, essential) for their users to use the latest version. But that's a big commitment that I can't sign them up for.
I think this will be doable once HPy is mature enough, and I also agree that any kind of official endorsement from CPython and/or the PSF will help the adoption of HPy itself a lot. ciao, Antonio
On 28. 01. 22 16:04, Victor Stinner wrote:
Hi,
There is a reason why I'm bothering C extensions maintainers and Python core developers with my incompatible C API changes since Python 3.8. Let me share my plan with you :-)
In 2009 (Python 3.2), Martin v. Löwis did an amazing job with the PEP 384 "Defining a Stable ABI" to provide a "limited C API" and a "stable ABI" for C extensions: build an extension once, use it on multiple Python versions. Some projects like PyQt5 and cryptography use it, but it is just a drop in the PyPI ocean (353,084 projects). I'm trying to bend the "default" C API towards this "limited C API" to make it possible tomorrow to build *more* C extensions for the stable ABI.
My goal is that the stable ABI would be the default, and only a minority of C extensions would opt-out because they need to access to more functions for best performance.
The basic problem is that at the ABI level, C extensions must only call functions, rather than reading and writing structure members directly. Structures change frequently in Python (look at the changes between Python 3.2 and Python 3.11), and any minor structure change breaks the ABI. The limited C API hides structures and only uses function calls to solve this problem.
This is not true. The limited C API does include some structs that are not opaque, including some fields of PyObject. Your effort is not only bending the "regular" C API towards the limited API, but it's *also* bending the limited API towards a struct-less future. This will be a better future if we get there, but getting there has its downsides. One downside is that making incompatible changes to the limited API could make it very hard to support and test the stable ABI. For example, stable ABI extensions that do `obj->ob_type` must continue to work*, even if we make it impossible to do this in new extensions (by making PyObject opaque). Making PyObject opaque is possible (the limited API is not stable), but not easy to do correctly (e.g. remember to add tests for the newly "unreachable" parts of the stable ABI). (* we could also break the stable ABI, and we could even do it reasonably safely over a long period of time, but that's a whole different discussion.)
Since 2020, I have been modifying the C API, one function at a time, to slowly hide implementations (prepare the API to make structures opaque). I focused on the following structures:
* PyObject and PyVarObject (bpo-39573) * PyTypeObject (bpo-40170) * PyFrameObject (bpo-40421) * PyThreadState (bpo-39947)
The majority of C extensions use functions and macros; they don't access structure members directly. There are a few members which are sometimes accessed directly, which prevents making these structures opaque. For example, some old C extensions use obj->ob_type rather than Py_TYPE(obj). Fixing the minority of C extensions should benefit the majority, which may become compatible with the stable ABI.
I am also converting macros to static inline functions to fix their API: define parameter types and the result type, and avoid surprising macro side effects ("macro pitfalls"). I wrote PEP 670 "Convert macros to functions in the Python C API" for these changes.
I wrote the upgrade_pythoncapi.py tool in my pythoncapi_compat project (*) which modifies C code to use Py_TYPE(), Py_SIZE() and Py_REFCNT() rather than accessing PyObject and PyVarObject members directly.
(*) https://github.com/pythoncapi/pythoncapi_compat
In this tool, I also added "Borrow" variants of functions like PyFrame_GetCode(), which returns a strong reference, to replace frame->f_code with _PyFrame_GetCodeBorrow(). In Python 3.11, you cannot use the frame->f_code member anymore, since it has been removed! You must call PyFrame_GetCode() (or the pythoncapi_compat _PyFrame_GetCodeBorrow() variant).
There are also a few macros which can be used as l-values, like Py_TYPE(): "Py_TYPE(type1) = type2" must now be written "Py_SET_TYPE(type1, type2)" to avoid setting the ob_type member directly at the ABI level. I proposed PEP 674 "Disallow using Py_TYPE() and Py_SIZE() macros as l-values" to solve these issues.
Currently, many "functions" are still implemented as macros or static inline functions, so C extensions still access structure members at the ABI level for best Python performance. Converting these to regular functions has an impact on performance, and I would prefer to first write a PEP giving the rationale for that.
Today, it is not possible yet to build numpy for the stable ABI. The gap is just too large for this big C extension. But step by step, the C API becomes closer to the limited API, and more and more code is ready to be built for the stable ABI.
Well, these C API changes have other advantages, like preparing Python for further optimizations, easing Python maintenance, clarifying the separation between the limited C API and the default C API, etc. ;-)
Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Mon, Jan 31, 2022 at 1:48 PM Petr Viktorin <encukou@gmail.com> wrote:
(* we could also break the stable ABI, and we could even do it reasonably safely over a long period of time, but that's a whole different discussion.)
IMO the stable ABI must change in the long term; it still leaks too many implementation details. But right now, I haven't gathered enough data about the problematic APIs and what must be changed exactly. I would prefer to do it only once the work is really blocked and there is no other choice. Right now, I'm focused on fixing the *API*. That doesn't require breaking the stable ABI. If we change the stable ABI, I would prefer to fix multiple issues at once. Examples: * No longer return borrowed references (ex: PyDict_GetItem is part of the stable ABI) and no longer steal references (ex: PyModule_AddObject) * Disallow getting direct access to an object's data without a function to "release" the data. For example, PyBytes_AsString() gives direct access into the string, but Python doesn't know when the C extension is done with it, and when it's safe to delete the object. Such an API prevents moving Python objects in memory (implementing a moving garbage collector in Python). * Disallow dereferencing a PyObject* pointer: most structures must be opaque. It indirectly means that accessing structure members directly must also be disallowed. PEP 670 and PEP 674 partially fix these issues. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
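The "release the data" point can be sketched with a minimal acquire/release API in plain C (invented names, not an actual CPython proposal): the runtime counts how many raw pointers into an object are outstanding, so a moving GC would know when the object is pinned and when it is free to relocate it. A PyBytes_AsString-style API, by contrast, hands out a pointer with no matching release call, so the object can never safely move.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Illustrative mock of a string-like object whose data can be pinned. */
typedef struct {
    char *data;
    int pins;  /* > 0 means raw pointers are outstanding: do not move */
} Str;

Str *str_new(const char *s)
{
    Str *o = (Str *)malloc(sizeof(Str));
    o->data = (char *)malloc(strlen(s) + 1);
    strcpy(o->data, s);
    o->pins = 0;
    return o;
}

/* Pin the data and hand out a raw pointer... */
const char *str_acquire(Str *o)
{
    o->pins++;
    return o->data;
}

/* ...and tell the runtime when the extension is done with it. */
void str_release(Str *o)
{
    o->pins--;
}

/* A moving GC would check this before relocating the object. */
int str_movable(const Str *o)
{
    return o->pins == 0;
}
```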
On 31. 01. 22 15:40, Victor Stinner wrote:
On Mon, Jan 31, 2022 at 1:48 PM Petr Viktorin <encukou@gmail.com> wrote:
(* we could also break the stable ABI, and we could even do it reasonably safely over a long period of time, but that's a whole different discussion.)
IMO the stable ABI must change in the long term: it still leaks too many implementation details. But right now, I haven't gathered enough data about the problematic APIs and what exactly must change. I would prefer to do it only once the work is really blocked and there is no other choice.
Right now, I'm focused on fixing the *API*. That doesn't require breaking the stable ABI.
If we change the stable ABI, I would prefer to fix multiple issues at once. Examples:
* No longer return borrowed references (ex: PyDict_GetItem is part of the stable ABI) and no longer steal references (ex: PyModule_AddObject)
* Disallow getting direct access into an object's data without a function to "release" the data. For example, PyBytes_AsString() gives direct access into the string, but Python doesn't know when the C extension is done with it, and so when it's safe to delete the object. Such an API prevents moving Python objects in memory (implementing a moving garbage collector in Python).
* Disallow dereferencing a PyObject* pointer: most structures must be opaque. That indirectly means that directly accessing structure members must also be disallowed. PEP 670 and PEP 674 partially fix these issues.
All of these can be changed in the API. Not easily -- I mentioned the problem with testing the ABI, but there might be others -- but fixing these in the API first is probably the way to go. The ABI can then be changed to align with the current API.
On Mon, Jan 31, 2022 at 4:03 PM Petr Viktorin <encukou@gmail.com> wrote:
If we change the stable ABI, I would prefer to fix multiple issues at once. Examples:
* No longer return borrowed references (ex: PyDict_GetItem is part of the stable ABI) and no longer steal references (ex: PyModule_AddObject)
* Disallow getting direct access into an object's data without a function to "release" the data. For example, PyBytes_AsString() gives direct access into the string, but Python doesn't know when the C extension is done with it, and so when it's safe to delete the object. Such an API prevents moving Python objects in memory (implementing a moving garbage collector in Python).
* Disallow dereferencing a PyObject* pointer: most structures must be opaque. That indirectly means that directly accessing structure members must also be disallowed. PEP 670 and PEP 674 partially fix these issues.
(...) fixing these in the API first is probably the way to go.
That's what I already did in the past and what I plan to do in the future. Victor
On 31. 01. 22 16:14, Victor Stinner wrote:
On Mon, Jan 31, 2022 at 4:03 PM Petr Viktorin <encukou@gmail.com> wrote:
If we change the stable ABI, I would prefer to fix multiple issues at once. Examples:
* No longer return borrowed references (ex: PyDict_GetItem is part of the stable ABI) and no longer steal references (ex: PyModule_AddObject)
* Disallow getting direct access into an object's data without a function to "release" the data. For example, PyBytes_AsString() gives direct access into the string, but Python doesn't know when the C extension is done with it, and so when it's safe to delete the object. Such an API prevents moving Python objects in memory (implementing a moving garbage collector in Python).
* Disallow dereferencing a PyObject* pointer: most structures must be opaque. That indirectly means that directly accessing structure members must also be disallowed. PEP 670 and PEP 674 partially fix these issues.
(...) fixing these in the API first is probably the way to go.
That's what I already did in the past and what I plan to do in the future.
I see a problem in the subject of this thread: "Slowly bend the C API towards the limited API to get a stable ABI for everyone" is not a good summary of the proposed changes -- that is, bending *all* API (both the general public API and the limited API) to make most structs opaque, etc. If the summary doesn't match what's actually proposed, it's hard to discuss the proposal. Especially if the concrete plan changes often.

Anyway, I propose a different plan than what I think you are proposing:

- Add new "good" API where the current API is lacking. For example, PyModule_AddObjectRef is the "good" alternative to PyModule_AddObject -- it returns a strong reference. You've done a lot of great work here, and the API is much better for it.

- "Soft"-deprecate the "bad" API: document that it's only there for existing working code. Why not remove it? The issue is that it *is* possible to use the existing API correctly, and many extension authors have spent a lot of time and effort doing just that. If we force them onto a new API that makes writing correct code easier, it won't actually make their job easier if they've already found the caveats and fixed them.

- Remove the "bad" API from newer versions of the limited API. Extension authors can "opt in" to this new version, gaining new features of the limited API but losing the deprecated parts. (Here is where we should make sure the removals are well documented, provide tools to "modernize" code, etc.)

- Proactively work with popular/important projects (top PyPI packages, distro packages) to move to the latest API. The benefit for CPython devs here is that we can see the pain points and unforeseen use cases, test any modernization tools, get a "reality check" on how invasive the changes actually are, and help HPy & other implementations succeed even if they don't implement the deprecated API.
- Agree with HPy and other implementations of the limited API that it's not necessary for them to support the deprecated parts.

- When (and only when) a deprecated API is actually harmful -- i.e. it blocks new improvements that benefit actual users in the short term -- it should be deprecated and removed. (Even better, if instead of being removed it could e.g. be replaced by a function that's 3x slower, or one that leaks memory on exit, then it should be.)

Basically, instead of "We'll remove this API now because it prevents moving to a hypothetical moving garbage collector", it should be "Here is a moving garbage collector that speeds Python up by 30%, but to add it we need to remove these 30 deprecated APIs". The deprecation can be proactive, but not the removal.
On Mon, Feb 7, 2022 at 2:08 PM Petr Viktorin <encukou@gmail.com> wrote:
Basically, instead of "We'll remove this API now because it prevents moving to a hypothetical moving garbage collector", it should be "Here is a moving garbage collector that speeds Python up by 30%, but to add it we need to remove these 30 deprecated APIs". The deprecation can be proactive, but not the removal.
PEP 674 gives 3 concrete examples of issues already affecting the CPython nogil fork, HPy and GraalPython. They are not hypothetical.

CPython is also affected by these issues, but the benefits of PEP 674 (alone) are too indirect, so I chose to avoid mentioning CPython issues directly, to avoid confusion.

It's possible to work around them: more or less copy/paste CPython's inefficient code, as PyPy did years ago. The problem is that the workaround is inefficient, and so PyPy's cpyext remains slow. Well, HPy addresses the cpyext performance problem for PyPy and GraalPython ;-)

I don't think that the question is whether there is a real problem or not. The question is what's the best migration plan to move existing C extensions towards a better API which doesn't suffer from these problems.

Even once 95% of C extensions use the limited C API, we would still not be able to change Python internals, because of the 5% of remaining C extensions stuck on the legacy C API.

Victor
On 07. 02. 22 14:26, Victor Stinner wrote:
On Mon, Feb 7, 2022 at 2:08 PM Petr Viktorin <encukou@gmail.com> wrote:
Basically, instead of "We'll remove this API now because it prevents moving to a hypothetical moving garbage collector", it should be "Here is a moving garbage collector that speeds Python up by 30%, but to add it we need to remove these 30 deprecated APIs". The deprecation can be proactive, but not the removal.
PEP 674 gives 3 concrete examples of issues already affecting the CPython nogil fork, HPy and GraalPython. They are not hypothetical.
HPy and GraalPython are different projects, and their issues don't affect current users of CPython. It would be great to make an easy-to-use API that is compatible with HPy and GraalPython. But not at the cost of breaking existing code. Are the advantages of moving to HPy or GraalPython worth porting the code? IMO that's a question each extension author should be able to ask. They shouldn't be forced to do the port by CPython. As for nogil, that's a promising experiment. If it turns out to be successful, let's remove the parts that are blocking it. Do we already know which parts those are?
CPython is also affected by these issues, but the benefits of PEP 674 (alone) are too indirect, so I chose to avoid mentioning CPython issues directly, to avoid confusion.
IMO, CPython issues are the ones most relevant to CPython. If you're bringing them up, could you be more specific about them? If we don't discuss them, how do we know whether there are better ways to solve them than what you're proposing?
It's possible to work around them: more or less copy/paste CPython's inefficient code, as PyPy did years ago. The problem is that the workaround is inefficient, and so PyPy's cpyext remains slow. Well, HPy addresses the cpyext performance problem for PyPy and GraalPython ;-)
Hooray! So, the issue is addressed, and we don't need to break code that doesn't care about performance on PyPy! Sounds perfect! More seriously: I expect that there are a lot of extension authors who value API stability over performance on PyPy. They're doing nothing wrong, and they don't deserve to be punished for using an API (and docs) they had available some years ago.
I don't think that the question is whether there is a real problem or not. The question is what's the best migration plan to move existing C extensions towards a better API which doesn't suffer from these problems.
Even once 95% of C extensions use the limited C API, we would still not be able to change Python internals, because of the 5% of remaining C extensions stuck on the legacy C API.
And don't forget extensions that use the "bad" parts of the limited API, like using Py_TYPE as an l-value on Python 3.10 and earlier.

IMO, the required ABI changes will be more drastic than the API changes. When we want to change CPython internals, IMO we should know the exact set of APIs that need to be removed. Without an actual proposed change, I don't think we can know that. HPy is a good indicator of what API should be deprecated (i.e. where we should find a better way of doing things), but I don't think it gives good reasons for breaking changes.
On Mon, Feb 7, 2022 at 2:26 PM Victor Stinner <vstinner@python.org> wrote:
CPython is also affected by these issues, but the benefits of PEP 674 (alone) are too indirect, so I chose to avoid mentioning CPython issues directly, to avoid confusion.
A concrete example of a problem caused by exposing structures in the C API (unrelated to PEP 674). It's a tricky problem...

typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];
} PyBytesObject;

The "char ob_sval[1];" syntax used to declare the array is undefined behavior if the array is longer in memory: on a bytes object of 4 bytes, accessing ob_sval[3] works in practice, but is undefined behavior.

=> see https://bugs.python.org/issue40120 for details

The problem can be solved by using the "char ob_sval[];" syntax (a C99 flexible array member), but we cannot use this syntax in the public C header, since it causes compiler errors if the header is built with a C++ compiler (not to build Python itself, but to build a C++ extension using the Python C API). Removing the structure from the public C API would solve the C++ issue.

Victor
On 07. 02. 22 15:29, Victor Stinner wrote:
On Mon, Feb 7, 2022 at 2:26 PM Victor Stinner <vstinner@python.org> wrote:
CPython is also affected by these issues, but the benefits of PEP 674 (alone) are too indirect, so I chose to avoid mentioning CPython issues directly, to avoid confusion.
A concrete example of a problem caused by exposing structures in the C API (unrelated to PEP 674). It's a tricky problem...
typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];
} PyBytesObject;
The "char ob_sval[1];" syntax used to declare the array is undefined behavior if the array is longer in memory: on a bytes object of 4 bytes, accessing ob_sval[3] works in practice, but is undefined behavior.
=> see https://bugs.python.org/issue40120 for details
The problem can be solved by using the "char ob_sval[];" syntax (a C99 flexible array member), but we cannot use this syntax in the public C header, since it causes compiler errors if the header is built with a C++ compiler (not to build Python itself, but to build a C++ extension using the Python C API). Removing the structure from the public C API would solve the C++ issue.
That sounds like a good candidate for deprecation, and adding proper getters if we didn't have them. But I don't think we actually need to remove the deprecated struct from the public C API.
On Mon, 7 Feb 2022 15:29:39 +0100 Victor Stinner <vstinner@python.org> wrote:
On Mon, Feb 7, 2022 at 2:26 PM Victor Stinner <vstinner@python.org> wrote:
CPython is also affected by these issues, but the benefits of PEP 674 (alone) are too indirect, so I chose to avoid mentioning CPython issues directly, to avoid confusion.
A concrete example of a problem caused by exposing structures in the C API (unrelated to PEP 674). It's a tricky problem...
typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];
} PyBytesObject;
The "char ob_sval[1];" syntax used to declare the array is undefined behavior if the array is longer in memory: on a bytes object of 4 bytes, accessing ob_sval[3] works in practice, but is undefined behavior.
=> see https://bugs.python.org/issue40120 for details
The problem can be solved by using the "char ob_sval[];" syntax (a C99 flexible array member), but we cannot use this syntax in the public C header, since it causes compiler errors if the header is built with a C++ compiler (not to build Python itself, but to build a C++ extension using the Python C API). Removing the structure from the public C API would solve the C++ issue.
You could also have something like:

typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
#ifdef __cplusplus
    char ob_sval[1];
#else
    char ob_sval[];
#endif
} PyBytesObject;

Regards Antoine.
participants (12):
- Antoine Pitrou
- Antonio Cuni
- Barry Warsaw
- Brett Cannon
- Christopher Barker
- Eric V. Smith
- Guido van Rossum
- Petr Viktorin
- Steve Dower
- Terry Reedy
- Tim Felgentreff
- Victor Stinner