Explicit markers for special C-API situations (re: Clarification regarding Stable ABI and _Py_*)
(replying to https://mail.python.org/archives/list/python-dev@python.org/message/OJ65FPCJ...)
On Wed, Dec 8, 2021 at 10:06 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
What about the various symbols listed in Misc/stable_abi.txt that were accidentally added to the limited API? Can we move toward dropping them from the stable ABI?
tl;dr We should consider making classifications related to the stable ABI harder to miss.
<context>
Knowing what is in the limited API is fairly straightforward. [1] However, it's clear that identifying what is part of the stable ABI, and why, is not so easy. Currently, we must rely on Misc/stable_abi.txt [2] (and the associated Tools/scripts/stable_abi.py). Documentation (C-API docs, PEPs, devguide) helps too.
Yet, there's a concrete disconnect here: the header files are by definition the authoritative single-source-of-truth for the C-API and it's too easy to forget about supplemental info in another file or document. This out-of-sight-out-of-mind situation is part of how we accidentally added things to the limited API for a while. [3]
The stable ABI isn't the only area where we must identify different subsets of the C-API. However, in those other cases we use different structural/naming conventions to explicitly group things. Most importantly, each of those conventions makes the grouping unavoidable when reading the code. [4] For example:
* closely related declarations go in the same header file (and are then also exposed via Include/Python.h)
* prefixes (e.g. Py_, PyDict_) provide similar grouping
* an additional underscore prefix identifies "private" C-API
* symbols are explicitly identified as part of the C-API via macros (PyAPI_FUNC, PyAPI_DATA) [5]
* relatively recently, different directories correspond to different API layers (Include, Include/cpython, Include/internal) [3]
</context>
Could we take a similar explicit, coupled-to-the-code approach to identify when the different stable ABI situations apply? Here's the specific approach I had in mind, with macros similar to PyAPI_FUNC:
* PyAPI_ABI_FUNC - in stable ABI when it wouldn't be normally (e.g. underscore prefix, in Include/internal)
* PyAPI_ABI_INDIRECT - exposed in stable ABI due to a macro
* PyAPI_ABI_ONLY - it only exists for ABI compatibility and isn't actually used any more
* PyAPI_ABI_ACCIDENTAL - unintentionally added to limited API, probably not used there
(...or perhaps use a PyABI_ prefix, though that's a bit easy to miss when reading.)
As a reader I would find markers like this helpful in recognizing those special situations, as well as the constraints those situations impose on modification. At the least such macros would indicate something different is going on, and the macro name would be something I could look up if I needed more info. I expect others reading the code would get comparable value. I also expect tools like Tools/scripts/stable_abi.py would benefit.
-eric
[1] in Include/*.h and not #ifndef Py_LIMITED_API (sadly also making it easy to accidentally add things to the limited API, see [3])
[2] Before that you had to rely on comments or external documents or, in the worst case, work it out through careful study of the code, commit history, and mailing list archives.
[3] The addition of Include/cpython and Include/internal helped us stop accidentally adding to the limited API.
[4] It also makes the groupings deterministically discoverable by tools.
[5] explicit use of "extern" indicates a different intent
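As a rough illustration of the tooling benefit mentioned above, here is a minimal Python sketch that groups declarations by the marker macro that introduces them. The PyAPI_ABI_* names are the macros proposed in this message and do not exist in CPython; the header excerpt, the PyExample_* symbols, and the classification given to each symbol are invented for the example. The regular expression is a toy that only handles one-line declarations, not a C parser.

```python
import re
from collections import defaultdict

# Hypothetical header excerpt. The PyAPI_ABI_* markers are only a proposal,
# and the PyExample_* names and classifications are made up for illustration.
HEADER = """
PyAPI_FUNC(PyObject *) PyLong_FromLong(long);
PyAPI_ABI_INDIRECT(void) _Py_Dealloc(PyObject *);
PyAPI_ABI_ONLY(int) PyExample_OldEntryPoint(void);
PyAPI_ABI_ACCIDENTAL(int) PyExample_LeakedHelper(int);
"""

# Matches "<MARKER>(<return type>) <name>(" at the start of a line.
DECL = re.compile(r"^(PyAPI_\w+)\(([^)]*)\)\s*(\w+)\s*\(", re.MULTILINE)


def classify(header_text):
    """Group declared function names by the marker macro that introduces them."""
    groups = defaultdict(list)
    for marker, _rtype, name in DECL.findall(header_text):
        groups[marker].append(name)
    return dict(groups)


groups = classify(HEADER)
for marker in sorted(groups):
    print(marker, groups[marker])
```

The point of the sketch is only that a marker in the declaration is something both a reader and a simple script can key on; it says nothing about how reliable such scanning would be on the real headers.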
I'll not get back to CPython until Tuesday, but I'll add a quick note for now. It's a bit blunt for lack of time; please don't be offended.
If the code is the authoritative source of truth, we need a proper parser to extract the information. But we can't really use an existing parser (e.g. we need to navigate various #ifdef combinations), and writing a correct (=tested) custom C parser is pretty expensive. C declarations being "deterministically discoverable by tools" is a myth. I know you wrote a parser (kudos!), but unfortunately I don't trust it enough to let it define the API. Bugs in the parser could result in the API definition silently changing.
That's why the info is in a separate version-controlled file, which must be explicitly modified. That file is the source of truth (or at least intent). There are also checks to ensure the code matches the manifest, so if you break things the CI should let you know. See the rationale in PEP 652: https://www.python.org/dev/peps/pep-0652/#rationale
As for the types you mentioned:
* PyAPI_ABI_INDIRECT, PyAPI_ABI_ONLY - these should get a comment. I don't think adding machine-readable metadata (and tooling for it) would be worth it, but I won't block it.
* PyAPI_ABI_ACCIDENTAL - could be deprecated in the Limited API, and later removed from it, becoming "PyAPI_ABI_ONLY".
Maybe we could start by having the tool regenerate the file and verifying that it produces the same results? Then in the future we keep the file in the repo so changes to it can be tracked separately, but we run the tool as part of CI to make sure that its output still matches. This is what we do for other generated files like opcode.h, parser.c and so on.
On Thu, Dec 9, 2021 at 10:31 AM Petr Viktorin <encukou@gmail.com> wrote:
I'll not get back to CPython until Tuesday, but I'll add a quick note for now. It's a bit blunt for lack of time; please don't be offended.
If the code is the authoritative source of truth, we need a proper parser to extract the information. But we can't really use an existing parser (e.g. we need to navigate various #ifdef combinations), and writing a correct (=tested) custom C parser is pretty expensive. C declarations being "deterministically discoverable by tools" is a myth. I know you wrote a parser (kudos!), but unfortunately I don't trust it enough to let it define the API. Bugs in the parser could result in the API definition silently changing.
That's why the info is in a separate version-controlled file, which must be explicitly modified. That file is the source of truth (or at least intent). There are also checks to ensure the code matches the manifest, so if you break things the CI should let you know. See the rationale in PEP 652: https://www.python.org/dev/peps/pep-0652/#rationale
As for the types you mentioned:
* PyAPI_ABI_INDIRECT, PyAPI_ABI_ONLY - these should get a comment. I don't think adding machine-readable metadata (and tooling for it) would be worth it, but I won't block it.
* PyAPI_ABI_ACCIDENTAL - could be deprecated in the Limited API, and later removed from it, becoming "PyAPI_ABI_ONLY".
Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CANB7JOA... Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
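The regenerate-and-compare check suggested in this message could look roughly like the following Python sketch. It is not how CPython's CI is implemented: the function name is hypothetical, and the one-symbol-per-line text is a simplified stand-in for the real Misc/stable_abi.txt format.

```python
import difflib


def check_generated(committed_text, regenerated_text, name="stable_abi.txt"):
    """Return a unified diff; an empty list means the committed file is current."""
    return list(difflib.unified_diff(
        committed_text.splitlines(),
        regenerated_text.splitlines(),
        fromfile=f"{name} (committed)",
        tofile=f"{name} (regenerated)",
        lineterm="",
    ))


# Simplified stand-in contents: one symbol per line (an assumption).
committed = "PyLong_FromLong\nPyLong_AsLong\n"
regenerated = "PyLong_FromLong\nPyLong_AsLong\nPyLong_FromVoidPtr\n"

diff = check_generated(committed, regenerated)
if diff:
    # In CI this would be the point to fail the job and show the diff.
    print("\n".join(diff))
```

With this arrangement the file stays in version control, so every change to it still shows up in review, while the tool only acts as a consistency check.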
On 09/12/2021 19.26, Petr Viktorin wrote:
I'll not get back to CPython until Tuesday, but I'll add a quick note for now. It's a bit blunt for lack of time; please don't be offended.
If the code is the authoritative source of truth, we need a proper parser to extract the information. But we can't really use an existing parser (e.g. we need to navigate various #ifdef combinations), and writing a correct (=tested) custom C parser is pretty expensive. C declarations being "deterministically discoverable by tools" is a myth. I know you wrote a parser (kudos!), but unfortunately I don't trust it enough to let it define the API. Bugs in the parser could result in the API definition silently changing.
There are other options than writing a new parser. GCC and Clang are flexible. For example GCC can be extended with plugins and custom attributes. We could extend the header files with custom attributes and then use a plugin to create an ABI file from the attributes.
I created a quick n' hack https://github.com/python/cpython/compare/main...tiran:gcc-pythonapi-plugin?... as proof of concept. The plugin takes
PyAPI_ABI_FUNC(PyObject *) PyLong_FromLong(long);
and dumps the declaration as:
extern struct PyObject * PyLong_FromLong (long int); "abi_func"
Christian
Christian Heimes wrote:
On 09/12/2021 19.26, Petr Viktorin wrote:
If the code is the authoritative source of truth, we need a proper parser to extract the information. ... unfortunately I don't trust it enough to let it define the API. Bugs in the parser could result in the API definition silently changing.
There are other options than writing a new parser. GCC and Clang are flexible. For example GCC can be extended with plugins and custom attributes.
But they have the same problem ... it can be difficult to know if there is a subtle bug in someone's understanding of how the plugin interacts with, for example, nested ifndef.
The failure mode for an explicitly manually maintained text file is that something doesn't get added when it should, and the more conservative API consumers wait an extra release before using it.
-jJ
On 10/12/2021 03.08, Jim J. Jewett wrote:
Christian Heimes wrote:
On 09/12/2021 19.26, Petr Viktorin wrote:
If the code is the authoritative source of truth, we need a proper parser to extract the information. ... unfortunately I don't trust it enough to let it define the API. Bugs in the parser could result in the API definition silently changing.
There are other options than writing a new parser. GCC and Clang are flexible. For example GCC can be extended with plugins and custom attributes.
But they have the same problem ... it can be difficult to know if there is a subtle bug in someone's understanding of how the plugin interacts with, for example, nested ifndef.
The failure mode for an explicitly manually maintained text file is that something doesn't get added when it should, and the more conservative API consumers wait an extra release before using it.
Macros and ifndefs are not a problem. A GCC plugin for user-defined attributes hooks into the build process at a late stage. By the time the plugin hook is invoked, the preprocessor has resolved all macros and ifdefs, and the C code has been parsed. The plugin operates on the same intermediate code as the compiler.
The approach would allow us to make the headers the authoritative source for most API and ABI symbols. I don't think that we can use it for macros. We can even include additional metadata in the custom attribute, e.g. version added
PyAPI_ABI_FUNC("3.2", PyObject *) PyLong_FromLong(long);
We can convert Misc/stable_abi.txt into an auto-generated file. The file should still stay in git, so we can use it to verify the stable ABI in CI.
Christian
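On the consuming side, generating a manifest from plugin output might start with something like the Python sketch below. It parses lines in the dump format shown earlier in the thread into entries. Everything beyond that one example line is an assumption: the second input line, the "abi_indirect" attribute name, the tuple layout, and the helper name are invented, and the real plugin output may differ.

```python
import re

# One line per declaration: the C prototype, then the attribute name in quotes.
# The first line is the example from the thread; the second is hypothetical.
DUMP = '''\
extern struct PyObject * PyLong_FromLong (long int); "abi_func"
extern void _Py_Dealloc (struct PyObject *); "abi_indirect"
'''

# Groups: return type, function name, parameter list, attribute name.
LINE = re.compile(r'^extern\s+(.*?)\s*(\w+)\s*\((.*)\);\s*"(\w+)"$')


def parse_dump(text):
    """Turn plugin dump lines into sorted (name, kind, return_type) entries."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            # Fail loudly so a format change cannot silently drop symbols.
            raise ValueError(f"unrecognized dump line: {line!r}")
        rtype, name, _params, kind = match.groups()
        entries.append((name, kind, rtype))
    return sorted(entries)


for entry in parse_dump(DUMP):
    print(entry)
```

Raising on any unrecognized line is a deliberate choice here: it addresses, in a small way, the worry raised in the thread that a tooling bug could change the API definition without anyone noticing.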
On 10. 12. 21 11:55, Christian Heimes wrote:
On 10/12/2021 03.08, Jim J. Jewett wrote:
Christian Heimes wrote:
On 09/12/2021 19.26, Petr Viktorin wrote:
If the code is the authoritative source of truth, we need a proper parser to extract the information. ... unfortunately I don't trust it enough to let it define the API. Bugs in the parser could result in the API definition silently changing.
There are other options than writing a new parser. GCC and Clang are flexible. For example GCC can be extended with plugins and custom attributes.
But they have the same problem ... it can be difficult to know if there is a subtle bug in someone's understanding of how the plugin interacts with, for example, nested ifndef.
The failure mode for an explicitly manually maintained text file is that something doesn't get added when it should, and the more conservative API consumers wait an extra release before using it.
Macros and ifndefs are not a problem.
They are: we want to find PyErr_SetExcFromWindowsErr on all systems, and include it in the docs.
A GCC plugin for user-defined attributes hooks into the build process at a late stage. By the time the plugin hook is invoked, the preprocessor has resolved all macros and ifdefs, and the C code has been parsed. The plugin operates on the same intermediate code as the compiler.
The approach would allow us to make the headers the authoritative source for most API and ABI symbols. I don't think that we can use it for macros. We can even include additional metadata in the custom attribute, e.g. version added
PyAPI_ABI_FUNC("3.2", PyObject *) PyLong_FromLong(long);
This looks a bit awkward already, and if/when we start including e.g. "version removed" for PyAPI_ABI_ONLY, it'll get worse.
We can convert Misc/stable_abi.txt into an auto-generated file. The file should still stay in git, so we can use it to verify the stable ABI in CI.
We can, but genuinely I think it works better as a source of truth than a generated artifact. Changes to it should be deliberate. I get that not everyone will agree with that.
But it's also *much* easier to maintain the current "best-effort" checks (which can punt on a few edge cases) than add an all-encompassing, tested parser-based generator. Not everything needs to be automated :)
Eric said:
The tooling is a secondary concern to my point. Mostly, I wish the declarations in the header files had the extra classifications, rather than having to remember to refer to a separate text file.
This part sounds like a good idea.
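The objection about PyErr_SetExcFromWindowsErr can be made concrete with a small Python sketch. A tool that only sees what survives preprocessing on one platform never learns about declarations guarded for another. The header excerpt is invented for the example (the MS_WINDOWS guard is assumed to mirror how the real header hides the declaration), and the "preprocessor" here is a toy that handles only un-nested #ifdef/#endif.

```python
import re

# Hypothetical header excerpt; the exact text is made up for illustration.
HEADER = """\
PyAPI_FUNC(PyObject *) PyErr_SetFromErrno(PyObject *);
#ifdef MS_WINDOWS
PyAPI_FUNC(PyObject *) PyErr_SetExcFromWindowsErr(PyObject *, int);
#endif
"""

DECL = re.compile(r"PyAPI_FUNC\([^)]*\)\s*(\w+)\s*\(")


def visible_names(header_text, defined=frozenset()):
    """Names a tool sees after a toy preprocessor pass.

    Only un-nested #ifdef/#endif are handled; real headers need far more.
    """
    names, skipping = [], False
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            # Skip the guarded block unless its macro is defined.
            skipping = stripped.split()[1] not in defined
        elif stripped.startswith("#endif"):
            skipping = False
        elif not skipping:
            names.extend(DECL.findall(line))
    return names


print(visible_names(HEADER))                  # view with MS_WINDOWS undefined
print(visible_names(HEADER, {"MS_WINDOWS"}))  # view with MS_WINDOWS defined
```

Run with MS_WINDOWS undefined, the Windows-only function is simply absent, which is why a post-preprocessing tool on one platform cannot by itself produce a list that covers all systems.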
On Thu, Dec 9, 2021, 11:26 Petr Viktorin <encukou@gmail.com> wrote:
I'll not get back to CPython until Tuesday, but I'll add a quick note for now. It's a bit blunt for lack of time; please don't be offended.
Not at all. :) The tooling is a secondary concern to my point. Mostly, I wish the declarations in the header files had the extra classifications, rather than having to remember to refer to a separate text file.
-eric
participants (5): Christian Heimes, Eric Snow, Guido van Rossum, Jim J. Jewett, Petr Viktorin