Leading underscore in C-API names (`_Py`): Private, or just a warning?
(Rather obviously, I'm not writing on behalf of the SC.)
The Steering Council's response to my PEP 689 (Unstable C API tier) opened a subtle question that I'd like to get some discussion on:
What does it mean if an API has a leading underscore (i.e. the `_Py` prefix)?
It seems that currently, the underscore serves as a “warning” -- its users (and reviewers) should read the documentation when using such functions. (This might be more nuanced; I'll leave any details to other people so I won't misinterpret their views.)
I was surprised to learn that this is the status quo, and I don't think it's a good one. So I took what I thought was the status quo, tightened it, and present it as a proposal:
## Proposal
Anything that starts with `_Py` is a fully internal, private CPython implementation detail. As a user, you can use it in a debugging session, but don't rely on it existing (or being compatible) in any different CPython release. That said:
- We won't break this internal API for no reason, so if you use it now and have good tests, you don't need to stop using it right now.
- If you're using private API and don't see a public alternative, you should contact CPython devs to:
  - see if we can add public API for the use case.
  - let us know that someone's using the API, and we should be extra careful with it.
With this rule, you can simply grep your codebase for `_Py` to see if you're using API that can go away without warning.
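To make the grep idea concrete, here is a minimal C sketch (the function name `count_private_py_names` is hypothetical) that counts whole identifiers starting with `_Py` or `_PY` in a source buffer. It is deliberately naive: comments and string literals are not skipped, so it can over-count compared with a real C tokenizer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c can appear in a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Count identifiers in `src` that start with "_Py" or "_PY".
 * Only whole identifiers are checked, so "my_Py_thing" does not match.
 * Sketch only: comments and string literals are NOT skipped. */
size_t count_private_py_names(const char *src)
{
    size_t count = 0;
    const char *p = src;
    while (*p) {
        if (is_ident_char(*p)) {
            const char *start = p;
            while (is_ident_char(*p))
                p++;                     /* advance to the end of the token */
            size_t len = (size_t)(p - start);
            if (len >= 3 && (strncmp(start, "_Py", 3) == 0 ||
                             strncmp(start, "_PY", 3) == 0))
                count++;
        } else {
            p++;
        }
    }
    return count;
}
```

For example, `count_private_py_names("Py_INCREF(o); _Py_NewReference(o);")` counts one private name, since `Py_INCREF` has no leading underscore.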
I'm aware that there are currently many underscored names that are intended to be used. Under this proposal, we should eventually add non-underscored aliases for them to make their status clear. (That's a desired end state; there's no rush to get 100% there.)
## Underscore as a warning
The alternative (underscore means “warning”, read the documentation) is, IMO, unfortunate. The documentation for underscored functions is often missing, both for functions you can use and ones you shouldn't. Consider example questions like:
- `_PyCode_GetExtra` is mentioned in PEP 523, but not documented on docs.python.org. As a user, can I use it?
- `_PyImport_AcquireLock` is mentioned in a StackOverflow answer, but not on docs.python.org. As a user, can I use it?
- I find that (say) `_PyArg_UnpackStack` is no longer necessary in CPython. As a core dev, where do I need to look to see if I can remove it?
The issues these point out can't be fixed easily: there are hundreds of underscored functions exposed in the public headers, and no good way to prevent adding new ones (of either kind -- consumable or private).
Some API is even only exposed for technical reasons: `_Py_NewRef` needs to be exposed, even though we'd like users to never use it. (This particular function is not too dangerous to use, but any macro or `static inline` function that's an implementation detail has the same issue.)
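A toy illustration of why such helpers end up in public headers, using a made-up `Obj` type in place of `PyObject` and names loosely modeled on how `Py_NewRef()` is built on `_Py_NewRef()`: the public macro expands, inside the user's own translation unit, to a call to the underscored `static inline` helper, so the helper's name and body must be visible in the header even though users are not meant to call it directly.

```c
#include <stddef.h>

/* ---- what would live in a public header (hypothetical names) ---- */
typedef struct {
    ptrdiff_t refcnt;   /* reference count */
} Obj;

/* Implementation detail: users should not call this directly, but it must
 * be in the header because the public macro below expands to a call to it. */
static inline Obj *_Obj_NewRef(Obj *o)
{
    o->refcnt++;
    return o;
}

/* Public API: a macro, so every caller's compiled code refers to the
 * underscored helper. */
#define Obj_NewRef(o) _Obj_NewRef(o)
```

Nothing in the header itself marks `_Obj_NewRef` as "don't use"; only the naming convention does.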
But there's no way for CPython to mark something in a public API header as private. Something that's undocumented on purpose is indistinguishable from something we forgot to document. There's no way to check a codebase for using internal API, short of manually checking the docs of each function.
In the past we always said: "_Py* is an internal API. Use at your own risk.", which I guess is somewhere between the warning and the strict "don't use" policy you are describing.
The problem with the "don't use" policy is that in some cases, there are no public APIs available to do certain things and so the extension writers have to resort to the private ones to implement their logic.
E.g. to implement a free list for Python objects, you have to use _Py_NewReference() in order to create an object based on a memory area taken from the free list. If you want to create a bytes object using overallocation, it's common to use _PyBytes_Resize() to resize the buffer to the final size.
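The over-allocation pattern can be sketched in plain C, with `malloc`/`realloc` standing in for `PyBytes_FromStringAndSize(NULL, n)` and `_PyBytes_Resize()`: allocate for the worst case, fill the buffer, then shrink it to the final size. The escaping task and the name `escape_nonprintable` are made up for the example; this is an analogy, not CPython code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copy `in`, escaping non-printable bytes as "\xHH".
 * Each input byte needs at most 4 output bytes, so allocate for that worst
 * case, fill the buffer, then shrink it to the size actually used (the step
 * _PyBytes_Resize() performs on an over-allocated bytes object).
 * Returns a NUL-terminated string (length in *out_len), or NULL on error. */
char *escape_nonprintable(const unsigned char *in, size_t n, size_t *out_len)
{
    if (n > (SIZE_MAX - 1) / 4)
        return NULL;                        /* size computation would overflow */
    char *buf = malloc(4 * n + 1);          /* over-allocate for the worst case */
    if (buf == NULL)
        return NULL;

    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] >= 0x20 && in[i] < 0x7f) {
            buf[used++] = (char)in[i];
        } else {
            /* writes 4 chars plus a NUL that the next byte overwrites */
            snprintf(buf + used, 5, "\\x%02x", (unsigned)in[i]);
            used += 4;
        }
    }
    buf[used] = '\0';

    /* Shrink to the final size; if realloc fails, the larger block is still valid. */
    char *shrunk = realloc(buf, used + 1);
    *out_len = used;
    return shrunk != NULL ? shrunk : buf;
}
```

Over-allocating once and shrinking once avoids repeated reallocations while filling, which is why a resize function is useful even though its final call only ever makes the buffer smaller.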
What sometimes happens is that after a while the private APIs get their leading underscore removed to then become public ones. This upwards migration path would be made impossible with the "don't use" policy.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jun 13 2022)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs :::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
On 13. 06. 22 17:36, Marc-Andre Lemburg wrote:
> In the past we always said: "_Py* is an internal API. Use at your own risk.", which I guess is somewhere between the warning and the strict "don't use" policy you are describing.
> The problem with the "don't use" policy is that in some cases, there are no public APIs available to do certain things and so the extension writers have to resort to the private ones to implement their logic.
> E.g. to implement a free list for Python objects, you have to use _Py_NewReference() in order to create an object based on a memory area taken from the free list. If you want to create a bytes object using overallocation, it's common to use _PyBytes_Resize() to resize the buffer to the final size.
> What sometimes happens is that after a while the private APIs get their leading underscore removed to then become public ones.
It's not just about removing the underscore: when this happens the APIs should also get documentation, tests, and some expectation of stability (e.g. that we won't go randomly adding tstate parameters to them).
> This upwards migration path would be made impossible with the "don't use" policy.
Why not? I have no doubt people will use private API, no matter how explicitly we say that it can break at any time.
My proposal is making this more explicit. And yes, it's also putting some more pressure on core devs to expose proper API for use cases people have, and on people to report their use cases.
On 14.06.2022 11:15, Petr Viktorin wrote:
> On 13. 06. 22 17:36, Marc-Andre Lemburg wrote:
>> In the past we always said: "_Py* is an internal API. Use at your own risk.", which I guess is somewhere between the warning and the strict "don't use" policy you are describing.
>> The problem with the "don't use" policy is that in some cases, there are no public APIs available to do certain things and so the extension writers have to resort to the private ones to implement their logic.
>> E.g. to implement a free list for Python objects, you have to use _Py_NewReference() in order to create an object based on a memory area taken from the free list. If you want to create a bytes object using overallocation, it's common to use _PyBytes_Resize() to resize the buffer to the final size.
>> What sometimes happens is that after a while the private APIs get their leading underscore removed to then become public ones.
> It's not just about removing the underscore: when this happens the APIs should also get documentation, tests, and some expectation of stability (e.g. that we won't go randomly adding tstate parameters to them).
Of course; all public APIs should ideally have this :-)
>> This upwards migration path would be made impossible with the "don't use" policy.
> Why not? I have no doubt people will use private API, no matter how explicitly we say that it can break at any time.
> My proposal is making this more explicit. And yes, it's also putting some more pressure on core devs to expose proper API for use cases people have, and on people to report their use cases.
It would certainly be good to get more awareness for common uses of currently private APIs, but I'm not sure whether the proposed "don't use" policy would help with this.
I have a feeling that the effect would go in a different direction: with a strict "don't use" policy core devs would get a blanket permission to change exposed _Py* APIs at will, without any consideration about breaking possible use cases out there.
IMO, both parties should be aware of the issues around using/changing exposed APIs marked as private and ideally to the same extent.
Perhaps it would be better to leave the current "use at your own risk" approach in place and just add a new process for potentially having private APIs made public.
E.g. the above two cases are potential candidates for such a process. I have used both in code I have written because, AFAIK, there's no other way to implement the functionality.
I'm pretty sure that fairly low-level tools such as Cython will have similar cases.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jun 14 2022)
On Mon, Jun 13, 2022 at 5:36 PM Marc-Andre Lemburg <mal@egenix.com> wrote:
> In the past we always said: "_Py* is an internal API. Use at your own risk.", which I guess is somewhere between the warning and the strict "don't use" policy you are describing.
In the last year, I tried to go further: make sure that it's no longer technically possible to use internal functions (the ones that I modified).
I made two changes:
- Move API from the public API to the internal API: if you really want to access the internal API, you now have to (1) define `Py_BUILD_CORE` and (2) use a different header file (with "internal/" in its path).
- Replace `PyAPI_FUNC()` with `extern`, so it's technically no longer possible to use the modified functions outside Python itself. This change cannot be done if an internal API is used by a stdlib module built as a shared library.
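A minimal single-file sketch of both techniques, with hypothetical names (`MYLIB_BUILD_CORE`, `MYLIB_API`, `mylib_*`) standing in for `Py_BUILD_CORE`, `PyAPI_FUNC()` and real CPython functions. The internal section refuses to compile unless the build-core macro is defined, and the internal function is declared plain `extern`; assuming the library is compiled with `-fvisibility=hidden`, only symbols carrying the export attribute are then visible from outside the shared library.

```c
/* Normally set only by the library's own build system (cf. Py_BUILD_CORE);
 * defined here so this single file can stand in for the library's sources. */
#define MYLIB_BUILD_CORE 1

/* ---- public header: exported API ---- */
#if defined(__GNUC__)
#  define MYLIB_API(RTYPE) __attribute__((visibility("default"))) RTYPE
#else
#  define MYLIB_API(RTYPE) RTYPE
#endif

MYLIB_API(int) mylib_add(int a, int b);

/* ---- internal header (e.g. "internal/mylib_core.h") ---- */
#ifndef MYLIB_BUILD_CORE
#  error "this header requires MYLIB_BUILD_CORE define"
#endif

/* Plain extern: no export attribute, so with -fvisibility=hidden the
 * symbol is not exported from the shared library. */
extern int _mylib_add_impl(int a, int b);

/* ---- library sources ---- */
int _mylib_add_impl(int a, int b)
{
    return a + b;
}

MYLIB_API(int) mylib_add(int a, int b)
{
    return _mylib_add_impl(a, b);
}
```

Inside the library both functions can be called; an extension linking against the shared library would only find `mylib_add`. This also shows the limitation mentioned above: a stdlib module built as a separate shared library is "outside" too, so it could not use the hidden symbol either.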
> The problem with the "don't use" policy is that in some cases, there are no public APIs available to do certain things and so the extension writers have to resort to the private ones to implement their logic.
My goal is to reduce the size of the C API and clarify what's public and what's internal.
That's how I noticed that `_PyFloat_Pack8()` was used, so in Python 3.11 I added a tested and documented public API for it instead, and I added it to pythoncapi-compat for older Python versions (where the implementation just uses the private `_PyFloat` functions).
For me, anything with a `_Py` or `_PY` prefix falls into the "internal" bag.
Reducing the size of the C API is important so that we can still change Python internals. Right now, it's stressful for a core dev to modify Python, since it's still unclear what's public and what's not, and it's hard to estimate whether a change is going to break a project (hint: any change will always break some project somewhere ;-)).
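The compat approach can be sketched as a version-gated shim: when compiling against a release that lacks the public function, define it in terms of the private one. Everything here is hypothetical (`MYLIB_VERSION_HEX`, `_mylib_pack8`, `mylib_pack8`, and the packing code itself, which assumes an IEEE 754 `double`); pythoncapi-compat does the analogous thing against CPython's `PY_VERSION_HEX`.

```c
#include <stdint.h>
#include <string.h>

/* Pretend we are compiling against an old release (3.10) that only has
 * the private function. */
#define MYLIB_VERSION_HEX 0x030A0000

/* Private function shipped by the old release: stores x as 8 bytes,
 * little-endian if le != 0, big-endian otherwise (assumes IEEE 754). */
static int _mylib_pack8(double x, unsigned char *p, int le)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);       /* reinterpret the double's bytes */
    for (int i = 0; i < 8; i++) {
        int shift = le ? 8 * i : 8 * (7 - i);
        p[i] = (unsigned char)(bits >> shift);
    }
    return 0;
}

/* Compat shim: newer releases provide mylib_pack8() publicly; older ones
 * get a static inline wrapper around the private function. */
#if MYLIB_VERSION_HEX < 0x030B0000
static inline int mylib_pack8(double x, char *p, int le)
{
    return _mylib_pack8(x, (unsigned char *)p, le);
}
#endif
```

Extension code then calls only the public name, and the private spelling is confined to one header that can be dropped once old versions are no longer supported.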
> E.g. to implement a free list for Python objects, you have to use _Py_NewReference() in order to create an object based on a memory area taken from the free list. If you want to create a bytes object using overallocation, it's common to use _PyBytes_Resize() to resize the buffer to the final size.
I want to remove _Py_NewReference() but "sadly" it's used in 3rd party extensions, so for now, I kept it :-) https://github.com/python/cpython/issues/85161
It would be nice to have a "supported" (maybe private/internal?) API for that.
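To show why a free list needs something like `_Py_NewReference()`, here is a plain-C sketch of a free list for a made-up `Node` type (all names hypothetical). Freed objects are kept on a singly linked list and handed out again instead of calling `malloc`; because the reused memory still holds the old object's state, it has to be re-initialized first. `node_reinit` plays roughly the role `_Py_NewReference()` plays for a recycled `PyObject`, resetting the reference count.

```c
#include <stdlib.h>

/* Hypothetical object type; `next_free` links objects on the free list. */
typedef struct Node {
    long refcnt;
    int value;
    struct Node *next_free;
} Node;

#define FREELIST_MAX 4          /* keep at most this many spare objects */

static Node *freelist = NULL;
static int freelist_len = 0;

/* Reset a (possibly recycled) block so it looks like a brand-new object. */
static void node_reinit(Node *n, int value)
{
    n->refcnt = 1;
    n->value = value;
    n->next_free = NULL;
}

Node *node_new(int value)
{
    Node *n;
    if (freelist != NULL) {     /* reuse memory from the free list */
        n = freelist;
        freelist = n->next_free;
        freelist_len--;
    } else {
        n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
    }
    node_reinit(n, value);
    return n;
}

void node_dealloc(Node *n)
{
    if (freelist_len < FREELIST_MAX) {   /* keep the block for later reuse */
        n->next_free = freelist;
        freelist = n;
        freelist_len++;
    } else {
        free(n);
    }
}
```

In CPython the re-initialization step also covers interpreter bookkeeping that an extension cannot reproduce itself, which is why the underscored function gets called directly.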
Victor
Night gathers, and now my watch begins. It shall not end until my death.
On 16.06.2022 17:16, Victor Stinner wrote:
> On Mon, Jun 13, 2022 at 5:36 PM Marc-Andre Lemburg <mal@egenix.com> wrote:
>> In the past we always said: "_Py* is an internal API. Use at your own risk.", which I guess is somewhere between the warning and the strict "don't use" policy you are describing.
> In the last year, I tried to go further: make sure that it's no longer technically possible to use internal functions (the ones that I modified).
> I made two changes:
> - Move API from the public API to the internal API: if you really want to access the internal API, you now have to (1) define `Py_BUILD_CORE` and (2) use a different header file (with "internal/" in its path).
> - Replace `PyAPI_FUNC()` with `extern`, so it's technically no longer possible to use the modified functions outside Python itself. This change cannot be done if an internal API is used by a stdlib module built as a shared library.
That's too draconian for my taste.
With a proper reporting process in place, we'll get communication going between extension writers and core devs, without alienating the extension writers or making assumptions that don't hold in practice.
I have a feeling that such communication is not really working out. I keep monitoring changes to the APIs and raise concerns where needed in areas where I know they are going to cause problems, but that's only one set of eyes. We need plenty more.
This will benefit both core devs and extension writers, since only a healthy and reasonably complete Python C API will help sustain the popularity and attractiveness of Python in areas which are far away from core development - e.g. most of the PyData or scientific space.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jun 17 2022)
On Fri, Jun 17, 2022 at 6:16 PM Marc-Andre Lemburg <mal@egenix.com> wrote:
> With a proper reporting process in place, we'll get communication going between extension writers and core devs, without alienating the extension writers or making assumptions that don't hold in practice.
There are the issue tracker, the capi and python-dev mailing lists, and https://discuss.python.org/. Isn't that enough?
IMO, What's New in Python now documents incompatible C API changes better than it did previously. We are paying more attention to documenting these changes.
> I have a feeling that such communication is not really working out.
Would you mind being more specific?
> I keep monitoring changes to the APIs and raise concerns where needed in areas where I know they are going to cause problems, but that's only one set of eyes. We need plenty more.
More and more people run code searches in advance and pay close attention to projects testing the "next Python" (like Python 3.12 today). My feeling is that more people are proactively looking for issues and reporting them. But it seems you have a different experience.
> This will benefit both core devs and extension writers, since only a healthy and reasonably complete Python C API will help sustain the popularity and attractiveness of Python in areas which are far away from core development - e.g. most of the PyData or scientific space.
I'm not sure what you're proposing in practice.
Victor
On 20.06.2022 17:37, Victor Stinner wrote:
>> [...] This will benefit both core devs and extension writers, since only a healthy and reasonably complete Python C API will help sustain the popularity and attractiveness of Python in areas which are far away from core development - e.g. most of the PyData or scientific space.
> I'm not sure what you're proposing in practice.
The idea is that instead of declaring a "don't use" policy, as Petr suggested, we keep the "use at your own risk" policy and add a note suggesting to open a ticket "Please create a public version of the _Py_XYZ API" (because of missing functionality in the public API), whenever making use of a private API.
That way we get in touch with people who need internal APIs exposed in the public API and can hash out the details more easily.
The code search approach that several core devs are using is helpful as well, but direct interaction on a ticket gives you better feedback on the reasons why internal APIs were used.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jun 20 2022)
On 20. 06. 22 17:48, Marc-Andre Lemburg wrote:
> On 20.06.2022 17:37, Victor Stinner wrote:
>>> [...] This will benefit both core devs and extension writers, since only a healthy and reasonably complete Python C API will help sustain the popularity and attractiveness of Python in areas which are far away from core development - e.g. most of the PyData or scientific space.
>> I'm not sure what you're proposing in practice.
> The idea is that instead of declaring a "don't use" policy, as Petr suggested, we keep the "use at your own risk" policy and add a note suggesting to open a ticket "Please create a public version of the _Py_XYZ API" (because of missing functionality in the public API), whenever making use of a private API.
OK, I see. “Use at your own risk” is a better wording than “don't use”, since it better implies that we'll try not to break things for no reason.
So my proposal would now be:
*Anything* with a leading underscore is unsupported, private, “use at your own risk”. If it is actually supported, that's a CPython bug to fix (see below). If you're using such private API and don't see a public alternative, you should contact CPython devs to:
- see if we can add public API for the use case.
- let us know that someone's using the API, and we should be extra careful with it.
When something is supported (in any way), it should lose the leading underscore\*, get tested so we don't break it, and get documented so the intended semantics (vs. accidental implementation details) are clear.
\* (The old underscored name might be kept as an alias to avoid breaking code, which is technically an exception to “Anything with a leading underscore is entirely unsupported”.)
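A sketch of that end state, with made-up names: the supported function gets a public, documented name, and the old underscored spelling survives only as an alias so existing callers keep compiling.

```c
/* New public, documented, tested name (hypothetical function). */
int PyFoo_Frobnicate(int x)
{
    return 2 * x;
}

/* Old underscored name kept as an alias so that existing code still
 * compiles; new code should use PyFoo_Frobnicate(). */
#define _PyFoo_Frobnicate PyFoo_Frobnicate
```

A macro alias only keeps source compatibility. Extensions already compiled against older headers reference the old symbol name, so keeping binary compatibility would additionally require exporting a real function under the underscored name.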
> That way we get in touch with people who need internal APIs exposed in the public API and can hash out the details more easily.
> The code search approach that several core devs are using is helpful as well, but direct interaction on a ticket gives you better feedback on the reasons why internal APIs were used.
+1
On 20.06.2022 18:48, Petr Viktorin wrote:
> On 20. 06. 22 17:48, Marc-Andre Lemburg wrote:
>> On 20.06.2022 17:37, Victor Stinner wrote:
>>>> [...] This will benefit both core devs and extension writers, since only a healthy and reasonably complete Python C API will help sustain the popularity and attractiveness of Python in areas which are far away from core development - e.g. most of the PyData or scientific space.
>>> I'm not sure what you're proposing in practice.
>> The idea is that instead of declaring a "don't use" policy, as Petr suggested, we keep the "use at your own risk" policy and add a note suggesting to open a ticket "Please create a public version of the _Py_XYZ API" (because of missing functionality in the public API), whenever making use of a private API.
> OK, I see. “Use at your own risk” is a better wording than “don't use”, since it better implies that we'll try not to break things for no reason.
> So my proposal would now be:
> *Anything* with a leading underscore is unsupported, private, “use at your own risk”. If it is actually supported, that's a CPython bug to fix (see below). If you're using such private API and don't see a public alternative, you should contact CPython devs to:
> - see if we can add public API for the use case.
> - let us know that someone's using the API, and we should be extra careful with it.
> When something is supported (in any way), it should lose the leading underscore\*, get tested so we don't break it, and get documented so the intended semantics (vs. accidental implementation details) are clear.
> \* (The old underscored name might be kept as an alias to avoid breaking code, which is technically an exception to “Anything with a leading underscore is entirely unsupported”.)
Sounds good :-)
>> That way we get in touch with people who need internal APIs exposed in the public API and can hash out the details more easily.
>> The code search approach that several core devs are using is helpful as well, but direct interaction on a ticket gives you better feedback on the reasons why internal APIs were used.
> +1
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jun 20 2022)
On Mon, Jun 13, 2022 at 4:18 PM Petr Viktorin <encukou@gmail.com> wrote:
> - We won't break this internal API for no reason, so if you use it now and have good tests, you don't need to stop using it right now.
I disagree with that. We don't provide any backward compatibility guarantee, for good reasons: some bugfixes require changing internal API (adding a parameter, changing the behavior, removing a function, etc.). In the past, all bug reports about changed internal APIs used by 3rd-party modules were rejected.
If you want to start providing backward compatibility guarantees, the API should go into your "unstable API" bag, or a documented and tested public API should be created instead.
> `_PyCode_GetExtra` is mentioned in PEP 523, but not documented on docs.python.org. As a user, can I use it?
I would say: no, don't use it :-) Or in general, you can use a private function, but in that case *you are on your own*: don't expect support from Python core developers, or backward compatibility guarantees.
> `_PyImport_AcquireLock` is mentioned in a StackOverflow answer, but not on docs.python.org. As a user, can I use it?
Nope :-) Don't use it.
If there are reasons to call this function, it should become a public function instead.
But importlib has changed a lot in recent years; I'm not sure that it's still needed.
> - I find that (say) `_PyArg_UnpackStack` is no longer necessary in CPython. As a core dev, where do I need to look to see if I can remove it?
This function should be moved to the internal C API. I already tried, but it requires modifying Argument Clinic to add an include of "pycore_modsupport.h". Modifying Argument Clinic is not easy, so I gave up on that one for now.
> The issues these point out can't be fixed easily: there are hundreds of underscored functions exposed in the public headers, and no good way to prevent adding new ones (of either kind -- consumable or private).
We could add a tool that checks the list of names in the public C API: it would allow existing names prefixed by "_Py" but disallow adding new ones (unless the new names are explicitly allowed).
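A minimal sketch of such a check, assuming the list of names declared in the public headers has already been extracted somehow (that part is out of scope here). Underscored names already on an allowlist pass; any other `_Py`/`_PY` name is reported. The function and array names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero if the name uses the private "_Py" / "_PY" prefix. */
static int has_private_prefix(const char *name)
{
    return strncmp(name, "_Py", 3) == 0 || strncmp(name, "_PY", 3) == 0;
}

static int in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Print each private name that is not on the allowlist and return how
 * many were found; a CI job would fail when the result is nonzero. */
size_t check_new_private_names(const char *const *names, size_t n_names,
                               const char *const *allowed, size_t n_allowed)
{
    size_t bad = 0;
    for (size_t i = 0; i < n_names; i++) {
        if (has_private_prefix(names[i]) &&
            !in_list(names[i], allowed, n_allowed)) {
            fprintf(stderr, "new private name in public C API: %s\n", names[i]);
            bad++;
        }
    }
    return bad;
}
```

Adding a genuinely needed private name then becomes an explicit, reviewable change to the allowlist rather than something that slips in unnoticed.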
> Some API is even only exposed for technical reasons: `_Py_NewRef` needs to be exposed, even though we'd like users to never use it. (This particular function is not too dangerous to use, but any macro or `static inline` function that's an implementation detail has the same issue.)
This one is only there for performance. We should not bother about micro-optimization and just remove the static inline implementation and always use the regular function call instead.
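The alternative can be sketched with hypothetical names: the header only declares an exported function, and the implementation stays in the library's .c file, so nothing private has to appear in the public header. The cost is a real function call where a `static inline` version could be inlined. Making the struct opaque goes a step further than the suggestion above and is shown only as one possible design.

```c
#include <stddef.h>

/* ---- public header: only declarations, no implementation details ---- */
typedef struct Obj Obj;            /* opaque to users */
Obj *Obj_NewRef(Obj *o);
ptrdiff_t Obj_RefCount(const Obj *o);

/* ---- library source file: can change without touching the header ---- */
struct Obj {
    ptrdiff_t refcnt;
};

Obj *Obj_NewRef(Obj *o)
{
    o->refcnt++;
    return o;
}

ptrdiff_t Obj_RefCount(const Obj *o)
{
    return o->refcnt;
}
```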
Victor
Participants (3):
- Marc-Andre Lemburg
- Petr Viktorin
- Victor Stinner