(Rather obviously, I'm not writing on behalf of the SC.)
The Steering Council's response to my PEP 689 (Unstable C API tier)
opened a subtle question that I'd like to get some discussion on:
What does it mean if an API has a leading underscore (i.e. the `_Py`
prefix)?
It seems that currently, the underscore serves as a “warning” -- its
users (and reviewers) should read the documentation when using such
functions. (This might be more nuanced; I'll leave any details to other
people so I won't misinterpret their views.)
I was surprised to learn that this is the status quo, and I don't think
it's a good one. So I took what I thought is the status quo, tightened
it, and I present it as a proposal:
## Proposal
Anything that starts with `_Py` is a fully internal, private CPython
implementation detail. As a user, you can use it in a debugging session
but don't rely on it existing (or being compatible) in any different
CPython release. That said:
- We won't break this internal API for no reason, so if you use it now
and have good tests you don't need stop using it right now.
- If you're using private API and don't see a public alternative, you
should contact CPython devs to
- see if we can add public API for the use case.
- let us know that someone's using the API, and we should be extra
careful with it.
With this rule, you can simply grep your codebase for `_Py` to see if
you're using API that can go away without warning.
I'm aware that there are currently many underscored names that are
intended to be used. Under this proposal, we should eventually add
non-underscores aliases for them to make their status clear. (That's a
desired end state; there's no rush to get 100% there.)
## Underscore as a warning
The alternative (underscore means “warning”, read the documentation) is,
IMO, unfortunate. The documentation for underscored functions is often
missing, both for functions you can use and ones you shouldn't.
Consider example questions like:
- `_PyCode_GetExtra` is mentioned in PEP 523, but not documented on
docs.python.org. As a user, can I use it?
- `_PyImport_AcquireLock` is mentioned in a StackOverflow answer, but
not on docs.python.org. As a user, can I use it?
- I find that (say) `_PyArg_UnpackStack` is no longer necessary in
CPython. As a core dev, where do I need to look to see if I can remove it?
The issues these point out can't be fixed easily: there are hundreds of
underscored functions exposed in the public headers, and no good way to
prevent adding new ones (of either kind -- consumable or private).
Some API is even only exposed for technical reasons: `_Py_NewRef` needs
to be exposed, even though we'd like users to never use it. (This
particular function is not too dangerous to use, but any macro or
`static inline` function that's an implementation detail has the same
issue.)
But there's no way for CPython to mark something in a public API header
as private. Something that's undocumented on purpose is
indistinguishable from something we forgot to document. There's no way
to check a codebase for using internal API, short of manually checking
the docs of each function.
On 14.06.2022 11:54, Petr Viktorin wrote:
> On 14. 06. 22 11:35, Marc-Andre Lemburg wrote:
>> On 14.06.2022 11:15, Petr Viktorin wrote:
>>> On 13. 06. 22 17:36, Marc-Andre Lemburg wrote:
>>>> In the past we always said: "_Py* is an internal API. Use at your own
>>>> risk.", which I guess is somewhere between the warning and the strict
>>>> "don't use" policy you are describing.
>>>>
>>>> The problem with the "don't use" policy is that in some cases, there
>>>> are no public APIs available to do certain things and so the extension
>>>> writers have to resort to the private ones to implement their logic.
>>>>
>>>> E.g. to implement a free list for Python objects, you have to use
>>>> _Py_NewReference() in order to create an object based on a memory
>>>> area taken from the free list. If you want to create a bytes objects
>>>> using overallocation, it's common to use _PyBytes_Resize() to resize
>>>> the buffer to the final size.
>>>>
>>>> What sometimes happens is that after a while the private APIs get their
>>>> leading underscore removed to then become public ones.
>>>
>>> It's not just about removing the underscore: when this happens the APIs should
>>> also get documentation, tests, and some expectation of stability (e.g. that we
>>> won't go randomly adding tstate parameters to them).
>>
>> Of course; all public APIs should ideally have this :-)
>>
>>>> This upwards
>>>> migration path would be made impossible with the "don't use" policy.
>>>
>>> Why not? I have no doubt people will use private API, no matter how explicitly
>>> we say that it can break at any time.
>>>
>>> My proposal is making this more explicit. And yes, it's also putting some more
>>> pressure on core devs to expose proper API for use cases people have, and on
>>> people to report their use cases.
>>
>> It would certainly be good to get more awareness for common uses of
>> currently private APIs, but I'm not sure whether the proposed
>> "don't use" policy would help with this.
>>
>> I have a feeling that the effect would go in a different direction:
>> with a strict "don't use" policy core devs would get a blanket
>> permission to change exposed _Py* APIs at will, without any consideration
>> about breaking possible use cases out there.
>>
>> IMO, both parties should be aware of the issues around using/changing
>> exposed APIs marked as private and ideally to the same extent.
>
> OK, but what are those issues? How do we get to the point where both parties
> agree? We don't even know who the other party is.
>
>> Perhaps it would be better to leave the current "use at your own risk"
>> approach in place and just add a new process for potentially having
>> private APIs made public.
>
> But if we say "use at your own risk", what's the risk? Isn't that the same as
> saying we don't support it at all and it might go away at any time?
Not really, since the bar for core devs is higher when making
changes to such internal APIs. A blanket "don't use" policy
would make it easy for core devs to argue that any change is
acceptable.
If we provide a process to document external use of currently
private APIs, both sides get to know each other and their use
cases.
The "risk" for the external developer is that the APIs may not
get approved as public APIs.
The process itself allows core devs to get a view into the use
of those APIs and take more care when making changes to those APIs
while the process of potentially making them public is still ongoing.
Overall, we'd get a better idea on what extension writers need,
while at the same it'd be clearer which of the private APIs
can indeed be changed without causing disruption.
If we say: please open a ticket when starting to use a private
API, I think we'd get to a better C API in the long run.
>> E.g. the above two cases are potentially candidates for such a
>> process. I have used both in code I have written, because, AFAIK,
>> there's no other way to implement the functionality otherwise.
>>
>> I'm pretty sure that fairly low level tools such as Cython will
>> have similar cases.
>
> Indeed. Documenting and testing _Py_NewReference and its caveats sounds like a
> good idea, and we might as well remove the underscore then.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Experts (#1, Jun 14 2022)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________
::: We implement business ideas - efficiently in both time and costs :::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/https://www.malemburg.com/
Hello,
When reviewing the PyType_FromMetaclass function proposed in [PR 93012],
it occurred to me that it would be good to add better support for
extending types with opaque structs.
Consider the [tutorial example] of extending `list`, adapted to PyType_Spec:
typedef struct {
PyListObject list;
int state;
} SubListObject;
static PyType_Spec Sublist_spec = {
.name = "sublist.SubList",
.basicsize = sizeof(SubListObject),
...
};
If the PyListObject struct is not available (or opaque), as in the
stable ABI, this approach won't work.
The practical issue in PR 93012 is with metaclasses, which extend
PyTypeObject or PyHeapTypeObject. Same idea but a bit “too meta” to be a
good example. “Binding generator”-type projects that use the limited API
resort to hacks in this area: see [PySide] or the experimental
[limited-API branch of nanobind].
I propose adding API that treats the subclass-specific data as a
separate struct, and the “base” as an opaque blob (manipulated either by
accessor functions or, if the struct is available, directly).
Concretely:
- PyType_Spec.basicsize can be a negative (or zero) N to request -N
bytes of *extra* storage on top of what the base class needs, so that
the PyType_Spec can be static & read-only.
- a new `void* PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls)`
returns the pointer to data specific to `cls`.
- a new `Py_ssize_t PyObject_GetTypeataSize(PyTypeObject *cls)` returns
the size of that data (computed as cls->tp_basicsize -
cls->tp_base->tp_basicsize).
- itemsize would get similar treatment, but I'll leave details to the PEP
Intended usage:
typedef struct {
int state;
} SubListObject;
static PyType_Spec Sublist_spec = {
.name = "sublist.SubList",
.basicsize = -sizeof(SubListObject),
...
};
...
sublist_type = PyType_FromSpec(Sublist_spec);
...
SubListObject *data = PyObject_GetTypeData(instance, sublist_type);
data->state++;
What do you think? Is it a PEP-worthy idea?
[PR 93012]: https://github.com/python/cpython/pull/93012
[tutorial example]:
https://docs.python.org/3/extending/newtypes_tutorial.html#subclassing-othe…
[PySide]:
https://github.com/pyside/pyside2-setup/blob/5.11/sources/shiboken2/libshib…
[limited-API branch of nanobind]:
https://github.com/wjakob/nanobind/compare/master...limited_api#diff-1457ce…
The signal:noise ratio due to spam is getting a bit high on this mailing
list. Would the admins be up for turning on default moderation for people
and then selectively turning it off? Or should this group move to
discuss.python.org where spam issues aren't as much of a concern?
Hi,
I wrote a new script which adds Python 3.10 support to your C
extensions without losing Python 3.6 support:
https://github.com/pythoncapi/pythoncapi_compat
For example, it replaces "op->ob_type" with "Py_TYPE(op)" and replaces
"frame->f_back" with "_PyFrame_GetBackBorrow(frame)".
It relies on the pythoncapi_compat.h header file that I wrote to
implement Python 3.9 and Python 3.10 on old Python versions. Examples:
Py_NewRef() and PyThreadState_GetFrame().
The _PyFrame_GetBackBorrow() function doesn't exist in the Python C
API, it's only provided by pythoncapi_compat.h to ease the migration
of C extensions. I advise you to replace _PyFrame_GetBackBorrow()
(borrowed reference) with PyFrame_GetBack() (strong reference). The
PyFrame_GetBack() function was added to Python 3.9 and
pythoncapi_compat.h provides it on older Python versions.
This project is related to my PEP 620 "Hide implementation details
from the C API" which tries to make the C API more abstract to later
allow to implement new optimization in CPython and to make other
Python implementations like PyPy faster when running C extensions.
This project only targets extension modules written in C by using
directly the "Python.h" API. I advise you to use Cython or HPy to no
longer be bothered with incompatible C API changes at every Python
release ;-)
* https://cython.org/
* https://hpy.readthedocs.io/
I hope that my script will facilitate migration of C extensions to HPy.
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
There's more background in the Discourse thread [1], but the key C API
design topic is that PyEval_GetLocals() and PyEval_GetGlobals()
currently behave differently when there's no Python frame active in
the current thread:
* PyEval_GetGlobals(): returns NULL without setting an exception
* PyEval_GetLocals(): returns NULL *and* sets an exception
PyEval_GetLocals() has been doing the latter since it gained the
ability to report memory allocation errors when dealing with optimised
frames back in Python 3.4.
The "no active Python frame" case isn't really an error (as it's a
valid state in applications that embed CPython), so I'm wondering if
PyEval_GetLocals() should partially go back to its pre-3.4 behaviour,
and not set an exception in the "no active Python frame" case.
Cheers,
Nick.
[1] https://discuss.python.org/t/subtle-c-api-discrepancy-pyeval-getlocals-vs-p…
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
EXPY is an express way to extend Python!
EXPY provides a way to extend python in an elegant way. For more information and a tutorial, see: http://expy.sourceforge.net/
What's new:
1. Correct treatment of __init__ method.
2. Give warnings of missing Py_INCREF on
appropriate special type methods.
3. Documentation update.
Cheers,
Yingjie
Hi,
TL;DR; Is it safe to Py_DecRef an object that was created before Py_Finalize
after Py_Initialize was called to restart Python runtime?
I am debugging 3.9+ support in https://github.com/pythonnet/pythonnet
The issue I originally got is validate_list from gcmodule.c fails at
assert(prev == GC_PREV(head));
In debugging this, I sprinkled my code with calls to validate_list, and found
that the assertion fails in the following scenario:
Py_Initialize();
...
PyObject* myObj = PyObject_Call(simple_class, empty_tuple, NULL);
Py_Finalize();
...
Py_Initialize();
...
validate_list(gc.generations[2], collecting_clear_unreachable_clear); // OK
Py_DecRef(myObj);
validate_list(gc.generations[2], collecting_clear_unreachable_clear); // EXPLOSION
Now I have not yet tried to narrow down the steps further, because I realised
that I am unsure about the question in TL;DR;. E.g. is this use of myObj
supported by embedding API in general, or is this expected, that after
reinitializing the runtime Py_DecRef on old objects would break GC?
In my scenario simple_class is defined in Python as inheriting from object,
and only has two attributes set at runtime. One has value
builtins.StopIteration, and the other one is an instance of traceback.
Regards,
Victor
Hi,
tl; dr the C API of PyFrameObject changed. If you have issues, please speak up!
I moved the PyFrameObject structure to the internal C API headers. You
should now use the public C API to access its members:
* ``f_back``: use :c:func:`PyFrame_GetBack`.
* ``f_blockstack``: removed.
* ``f_builtins``: use ``PyObject_GetAttrString((PyObject*)frame,
"f_builtins")``.
* ``f_code``: use :c:func:`PyFrame_GetCode`.
* ``f_gen``: removed.
* ``f_globals``: use ``PyObject_GetAttrString((PyObject*)frame, "f_globals")``.
* ``f_iblock``: removed.
* ``f_lasti``: use ``PyObject_GetAttrString((PyObject*)frame, "f_lasti")``.
Code using ``f_lasti`` with ``PyCode_Addr2Line()`` must use
:c:func:`PyFrame_GetLineNumber` instead.
* ``f_lineno``: use :c:func:`PyFrame_GetLineNumber`
* ``f_locals``: use ``PyObject_GetAttrString((PyObject*)frame, "f_locals")``.
* ``f_stackdepth``: removed.
* ``f_state``: no public API (renamed to ``f_frame.f_state``).
* ``f_trace``: no public API.
* ``f_trace_lines``: use ``PyObject_GetAttrString((PyObject*)frame,
"f_trace_lines")``
(it also be modified).
* ``f_trace_opcodes``: use ``PyObject_GetAttrString((PyObject*)frame,
"f_trace_opcodes")``
(it also be modified).
* ``f_localsplus``: no public API (renamed to ``f_frame.localsplus``).
* ``f_valuestack``: removed.
See What's New in Python 3.11 for more details (I completed the doc,
but it will only be updated online tomorrow, there is a cron task once
per day):
https://docs.python.org/dev/whatsnew/3.11.html#id2
You can use the pythoncapi_compat project to get PyFrame_GetBack() and
PyFrame_GetCode() on Python 3.8 and older:
https://pythoncapi-compat.readthedocs.io/
If there is no public C API to access the PyFrameObject members used
by your project, please speak up!
See for example my PR to use the public C API in coverage:
https://github.com/nedbat/coveragepy/pull/1331
There is an on-going work for 2 years to add new getter functions for
PyFrameObject:
https://bugs.python.org/issue40421
In Python 3.10, there was no rush to make the structure internal.
Things changed quickly in Python 3.11: a heavy rework on Python
internals is on-going to optimize Python (especially ceval.c and how
frames are created and used). For example, the
_PyEval_EvalFrameDefault() function now gets a new InterpreterFrame
structure, it no longer takes a PyFrameObject. Python 3.11 now creates
Python frame objects on demand.
Because of these changes, reading directly PyFrameObject.f_back is now
an error since the member is not always filled, but it wasn't a
compiler error. You must now call the public PyFrame_GetBack()
function rather than accessing directly to the member.
Since Python 2.3 (that's quite old ;-)), reading directly
PyFrameObject.f_lineno is unsafe since it depends if the Python code
is being traced or not: you must use PyFrame_GetLineNumber() instead.
See also the issue for the longer rationale:
https://bugs.python.org/issue46836
I know that these changes are inconvenient, but I'm here if you need
my help for updating your project! Moreover, Python 3.11 is scheduled
for October, so there are still a few months to adapt your project,
and maybe add new getter functions. The best would be to add all
required getter functions before Python 3.11 beta1 scheduled for the
beginning of May (2022-05-06).
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.