(PEP 620) C API for efficient loop iterating on a sequence of PyObject** or other C types
On Tue, 23 Jun 2020 at 03:47, Neil Schemenauer wrote:
> One final comment: I think even if we manage to clean up the API and make it friendly for other Python implementations, there is going to be a fair amount of overhead. If you look at other "managed runtimes" that just seems unavoidable (e.g. Java, CLR, V8, etc). You want to design the API so that you maximize the amount of useful work done with each API call. Using something like PyList_GET_ITEM() to iterate over a list is not a good pattern. So keep in mind that an extension API is going to have some overhead.
A large part of PEP 620 is already implemented:
https://www.python.org/dev/peps/pep-0620/#summary

So far, I haven't noticed any major performance issue, but you're right that the PEP itself can only keep performance the same or make it worse, not better.

The PEP only prevents third-party code from specializing on internals; CPython continues to have full access to all internals, such as structures. CPython internals already use specialized code. For example, _PyTuple_ITEMS() gives access to a PyObject** array, which the PEP denies to third-party code.

The question is what to do about extensions like numpy, which do rely on internals to emit faster code.

--

In HPy, the question was asked as well. If I recall correctly, instead of making assumptions about an object's layout depending on its type, new protocols should be added to request access to an object in a specific way.

For example, we can consider continuing to provide raw access to a PyObject** array, but an object can reply "sorry, I don't support this PyObject** protocol". Also, I expect to have a function call to notify the object when the PyObject** view is no longer needed, something like PyBuffer_Release() in the Py_buffer protocol.

Maybe an object can generate a temporary PyObject** view which requires allocating resources (like memory), and the release function would release these resources.

Pseudo-code:

    void iterate(PyObject *obj)
    {
        PyObjectPP_View view;

        if (PyObjectPP_View_Get(&view, obj)) {
            // fast-path: the object provides a PyObject** view
            for (Py_ssize_t i = 0; i < view.len; i++) {
                PyObject *item = view.array[i];
                ...
            }
            PyObjectPP_View_Release(&view);
        }
        else {
            // slow code path using PySequence_GetItem() or anything else
            ...
        }
    }

Maybe PyObjectPP_View_Get() should increment the object's reference counter to ensure that the object cannot be destroyed in the loop (if the loop calls arbitrary Python code), and PyObjectPP_View_Release() would decrement its reference counter.
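The Get/Release pattern described above can be modeled without CPython at all. The following is only a minimal sketch under stated assumptions: `IntSeq`, `ViewKind`, `IntSeq_GetView` and `Int64View_Release` are hypothetical names invented for illustration, and none of them exist in the C API. It shows an object that refuses a view kind it cannot provide, grants the one it stores natively, and holds a reference for the lifetime of the view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element kinds a caller may ask for (not a real C API). */
typedef enum { VIEW_OBJECT_PTR, VIEW_INT64 } ViewKind;

/* Toy stand-in for an object that stores int64 items natively. */
typedef struct {
    int64_t *items;
    size_t len;
    int refcnt;              /* models the object's reference count */
} IntSeq;

typedef struct {
    IntSeq *owner;
    const int64_t *array;
    size_t len;
} Int64View;

/* Returns 1 and fills *view if the requested kind is supported,
   0 if the object "replies no" and the caller must use a slow path. */
static int IntSeq_GetView(IntSeq *seq, ViewKind kind, Int64View *view)
{
    if (kind != VIEW_INT64) {
        return 0;            /* e.g. an object-pointer view is not supported */
    }
    seq->refcnt++;           /* keep the owner alive while the view exists */
    view->owner = seq;
    view->array = seq->items;
    view->len = seq->len;
    return 1;
}

/* Drops the reference taken by IntSeq_GetView() and invalidates the view. */
static void Int64View_Release(Int64View *view)
{
    assert(view->owner != NULL);
    view->owner->refcnt--;
    view->owner = NULL;
    view->array = NULL;
    view->len = 0;
}

/* Fast path only; a real caller would fall back to a generic loop. */
static int64_t sum_items(IntSeq *seq)
{
    Int64View view;
    int64_t total = 0;
    if (IntSeq_GetView(seq, VIEW_INT64, &view)) {
        for (size_t i = 0; i < view.len; i++) {
            total += view.array[i];
        }
        Int64View_Release(&view);
    }
    return total;
}
```

In this model the reference count only keeps the owner alive; it does nothing to stop the owner from being resized while the view is in use.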
"PyObjectPP_View" protocol looks like PySequence_Fast() API, but IMO PySequence_Fast() is not generic enough. For example, the first issue is that it cannot reply "no, sorry, the object doesn't support PyObject**". It always creates a temporary list if the object is not a tuple or a list, that may be inefficient for a large sequence. Also, the "view" protocol should be allowed to query other types than just PyObject**. For example, what if I would like to iterate on a sequence of integers? bytes, array.array and memoryview can be seen as sequences of integers. See also HPy notes on these ideas: https://github.com/pyhandle/hpy/blob/master/docs/xxx-unsorted-notes.txt Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Victor Stinner wrote on 23.06.20 at 11:18:
> Maybe an object can generate a temporary PyObject** view which requires allocating resources (like memory), and the release function would release these resources.
I agree that this is more explicit when it comes to resource management, but there is nothing that beats direct native data structure access when it comes to speed. If a "PyObject*[]" is not what the runtime uses internally as its data structure, then why hand it out as an interface to users who require performance? There's PyIter_Next() already for those who don't.

If the intention is to switch to a more efficient internal data structure inside of CPython (or expose in PyPy whatever that uses), then I would look more at PEP 393 for a good interface here, or "array.array". It's perfectly fine to have 20 different internal array types, as long as they are explicitly and safely exposed to users.

Stefan
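PEP 393 exposes string storage as a kind plus a raw data pointer (PyUnicode_KIND(), PyUnicode_DATA(), PyUnicode_READ()), so callers can branch once on the kind and then run a tight typed loop. A minimal sketch of the same idea for a generic array follows. `KindArray`, `ArrayKind` and both functions are hypothetical names used only for illustration; they are not part of CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind tag, named after the element width in bytes. */
typedef enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 } ArrayKind;

/* An explicitly typed array: the kind says how to interpret `data`. */
typedef struct {
    ArrayKind kind;
    size_t len;
    const void *data;
} KindArray;

/* Generic single-element read; branches on the kind for every call. */
static uint32_t kind_array_read(const KindArray *a, size_t i)
{
    assert(i < a->len);
    switch (a->kind) {
    case KIND_1BYTE: return ((const uint8_t *)a->data)[i];
    case KIND_2BYTE: return ((const uint16_t *)a->data)[i];
    default:         return ((const uint32_t *)a->data)[i];
    }
}

/* Branches on the kind once, then iterates over native storage. */
static uint64_t kind_array_sum(const KindArray *a)
{
    uint64_t total = 0;
    switch (a->kind) {
    case KIND_1BYTE: {
        const uint8_t *p = a->data;
        for (size_t i = 0; i < a->len; i++) total += p[i];
        break;
    }
    case KIND_2BYTE: {
        const uint16_t *p = a->data;
        for (size_t i = 0; i < a->len; i++) total += p[i];
        break;
    }
    default: {
        const uint32_t *p = a->data;
        for (size_t i = 0; i < a->len; i++) total += p[i];
        break;
    }
    }
    return total;
}
```

The design choice is that the layout is part of the interface: the caller never guesses the element width, and a runtime can add further kinds without breaking existing callers that handle the ones they know.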
On Tue, Jun 23, 2020 at 7:52 AM Stefan Behnel wrote:
> I agree that this is more explicit when it comes to resource management, but there is nothing that beats direct native data structure access when it comes to speed.
From the perspective of the function that wants to get access to the contents of an object, direct access will be the fastest. However, more globally, from the perspective of the runtime as a whole, supporting direct access in all situations typically makes other things slower. So, unless there is hot code doing a lot of direct structure access, there can be a big net loss.

The JNI has a good compromise, at least for primitive types like the various classes of floats and ints. When requesting the contents of an array, the runtime might be able to give you a direct pointer, but, if not, what you get back might be a copy of the contents of the array. To know what happened, the API gives you a signal about the ownership of the pointer in an out parameter.

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.ht...

You have to release these when you're done, kind of like what Python's buffer protocol does today. For small to medium sized stuff, there is an API for just copying things out into a user-provided buffer, which might be faster.

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.ht...

The C-API is currently inconsistent in what kind of access you can get to the contents of an object. As I mentioned in the other thread, it would be beneficial to alternative implementations of Python to have more uniformity in what the C-API provides.

> If a "PyObject*[]" is not what the runtime uses internally as data structure, then why hand it out as an interface to users who require performance? There's PyIter_Next() already for those who don't.

For arrays of pointers to objects that may be under the management of a moving garbage collector, what looks like direct access would actually be emulated and dramatically slower than doing PyIter_Next.

> If the intention is to switch to a more efficient internal data structure inside of CPython (or expose in PyPy whatever that uses), then I would look more at PEP-393 for a good interface here, or "array.array". It's perfectly fine to have 20 different internal array types, as long as they are explicitly and safely exposed to users.

The internal data structure might be exactly the same, but, for example, a moving GC might make efficient direct access to the data structure from C code impossible. If you are not dealing with reference types, I agree, we can do much better.
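The JNI compromise described above corresponds to functions such as GetIntArrayElements(), which reports through an isCopy out parameter whether the caller got a copy, ReleaseIntArrayElements(), and GetIntArrayRegion() for copying into a caller's buffer. The sketch below is a toy model of that contract in plain C, not JNI itself. `RtArray` and the `rt_*` functions are hypothetical, and the `movable` flag merely stands in for "the runtime cannot hand out a stable pointer".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy runtime array; `movable` models storage a moving GC may relocate. */
typedef struct {
    int *data;
    size_t len;
    int movable;
} RtArray;

/* Returns a pointer to the elements. *is_copy is set to 1 when the caller
   receives a private copy and to 0 when it receives the runtime's storage.
   Returns NULL if a copy was needed and the allocation failed. */
static int *rt_get_elements(RtArray *arr, int *is_copy)
{
    assert(is_copy != NULL);
    if (!arr->movable) {
        *is_copy = 0;
        return arr->data;
    }
    size_t n = arr->len ? arr->len : 1;   /* avoid a zero-sized malloc */
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, arr->data, arr->len * sizeof *copy);
    *is_copy = 1;
    return copy;
}

/* Copies changes back and frees a copy; a no-op for direct pointers. */
static void rt_release_elements(RtArray *arr, int *elems, int is_copy)
{
    if (is_copy) {
        memcpy(arr->data, elems, arr->len * sizeof *elems);
        free(elems);
    }
}

/* Region-style access: copy n items starting at `start` into `out`.
   Returns 0 on success, -1 if the range is out of bounds. */
static int rt_get_region(const RtArray *arr, size_t start, size_t n, int *out)
{
    if (start > arr->len || n > arr->len - start) {
        return -1;
    }
    memcpy(out, arr->data + start, n * sizeof *out);
    return 0;
}
```

The caller's loop is identical in both cases; only the release step differs, which is what lets the runtime choose between pinning and copying.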
On Tue, 23 Jun 2020 11:18:37 +0200, Victor Stinner wrote:
> Pseudo-code:
>
>     void iterate(PyObject *obj)
>     {
>         PyObjectPP_View view;
>
>         if (PyObjectPP_View_Get(&view, obj)) {
>             // fast-path: the object provides a PyObject** view
>             for (Py_ssize_t i = 0; i < view.len; i++) {
>                 PyObject *item = view.array[i];
>                 ...
>             }
>             PyObjectPP_View_Release(&view);
>         }
>         else {
>             // slow code path using PySequence_GetItem() or anything else
>             ...
>         }
>     }
It is quite cumbersome for extension code to have to re-implement all this by hand. Instead, it would be nice to have a "Visit" primitive so that one can write e.g.:

    void iterate(PyObject* obj) {
        Py_VisitObjectSequence(obj, [&](PyObject* item) {
            // ...
        });
    }

The above is a C++ lambda function (a closure, actually). The C spelling would be less nice, and you'd have to ensure that it is still performant (i.e., that the visitor is inlined inside the iteration loop, at least in release builds).

(I called it Py_VisitObjectSequence so that you can also have Py_VisitIntSequence, Py_VisitFloatSequence, etc.)

Regards

Antoine.
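For comparison, here is one way the "less nice" C spelling might look: a function pointer plus a `void *` context in place of the closure. This is only a sketch. `Item`, `Seq`, `visit_fn` and `visit_object_sequence` are hypothetical stand-ins that do not depend on CPython, and whether a compiler inlines the visitor into the loop depends on the build and is not guaranteed by anything here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long value; } Item;              /* stand-in for PyObject */
typedef struct { Item **items; size_t len; } Seq; /* stand-in for a sequence */

/* Visitor signature: return 0 to continue, non-zero to stop early. */
typedef int (*visit_fn)(Item *item, void *ctx);

/* Calls `visit` on each item in order. Returns 0 if every item was
   visited, or the first non-zero value returned by the visitor. */
static inline int visit_object_sequence(const Seq *seq, visit_fn visit,
                                        void *ctx)
{
    assert(visit != NULL);
    for (size_t i = 0; i < seq->len; i++) {
        int rc = visit(seq->items[i], ctx);
        if (rc != 0) {
            return rc;
        }
    }
    return 0;
}

/* Example visitor: adds each item's value to the long that ctx points to. */
static int add_value(Item *item, void *ctx)
{
    *(long *)ctx += item->value;
    return 0;
}
```

The `void *ctx` parameter plays the role of the lambda's captured variables, and the iteration strategy stays hidden behind the visit function, so the caller never sees the underlying array.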
On 23.06.20 12:18, Victor Stinner wrote:
> For example, we can consider continuing to provide raw access to a PyObject** array, but an object can reply "sorry, I don't support this PyObject** protocol". Also, I expect to have a function call to notify the object when the PyObject** view is no longer needed. Something like Py_buffer protocol PyBuffer_Release(). Maybe an object can generate a temporary PyObject** view which requires allocating resources (like memory), and the release function would release these resources.
>
> Pseudo-code:
>
>     void iterate(PyObject *obj)
>     {
>         PyObjectPP_View view;
>
>         if (PyObjectPP_View_Get(&view, obj)) {
>             // fast-path: the object provides a PyObject** view
>             for (Py_ssize_t i = 0; i < view.len; i++) {
>                 PyObject *item = view.array[i];
>                 ...
>             }
>             PyObjectPP_View_Release(&view);
>         }
>         else {
>             // slow code path using PySequence_GetItem() or anything else
>             ...
>         }
>     }
>
> Maybe PyObjectPP_View_Get() should increment the object reference counter to ensure that the object cannot be destroyed in the loop (if the loop calls arbitrary Python code), and PyObjectPP_View_Release() would decrement its reference counter.
It is not enough. A list can change its content and size during iteration. You need to either add an "export" count which prevents the list from being mutated, copy the list, or use tricks such as temporarily swapping its content with an empty list for the duration of the iteration. In all cases it is a user-visible change in behavior.
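The "export" count option can be illustrated with a toy list in plain C. CPython's bytearray already behaves this way for the buffer protocol: resizing it while a buffer is exported raises BufferError. The sketch below is not CPython code, and `List` with its `list_*` functions is a hypothetical model. While at least one view is outstanding, a mutation is refused instead of invalidating the pointer handed out.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *items;
    size_t len, cap;
    int exports;             /* number of outstanding views */
} List;

/* Hands out the internal array and records the export. */
static const int *list_export(List *l, size_t *len)
{
    l->exports++;
    *len = l->len;
    return l->items;
}

/* Must be called once for every successful list_export(). */
static void list_unexport(List *l)
{
    assert(l->exports > 0);
    l->exports--;
}

/* Returns 0 on success, -1 if the list is exported or memory ran out. */
static int list_append(List *l, int value)
{
    if (l->exports > 0) {
        return -1;           /* refuse: a view still points at items */
    }
    if (l->len == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 4;
        int *p = realloc(l->items, ncap * sizeof *p);
        if (p == NULL) {
            return -1;
        }
        l->items = p;
        l->cap = ncap;
    }
    l->items[l->len++] = value;
    return 0;
}
```

The refused append is exactly the user-visible behavior change: code that legally mutated the list during iteration would now get an error.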
"PyObjectPP_View" protocol looks like PySequence_Fast() API, but IMO PySequence_Fast() is not generic enough. For example, the first issue is that it cannot reply "no, sorry, the object doesn't support PyObject**". It always creates a temporary list if the object is not a tuple or a list, that may be inefficient for a large sequence.
If you want to avoid conversion to a list, you can check that the object is a tuple or a list before using the PySequence_Fast*() API. I don't see a need for a new API for this.
> Also, the "view" protocol should be allowed to query other types than just PyObject**. For example, what if I would like to iterate on a sequence of integers? bytes, array.array and memoryview can be seen as sequences of integers.
Weren't the buffer protocol and the memoryview object designed for this? In Python you can call `memoryview(obj).cast('I')` and then iterate over the integers. I think there is something similar in C.
participants (5)
- Antoine Pitrou
- Carl Shapiro
- Serhiy Storchaka
- Stefan Behnel
- Victor Stinner