On Tue, Jun 23, 2020 at 7:52 AM Stefan Behnel <stefan_ml@behnel.de> wrote:

I agree that this is more explicit when it comes to resource management,
but there is nothing that beats direct native data structure access when it
comes to speed.

From the perspective of the function that wants access to the contents of an object, direct access will be the fastest. However, from the perspective of the runtime as a whole, supporting direct access in all situations typically makes other things slower. So, unless there is hot code doing a lot of direct structure access, there can be a big net loss.

The JNI has a good compromise, at least for primitive types like the various float and int types. When you request the contents of an array, the runtime might be able to give you a direct pointer, but, if not, what you get back might be a copy of the contents of the array. To tell you which one happened, the API reports the ownership of the pointer through an out parameter (isCopy).

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#Get_PrimitiveType_ArrayElements_routines

You have to release these when you're done, kind of like what Python's buffer protocol does today. For small to medium-sized arrays, there is also an API that just copies a region of the array into a user-provided buffer, which might be faster.

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#Get_PrimitiveType_ArrayRegion_routines
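To make the shape of that compromise concrete, here is a minimal sketch in plain C. It is not JNI and not anything in the C-API: rt_int_array, rt_get_elements, rt_release_elements and rt_get_region are all hypothetical names made up for illustration, and the "pinnable" flag is just a stand-in for whatever would stop a runtime (for example, a moving GC) from handing out a direct pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime-owned array of ints (illustration only). */
typedef struct {
    int *storage;   /* the runtime's internal storage */
    size_t length;  /* number of elements */
    bool pinnable;  /* false models a runtime that cannot expose storage */
} rt_int_array;

/* Return either a direct pointer or a private copy of the elements.
 * *is_copy tells the caller which one it got. NULL on allocation failure. */
int *rt_get_elements(rt_int_array *arr, bool *is_copy)
{
    assert(arr != NULL && is_copy != NULL);
    if (arr->pinnable) {
        *is_copy = false;
        return arr->storage;
    }
    size_t n = arr->length ? arr->length : 1;  /* avoid malloc(0) */
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy, arr->storage, arr->length * sizeof *copy);
    *is_copy = true;
    return copy;
}

/* Release what rt_get_elements returned. For a copy, optionally write the
 * changes back, then free it. For a direct pointer there is nothing to do. */
void rt_release_elements(rt_int_array *arr, int *elems, bool is_copy,
                         bool commit)
{
    assert(arr != NULL && elems != NULL);
    if (!is_copy)
        return;
    if (commit)
        memcpy(arr->storage, elems, arr->length * sizeof *elems);
    free(elems);
}

/* Copy len elements starting at start into a caller-provided buffer.
 * Returns 0 on success, -1 if the requested region is out of bounds. */
int rt_get_region(const rt_int_array *arr, size_t start, size_t len, int *buf)
{
    assert(arr != NULL && buf != NULL);
    if (start > arr->length || len > arr->length - start)
        return -1;
    memcpy(buf, arr->storage + start, len * sizeof *buf);
    return 0;
}
```

The point of the sketch is that the caller's code is the same whether it got a direct pointer or a copy, so the runtime stays free to pick whichever is cheaper for it. The region call never exposes internal storage at all, so there is nothing to release afterwards.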

The C-API is currently inconsistent in what kind of access you can get to the contents of an object. As I mentioned in the other thread, it would be beneficial to alternative implementations of Python to have more uniformity in what the C-API provides.

If a "PyObject*[]" is not what the runtime uses internally
as data structure, then why hand it out as an interface to users who
require performance? There's PyIter_Next() already for those who don't.

For arrays of pointers to objects that may be under the management of a moving garbage collector, what looks like direct access would actually have to be emulated, and would be dramatically slower than calling PyIter_Next().

If the intention is to switch to a more efficient internal data structure
inside of CPython (or expose in PyPy whatever that uses), then I would look
more at PEP-393 for a good interface here, or "array.array". It's perfectly
fine to have 20 different internal array types, as long as they are
explicitly and safely exposed to users.

The internal data structure might be exactly the same, but, for example, a moving GC might make efficient direct access to the data structure from C code impossible. If you are not dealing with reference types, I agree, we can do much better.