[Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged)

Sun Nov 18 10:53:19 EST 2018

Neil Schemenauer wrote on 17.11.18 at 00:10:
> I think making PyObject an opaque pointer would help.

... well, as long as type checks are still as fast as with "ob_type", and
visible to the C compiler so that it can eliminate redundant ones, I
wouldn't mind. :)

> - Borrowed references are a problem.  However, because they are so
>   commonly used and because the source code changes needed to change
>   to a non-borrowed API is non-trivial, I don't think we should try
>   to change this.  Maybe we could just discourage their use?

FWIW, the code that Cython generates has a macro guard [1] that makes it
avoid borrowed references where possible, e.g. when it detects compilation
under PyPy. That's definitely doable already, right now.
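
As a rough illustration of what such a guard can look like, here is a toy
sketch in plain C. It does not use Python.h, and the names Obj, List,
AVOID_BORROWED_REFS, GET_ITEM and DONE_WITH are hypothetical stand-ins for
this example only; the actual guards are the ones in [1].

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for an object and a list; NOT the real CPython structs. */
typedef struct { long refcnt; } Obj;
typedef struct { Obj **items; size_t size; } List;

/* Hypothetical compile-time switch, enabled by default in this sketch. */
#ifndef AVOID_BORROWED_REFS
#define AVOID_BORROWED_REFS 1
#endif

/* Borrowed-style access: hands out the stored pointer, refcnt untouched. */
static Obj *list_get_borrowed(const List *l, size_t i) {
    assert(l != NULL && i < l->size);
    return l->items[i];
}

/* Owned-style access: the caller receives a new reference to release. */
static Obj *list_get_owned(const List *l, size_t i) {
    Obj *o = list_get_borrowed(l, i);
    o->refcnt++;
    return o;
}

static void obj_release(Obj *o) { o->refcnt--; }

/* The guard: calling code uses GET_ITEM/DONE_WITH and never cares which
   reference convention is active underneath. */
#if AVOID_BORROWED_REFS
  #define GET_ITEM(l, i)  list_get_owned((l), (i))
  #define DONE_WITH(o)    obj_release(o)
#else
  #define GET_ITEM(l, i)  list_get_borrowed((l), (i))
  #define DONE_WITH(o)    ((void)(o))
#endif
```

With the switch on, every GET_ITEM is paired with a DONE_WITH that drops the
extra reference; with it off, both compile down to the plain pointer read.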

> - It would be nice to make PyTypeObject an opaque pointer as well.
>   I think that's a lot more difficult than making PyObject opaque.
>   So, I don't think we should attempt it in the near future.  Maybe
>   we could make a half-way step and discourage accessing ob_type
>   directly.  We would provide functions (probably inline) to do what
>   you would otherwise do by using op->ob_type-><something>.

I've sometimes been annoyed by the fact that protocol checks require two
pointer indirections in CPython (or even three in some cases), so that the
C compiler is essentially prevented from making any assumptions, and the
CPU branch prediction is also stretched a bit more than necessary. At
least, the slot check usually comes right before the call, so that the
lookups are not wasted. Inline functions are unlikely to improve that
situation, but at least they shouldn't make it worse, and they would be
more explicit.
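
To make the indirections concrete, here is a toy model in plain C, assuming
a much simplified layout. Type, NumberMethods, get_add_slot and generic_add
are hypothetical names for this sketch, and the slot signature is simplified
to return a double; none of this is the real CPython definition.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj Obj;
typedef double (*add_func)(const Obj *, const Obj *);

typedef struct { add_func nb_add; } NumberMethods;        /* models a slot table */
typedef struct { const NumberMethods *as_number; } Type;  /* models a type object */
struct Obj { const Type *type; double value; };

/* Explicit protocol check. Two dependent loads (obj->type, then
   type->as_number) happen before the slot itself can be read. */
static inline add_func get_add_slot(const Obj *o) {
    const NumberMethods *nm = o->type->as_number;
    return nm ? nm->nb_add : NULL;
}

static double float_add(const Obj *a, const Obj *b) {
    return a->value + b->value;
}

static const NumberMethods float_as_number = { float_add };
static const Type float_type  = { &float_as_number };
static const Type opaque_type = { NULL };  /* type without a number protocol */

/* Slot check directly followed by the call, so the lookups are not wasted.
   Returns 0 on success, -1 if the protocol is not supported. */
static int generic_add(const Obj *a, const Obj *b, double *out) {
    assert(out != NULL);
    add_func f = get_add_slot(a);
    if (f == NULL)
        return -1;
    *out = f(a, b);
    return 0;
}
```

The inline helper does not remove any load, it only gives the check a name,
which matches the "not worse, but more explicit" expectation above.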

Needless to say, Cython also has a macro guard in [1] that disables
direct slot access and makes it fall back to C-API calls, for users and
Python implementations where direct slot support is not wanted/available.

>   One reason you want to discourage access to ob_type is that
>   internally there is not necessarily one PyTypeObject structure for
>   each Python level type.  E.g. the VM might have specialized types
>   for certain sub-domains.  This is like the different flavours of
>   strings, depending on the set of characters stored in them.  Or,
>   you could have different list types.  One type of list if all
>   values are ints, for example.

An implementation like this could also be based on the buffer protocol.
It's already supported by the array.array type (which people probably also
just use when they have a need like this and don't want to resort to NumPy).

>   Basically, with CPython op->ob_type is super fast.  For other VMs,
>   it could be a lot slower.  By accessing ob_type you are saying
>   "give me all possible type information for this object pointer".
>   By using functions to get just what you need, you could be putting
>   less burden on the VM.  E.g. "is this object an instance of some
>   type" is faster to compute.

Agreed. I think that inline functions (well, or macros, because why not?)
that check for certain protocols explicitly could be helpful.

> - APIs that return pointers to the internals of objects are a
>   problem.  E.g. PySequence_Fast_ITEMS().  For CPython, this is
>   really fast because it is just exposing the internal details of
>   the layout that is already in the correct format.  For other VMs,
>   that API could be expensive to emulate.  E.g. you have a list to
>   store only ints.  If someone calls PySequence_Fast_ITEMS(), you
>   have to create real PyObjects for all of the list elements.

But that's intended by the caller, right? They want a flat serial
representation of the sequence, with potential conversion to a (list) array
if necessary. They might be a bit badly named, but that's exactly the
contract of the "PySequence_Fast_*()" line of functions.

In Cython, we completely avoid these functions, because they are way too
generic for optimisation purposes. Direct type checks and code
specialisation are much more effective.

> - Reducing the size of the API seems helpful.  E.g. we don't need
>   PyObject_CallObject() *and* PyObject_Call().  Also, do we really
>   need all the type specific APIs, PyList_GetItem() vs
>   PyObject_GetItem()?  In some cases maybe we can justify the bigger
>   API due to performance.  To add a new API, someone should have a
>   benchmark that shows a real speedup (not just that they imagine it
>   makes a difference).

So, in Cython, we use macros wherever possible, and often avoid generic
protocols in favour of type specialisations. We sometimes keep local copies
of C-API helper functions, because inlining them allows the C compiler to
strip down and streamline the implementation at compile time, rather than
jumping through generic code. (Also, it's sometimes required in order to
backport new CPython features to Py2.7+.)

PyPy's cpyext often just maps type specific C-API functions to the same
generic code, obviously, but in CPython, having a way to bypass protocols
and go straight to the type is really nice. I've sometimes been thinking
about how to get access to the actual implementations in CPython, because
hopping through slots when Cython already knows that, say, "float_add()"
will be called in the end is just wasteful.

I actually wonder if a more low-level C call interface [2] could also apply
to protocols. It would be great to have an iteration protocol, for example,
that would allow its user to choose whether she wants objects, integers or
C doubles as return values, and then adapt accordingly, depending on the
runtime data.
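
Purely as a thought experiment, such a protocol could look roughly like the
following toy sketch in plain C. Nothing like this exists in the C-API;
Iter, ItemKind, iter_next_double and sum_as_double are hypothetical names,
and only the "give me C doubles" variant is shown.

```c
#include <assert.h>
#include <stddef.h>

/* What the container stores natively (hypothetical). */
typedef enum { ITEM_LONG, ITEM_DOUBLE } ItemKind;

typedef struct {
    ItemKind kind;     /* native item representation */
    const void *data;  /* flat array of long or double */
    size_t len, pos;
} Iter;

/* The consumer asks for C doubles. The producer converts only when its
   native representation differs. Returns 1 if a value was stored in *out,
   0 when the iterator is exhausted. */
static int iter_next_double(Iter *it, double *out) {
    assert(it != NULL && out != NULL);
    if (it->pos >= it->len)
        return 0;
    if (it->kind == ITEM_DOUBLE)
        *out = ((const double *)it->data)[it->pos];
    else
        *out = (double)((const long *)it->data)[it->pos];
    it->pos++;
    return 1;
}

/* A consumer that never sees a boxed object. */
static double sum_as_double(Iter *it) {
    double total = 0.0, v;
    while (iter_next_double(it, &v))
        total += v;
    return total;
}
```

Variants returning integers or objects would follow the same pattern, with
the caller picking whichever one fits the data it expects at runtime.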

So, "reducing the size of the API", maybe not. Rather, make it more
specialised and provide a more low-level C integration of Python's
protocols that avoids the current object indirections. That would also help
projects like PyPy or Numba, where things are inherently low-level
internally but currently have to go through static signatures that require
wrapping and unwrapping things in inefficient containers.

> E.g. if my tagged pointers experiment shows significant performance
> gains (it hasn't yet).

In case it ever does, I'd certainly like to see the internals exposed in
order to make direct use of them. :)

> Victor's recent work in changing some macros to inline functions is
> not really related to the new API project, IMHO.  I don't think
> there is a problem to leave an existing macro as a macro.

Yeah, it feels more like code churn. There are some things that cannot be
done in macros (such as deciding whether to handle an error or to return a
result unchanged), in which case an inline function is nice. For simple
things, macros are just as good as inline functions. Minus some
compile-time type checking, perhaps.
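
A small sketch of that difference, in plain C with a toy Obj struct.
OBJ_REFCNT and result_or_fallback are hypothetical names for this example,
not C-API functions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long refcnt; } Obj;  /* toy object; NOT the real PyObject */

/* Macro: perfectly adequate for a simple field read. The cast silences
   type errors, so any pointer is accepted without a compiler complaint. */
#define OBJ_REFCNT(o) (((Obj *)(o))->refcnt)

/* Inline function: the arguments are type checked and evaluated once, and
   the body can decide between handling an error and passing the result
   through unchanged. */
static inline Obj *result_or_fallback(Obj *result, Obj *fallback,
                                      int *had_error) {
    assert(had_error != NULL);
    if (result == NULL) {   /* error case: record it and substitute */
        *had_error = 1;
        return fallback;
    }
    return result;          /* success: return the result unchanged */
}
```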

> However, it could be that we introduce a new ifdef like
> Py_LIMITED_API that gives a stable ABI.  E.g. when that's enabled,
> most everything would turn into non-inline functions.  In exchange
> for the performance hit, your extension would become ABI compatible
> between a range of CPython releases.  That would be a nice feature.
> Basically a more useful version of Py_LIMITED_API.

I actually wouldn't mind adding such a binary compatibility mode to Cython.
Probably some work, but in the end, it would just be another macro guard
for us. The overall size of the generated C files has rarely been a matter
of debate. :)
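
A minimal sketch of what such a guard could look like, again as a toy model
in plain C. TOY_STABLE_ABI, List, List_Size and LIST_SIZE are hypothetical
names chosen for this example; the real switch would be something in the
spirit of Py_LIMITED_API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t size; } List;  /* toy layout; NOT PyListObject */

/* Out-of-line accessor: the only thing a binary-compatible build would
   depend on, so the struct layout can change between releases. */
size_t List_Size(const List *l) {
    assert(l != NULL);
    return l->size;
}

#ifdef TOY_STABLE_ABI
  /* Compatibility mode: always go through the function call. */
  #define LIST_SIZE(l) List_Size(l)
#else
  /* Fast mode: read the field directly, tying the code to this layout. */
  #define LIST_SIZE(l) ((l)->size)
#endif
```

Code written against LIST_SIZE compiles unchanged in both modes; only the
trade-off between speed and binary compatibility moves.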

Stefan

[1] https://github.com/cython/cython/blob/f158e490b9e8515cf47cf301f996c1b7e631eebb/Cython/Utility/ModuleSetupCode.c#L43-L191

[2] https://github.com/cython/peps/blob/master/pep-ccalls.rst