Neil Schemenauer wrote on 17.11.18 at 00:10:
> I think making PyObject an opaque pointer would help.
... well, as long as type checks are still as fast as with "ob_type", and visible to the C compiler so that it can eliminate redundant ones, I wouldn't mind. :)
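To show what I mean by "opaque but still fast", here's a self-contained mini-model in plain C (all names invented, not the real CPython headers): the struct layout stays hidden behind a forward declaration in the public header, while static inline accessors keep the type check a single pointer load that the C compiler can see through.

```c
/* Hypothetical mini-model, not the real CPython headers; all names
 * are invented.  In a public header, the object struct would only be
 * forward-declared, so extension code could not touch its fields: */
typedef struct MyType MyType;
typedef struct MyObject MyObject;

/* (In a real layout, the definitions below would live in a private,
 * internal header rather than next to the typedefs.) */
struct MyType { const char *tp_name; };
struct MyObject { MyType *ob_type; };

/* Static inline accessors keep the type check one pointer load,
 * fully visible to the C compiler for redundancy elimination: */
static inline MyType *My_TYPE(MyObject *op) { return op->ob_type; }
static inline int My_IS_TYPE(MyObject *op, MyType *tp) {
    return My_TYPE(op) == tp;
}

/* Small demo instances: */
static MyType demo_type = { "demo" };
static MyObject demo_obj = { &demo_type };
```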
> - Borrowed references are a problem. However, because they are so commonly used, and because the source code changes needed to move to a non-borrowed API are non-trivial, I don't think we should try to change this. Maybe we could just discourage their use?
FWIW, the code that Cython generates has a macro guard [1] that makes it avoid borrowed references where possible, e.g. when it detects compilation under PyPy. That's definitely doable already, right now.
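Schematically, the guard pattern looks something like this (a self-contained sketch with invented names and a toy refcount, not Cython's actual generated code): the same call site compiles to either a borrowed-reference or a new-reference access, depending on a compile-time switch.

```c
/* Hypothetical mini-model of a macro guard that avoids borrowed
 * references; all names and the toy refcount are invented. */
typedef struct { int refcnt; int value; } Obj;

static Obj *borrow_item(Obj *items[], int i) {   /* borrowed reference */
    return items[i];
}
static Obj *newref_item(Obj *items[], int i) {   /* new reference */
    Obj *o = items[i];
    o->refcnt++;
    return o;
}

/* The generated code picks one strategy at compile time, e.g. the
 * new-reference path when targeting PyPy: */
#ifndef AVOID_BORROWED_REFS
#define AVOID_BORROWED_REFS 1
#endif

#if AVOID_BORROWED_REFS
#define GET_ITEM(items, i) newref_item(items, i)
#define RELEASE(o)         ((o)->refcnt--)
#else
#define GET_ITEM(items, i) borrow_item(items, i)
#define RELEASE(o)         ((void)(o))  /* nothing to drop for a borrow */
#endif
```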
> - It would be nice to make PyTypeObject an opaque pointer as well. I think that's a lot more difficult than making PyObject opaque. So, I don't think we should attempt it in the near future. Maybe we could make a half-way step and discourage accessing ob_type directly. We would provide functions (probably inline) to do what you would otherwise do by using op->ob_type-><something>.
I've sometimes been annoyed by the fact that protocol checks require two pointer indirections in CPython (or even three in some cases), so that the C compiler is essentially prevented from making any assumptions, and the CPU branch prediction is also stretched a bit more than necessary. At least, the slot check usually comes right before the call, so that the lookups are not wasted. Inline functions are unlikely to improve that situation, but at least they shouldn't make it worse, and they would be more explicit. Needless to say, Cython also has a macro guard in [1] that disables direct slot access and falls back to C-API calls, for users and Python implementations where direct slot support is not wanted/available.
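As a mini-model of what those lookups cost (invented names, not the real CPython structs): the protocol check alone is already two dependent pointer loads, and keeping it right next to the call at least reuses the loaded slot pointer.

```c
#include <stddef.h>

/* Hypothetical mini-model of CPython-style slot lookups; all names
 * are invented. */
typedef struct Obj Obj;
typedef Obj *(*binaryfunc)(Obj *, Obj *);

typedef struct { binaryfunc nb_add; } NumberMethods;
typedef struct { NumberMethods *tp_as_number; } Type;
struct Obj { Type *ob_type; int value; };

/* Protocol check: two dependent pointer loads before we even reach
 * the slot itself. */
static inline int supports_add(Obj *op) {
    return op->ob_type->tp_as_number != NULL
        && op->ob_type->tp_as_number->nb_add != NULL;
}

/* The slot check usually comes right before the call, so the loaded
 * slot pointer is used immediately and not wasted: */
static Obj *try_add(Obj *a, Obj *b) {
    NumberMethods *m = a->ob_type->tp_as_number;
    if (m != NULL && m->nb_add != NULL)
        return m->nb_add(a, b);
    return NULL;
}

/* Demo type whose nb_add sums the int payloads into a static result. */
static Obj result_obj;
static Obj *int_add(Obj *a, Obj *b) {
    result_obj.value = a->value + b->value;
    return &result_obj;
}
static NumberMethods int_number = { int_add };
static Type int_type = { &int_number };
```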
> One reason you want to discourage access to ob_type is that, internally, there is not necessarily one PyTypeObject structure for each Python-level type. E.g. the VM might have specialized types for certain sub-domains. This is like the different flavours of strings, depending on the set of characters stored in them. Or, you could have different list types, e.g. one kind of list for the case that all values are ints.
An implementation like this could also be based on the buffer protocol. It's already supported by the array.array type (which people probably also just use when they have a need like this and don't want to resort to NumPy).
> Basically, with CPython op->ob_type is super fast. For other VMs, it could be a lot slower. By accessing ob_type you are saying "give me all possible type information for this object pointer". By using functions to get just what you need, you could be putting less burden on the VM. E.g. "is this object an instance of some type" is faster to compute.
Agreed. I think that inline functions (well, or macros, because why not?) that check for certain protocols explicitly could be helpful.
> - APIs that return pointers to the internals of objects are a problem. E.g. PySequence_Fast_ITEMS(). For CPython, this is really fast because it is just exposing the internal details of the layout that is already in the correct format. For other VMs, that API could be expensive to emulate. E.g. you have a list to store only ints. If someone calls PySequence_Fast_ITEMS(), you have to create real PyObjects for all of the list elements.
But that's intended by the caller, right? They want a flat serial representation of the sequence, with potential conversion to a (list) array if necessary. They might be a bit badly named, but that's exactly the contract of the "PySequence_Fast_*()" line of functions. In Cython, we completely avoid these functions, because they are way too generic for optimisation purposes. Direct type checks and code specialisation are much more effective.
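To make the cost argument concrete, here is a self-contained sketch (invented names, toy boxing instead of real PyObjects) of what a specialised int-list would have to do to serve a flat items array: one heap allocation per element, exactly the work the caller is implicitly asking for.

```c
#include <stdlib.h>

/* Hypothetical mini-model: a list that stores unboxed C ints must
 * materialise one heap object per element to serve a flat
 * PySequence_Fast_ITEMS()-style request.  All names are invented. */
typedef struct { int value; } BoxedInt;
typedef struct { int *data; int size; } CompactIntList;

/* Returns a freshly allocated array of boxed objects (caller frees).
 * Error handling is elided for brevity. */
static BoxedInt **list_as_flat_items(CompactIntList *list) {
    BoxedInt **items = malloc(list->size * sizeof(BoxedInt *));
    if (items == NULL)
        return NULL;
    for (int i = 0; i < list->size; i++) {
        items[i] = malloc(sizeof(BoxedInt));  /* boxing cost per element */
        items[i]->value = list->data[i];
    }
    return items;
}
```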
> - Reducing the size of the API seems helpful. E.g. we don't need PyObject_CallObject() *and* PyObject_Call(). Also, do we really need all the type specific APIs, PyList_GetItem() vs PyObject_GetItem()? In some cases maybe we can justify the bigger API due to performance. To add a new API, someone should have a benchmark that shows a real speedup (not just that they imagine it makes a difference).
So, in Cython, we use macros wherever possible, and often avoid generic protocols in favour of type specialisations. We sometimes keep local copies of C-API helper functions, because inlining them allows the C compiler to strip down and streamline the implementation at compile time, rather than jumping through generic code. (Also, it's sometimes required in order to backport new CPython features to Py2.7+.)

PyPy's cpyext often just maps type-specific C-API functions to the same generic code, obviously, but in CPython, having a way to bypass protocols and go straight to the type is really nice. I've sometimes been thinking about how to get access to the actual implementations in CPython, because hopping through slots when Cython already knows that, say, "float_add()" will be called in the end is just wasteful.

I actually wonder if a more low-level C call interface [2] could also apply to protocols. It would be great to have an iteration protocol, for example, that would allow its user to choose whether she wants objects, integers or C doubles as return values, and then adapt accordingly, depending on the runtime data.

So, "reducing the size of the API", maybe not. Rather, make it more specialised, and provide a more low-level C integration of Python's protocols that avoids the current object indirections. That would also help projects like PyPy or Numba, where things are inherently low-level internally but currently have to go through static signatures that require wrapping and unwrapping things in inefficient containers.
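As a rough sketch of what such a typed iteration protocol could look like (entirely hypothetical, all names invented, not an existing CPython or Cython API): the producer reports what kind of value it can hand out, and a consumer that wants C doubles never has to box or unbox anything.

```c
/* Hypothetical sketch of a lower-level iteration protocol where the
 * consumer can ask for unboxed C values instead of objects; all
 * names are invented. */
typedef enum { ITEM_OBJECT, ITEM_DOUBLE, ITEM_DONE } ItemKind;

typedef struct {
    const double *data;   /* backing store of this specialised iterator */
    int pos, size;
} DoubleIter;

/* next() hands out an unboxed double when the runtime data allows it: */
static ItemKind double_iter_next(DoubleIter *it, double *out) {
    if (it->pos >= it->size)
        return ITEM_DONE;
    *out = it->data[it->pos++];
    return ITEM_DOUBLE;
}

/* A consumer summing floats stays entirely in C, no object hops: */
static double sum_doubles(DoubleIter *it) {
    double total = 0.0, x;
    while (double_iter_next(it, &x) == ITEM_DOUBLE)
        total += x;
    return total;
}
```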
> E.g. if my tagged pointers experiment shows significant performance gains (it hasn't yet).
In case it ever does, I'd certainly like to see the internals exposed in order to make direct use of them. :)
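Just to illustrate the general idea of tagged pointers (a hypothetical sketch, not Neil's actual experiment, and not something CPython currently does): on platforms where object pointers are at least 2-byte aligned, the low bit of a pointer-sized word is free, so a small integer can be stored directly in the "pointer" value instead of behind a heap object.

```c
#include <stdint.h>

/* Hypothetical tagged-pointer sketch; all names are invented.
 * Assumes at least 2-byte pointer alignment, so bit 0 is free, and an
 * arithmetic right shift for signed values (true on mainstream
 * compilers, though implementation-defined in ISO C). */
typedef uintptr_t Ref;

static inline Ref tag_int(intptr_t v)   { return ((uintptr_t)v << 1) | 1; }
static inline int is_tagged_int(Ref r)  { return (r & 1) != 0; }
static inline intptr_t untag_int(Ref r) { return (intptr_t)r >> 1; }

static inline Ref   from_pointer(void *p) { return (uintptr_t)p; }
static inline void *as_pointer(Ref r)     { return (void *)r; }

/* A demo "heap object" whose address has a clear low bit: */
static int some_object;
```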
> Victor's recent work in changing some macros to inline functions is not really related to the new API project, IMHO. I don't think there is a problem to leave an existing macro as a macro.
Yeah, it feels more like code churn. There are some things that cannot easily be done in macros (such as deciding whether to handle an error or to return a result unchanged), in which case an inline function is nice. For simple things, macros are just as good as inline functions. Minus some compile-time type checking, perhaps.
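A tiny example of the distinction (hypothetical names): a simple accessor works equally well either way, but branching control flow like "report the error, else pass the value through" is where a real (inline) function earns its keep over an expression macro.

```c
/* Hypothetical sketch; these names are invented for illustration. */
#define DOUBLE_IT_MACRO(x) ((x) * 2)          /* fine as a macro */

static inline int double_it_inline(int x) {   /* equivalent, plus
                                                 compile-time type checks */
    return x * 2;
}

/* An error path is awkward to squeeze into an expression macro, but
 * trivial in an inline function: */
static inline int checked_divide(int a, int b, int *error) {
    if (b == 0) {      /* decide whether to handle an error ... */
        *error = 1;
        return 0;
    }
    *error = 0;        /* ... or return the result unchanged */
    return a / b;
}
```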
> However, it could be that we introduce a new ifdef like Py_LIMITED_API that gives a stable ABI. E.g. when that's enabled, most everything would turn into non-inline functions. In exchange for the performance hit, your extension would become ABI compatible between a range of CPython releases. That would be a nice feature. Basically a more useful version of Py_LIMITED_API.
I actually wouldn't mind adding such a binary compatibility mode to Cython. Probably some work, but in the end, it would just be another macro guard for us. The overall size of the generated C files has rarely been a matter of debate. :)

Stefan

[1] https://github.com/cython/cython/blob/f158e490b9e8515cf47cf301f996c1b7e631ee...
[2] https://github.com/cython/peps/blob/master/pep-ccalls.rst