[Python-Dev] C API changes

Stefan Behnel stefan_ml at behnel.de
Sun Nov 25 03:13:18 EST 2018


Hi Armin,

Armin Rigo wrote on 25.11.18 at 06:15:
> On Sat, 24 Nov 2018 at 22:17, Stefan Behnel wrote:
>> Couldn't this also be achieved via reference counting? Count only in C
>> space, and delete the "open object" when the refcount goes to 0?
> 
> The point is to remove the need to return the same handle to C code if
> the object is the same one.  This saves one of the largest costs of
> the C API emulation, which is looking up the object in a big
> dictionary to know if there is already a ``PyObject *`` that
> corresponds to it or not---for *all* objects that go from Python to C.

Ok, got it. And since the handle is a simple integer, there's also no
additional cost for memory allocation on the way out.


> Once we do that, then there is no need for a refcount any more.  Yes,
> you could add your custom refcount code in C, but in practice it is
> rarely done.  For example, with POSIX file descriptors, when you would
> need to "incref" a file descriptor, you instead use dup().  This gives
> you a different file descriptor which can be closed independently of
> the original one, but they both refer to the same file.

Ok, then an INCREF() would be replaced by such a dup() call that creates
and returns a new handle. In CPython, it would just INCREF and return the
PyObject*, which is as fast as the current Py_INCREF().
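
To make the dup() analogy concrete, here is a minimal POSIX sketch. The
function name dup_survives_close() is made up for illustration; only pipe(),
dup(), read(), write() and close() are real calls. It shows that the
duplicate is a distinct descriptor that keeps working after the original
has been closed.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a dup()'ed descriptor stays usable
   after the original descriptor has been closed, 0 otherwise. */
int dup_survives_close(void)
{
    int fds[2];
    char buf[4] = {0};

    if (pipe(fds) != 0)
        return 0;

    /* The "incref": a second, independent descriptor for the write end. */
    int extra = dup(fds[1]);
    if (extra < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    assert(extra != fds[1]);   /* dup() never returns an already open fd */

    /* The "decref" of the original descriptor. */
    close(fds[1]);

    /* The write end of the pipe is still open through the duplicate. */
    ssize_t written = write(extra, "ok", 2);
    ssize_t got = read(fds[0], buf, sizeof(buf) - 1);

    close(extra);
    close(fds[0]);
    return written == 2 && got == 2 && strcmp(buf, "ok") == 0;
}
```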

For PyPy, however, that means that increfs become more costly. One of the
outcomes of a recent experiment with tagged pointers for integers was that
they make increfs and decrefs more expensive, and (IIUC) that reduced the
overall performance quite visibly. With tagged pointers, it's literally
just one tiny added condition that makes them so much slower. With handles,
each incref would add a lookup and a reference copy in the handles array,
which is already far more costly than that simple condition.
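
As a rough illustration of where that cost would come from, here is a toy
handle table. All names (Handle, handle_open, handle_dup, ...) are
hypothetical, and the linear scan for a free slot is only there to keep the
sketch short; it is not how PyPy or anyone else implements this. The point
is only that a dup has to read the table and write a new slot, where
Py_INCREF() just bumps a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not an existing C-API. */
#define MAX_HANDLES 64

typedef size_t Handle;                   /* a handle is just an index; 0 means "invalid" */

static void *handle_table[MAX_HANDLES];  /* slot -> object reference, NULL when free */

/* Open a fresh handle for obj: find a free slot and store the reference. */
Handle handle_open(void *obj)
{
    assert(obj != NULL);
    for (Handle h = 1; h < MAX_HANDLES; h++) {
        if (handle_table[h] == NULL) {
            handle_table[h] = obj;
            return h;
        }
    }
    return 0;   /* table full */
}

/* Look up the object behind a handle. */
void *handle_deref(Handle h)
{
    assert(h > 0 && h < MAX_HANDLES && handle_table[h] != NULL);
    return handle_table[h];
}

/* The "incref" replacement: one lookup plus one reference copy. */
Handle handle_dup(Handle h)
{
    return handle_open(handle_deref(h));
}

/* The "decref" replacement: invalidate the handle and free its slot. */
void handle_close(Handle h)
{
    assert(h > 0 && h < MAX_HANDLES && handle_table[h] != NULL);
    handle_table[h] = NULL;
}
```

Two handles obtained this way compare unequal even though they refer to the
same object, and closing one leaves the other valid.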

Now, it's unclear if this performance degradation is specific to CPython
(where PyObject* is native), or if it would also apply to PyPy. But I guess
the only way to find this out would be to try it.

IIUC, the only thing that is needed is to replace

    Py_INCREF(obj);

with

    obj = Py_NEWREF(obj);

which CPython would implement as

    #define Py_NEWREF(obj)  (Py_INCREF(obj), obj)

Py_DECREF() would then just invalidate and clean up the handle under the hood.

There are probably some places in user code where this would end up leaking
a reference by accident because of unclean reference handling (it could
overwrite the old handle in the case of a temporary INCREF/DECREF cycle),
but it might still be enough for trying it out. We could definitely switch
to this pattern in Cython (in fact, we already use such a NEWREF macro in a
couple of places, since it's a common pattern).
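
The accidental leak mentioned above can be modelled without any real C-API.
In this toy sketch (all toy_* names and both *_cycle functions are invented
for illustration), handles are small integers and we only track which ones
are open. A mechanical rewrite of a temporary INCREF/DECREF cycle overwrites
the only copy of the original handle, so it can never be closed; keeping the
new handle in a separate variable avoids that.

```c
#include <assert.h>

/* Toy model with hypothetical names: we only track which handles are open. */
#define MAX_HANDLES 16
static int live[MAX_HANDLES];            /* live[h] != 0 while handle h is open */

static int toy_open(void)
{
    for (int h = 1; h < MAX_HANDLES; h++) {
        if (!live[h]) {
            live[h] = 1;
            return h;
        }
    }
    return 0;                            /* no free slot */
}

static int toy_is_live(int h) { return h > 0 && h < MAX_HANDLES && live[h]; }
static int toy_newref(int h)  { assert(toy_is_live(h)); return toy_open(); }
static void toy_decref(int h) { assert(toy_is_live(h)); live[h] = 0; }

static int count_live(void)
{
    int n = 0;
    for (int h = 1; h < MAX_HANDLES; h++)
        n += live[h];
    return n;
}

/* Mechanical rewrite of "Py_INCREF(obj); ...; Py_DECREF(obj);".
   Returns the number of handles left open when the function is done. */
int leaky_cycle(void)
{
    int before = count_live();
    int obj = toy_open();                /* the reference this code owns */

    obj = toy_newref(obj);               /* overwrites the owned handle */
    toy_decref(obj);                     /* closes only the new handle */

    if (toy_is_live(obj))                /* final release: obj is stale by now */
        toy_decref(obj);
    return count_live() - before;
}

/* Same cycle, but the temporary reference gets its own variable. */
int clean_cycle(void)
{
    int before = count_live();
    int obj = toy_open();

    int tmp = toy_newref(obj);
    toy_decref(tmp);

    if (toy_is_live(obj))                /* final release of the owned handle */
        toy_decref(obj);
    return count_live() - before;
}
```

Here leaky_cycle() ends with one handle still open and unreachable, while
clean_cycle() ends with none.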

Overall, this seems like something that PyPy could try out as an
experiment, by just taking a simple extension module and replacing all
increfs with newref assignments. That obviously also means implementing the
whole thing for the C-API, but IIUC, you might be able to tweak that into your
cpyext wrapping layer somehow, without manually rewriting all C-API functions?

Stefan


