On 2019-02-27, Neil Schemenauer wrote:
I think the PyHandle idea has the best chance of producing a good end result. I suspect PEP C doesn't go far enough to solve the problems for alternative Python implementations. They really want PyObject to be an opaque handle-like object. Trying to make the existing C-API work like that seems like a nearly impossible task.
I chatted with Armin a little about his PyHandle idea and we came up with a possible refinement. I hope I can explain it accurately.
One of the key problems with PyPy implementing the CPython API is that it doesn't have space for a reference count field inside its internal memory storage for objects. So, when a CPython API returns a PyObject*, where can PyPy store the reference count? This makes passing objects back and forth over the CPython API expensive for PyPy. That's my understanding, anyhow.
You can define a new API using opaque object handles and that solves the problem for PyPy. They don't need to emulate reference counting and can just have a global table of open object handles. The problem is, how do you convert existing CPython extensions to use this new API? They still want to do reference counting.
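To make the "global table of open object handles" concrete, here is a toy C model of a handle API that does no reference counting at all. It is a sketch under stated assumptions, not PyPy's actual implementation: the names handle_open, handle_get and handle_close, the fixed-size table, and the use of integer indices as handles are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy handle table (hypothetical names; not PyPy's real code). */
#define MAX_HANDLES 64

typedef struct {
    void *target;   /* the interpreter's internal object */
    bool in_use;    /* whether this slot holds an open handle */
} handle_slot;

static handle_slot handle_table[MAX_HANDLES];  /* zero-initialized: all free */

/* Open a handle: store the object in the first free slot and return
   the slot index as the handle. Returns -1 if the table is full. */
int handle_open(void *target) {
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (!handle_table[i].in_use) {
            handle_table[i].target = target;
            handle_table[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/* Return the object behind an open handle, or NULL if it is not open. */
void *handle_get(int h) {
    if (h < 0 || h >= MAX_HANDLES || !handle_table[h].in_use)
        return NULL;
    return handle_table[h].target;
}

/* Close a handle by freeing its slot. There is no count to maintain,
   so nothing has to be stored inside the object itself. */
void handle_close(int h) {
    if (h >= 0 && h < MAX_HANDLES) {
        handle_table[h].in_use = false;
        handle_table[h].target = NULL;
    }
}
```

Because the bookkeeping lives entirely in the table, the object layout never needs a refcount field, which is the property that makes this cheap for PyPy.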
Here is a sketch of the API. Introduce a new, lower level API that works with object handles. Call them pyref_t. The object handle API doesn't implement reference counting. So, it is cheap for PyPy to implement it. Passing objects back and forth using the handle API is cheaper. To make it easy for existing extension modules or ones that want to use reference counting, provide a PyObject layer on top of the handle API. E.g.
    pyref_t *r = pyref_some_api_that_opens_a_handle();
    PyObject *o = PyObject_FromRef(r);  // new PyObject wrapper, refcnt == 1
    Py_INCREF(o);                        // refcnt == 2
    Py_DECREF(o);                        // refcnt == 1
    Py_DECREF(o);                        // refcnt == 0: handle gets closed
The PyObject structure could be:
    typedef struct {
        size_t refcnt;
        pyref_t *r;
    } PyObject;
Calling PyObject_FromRef() allocates a new one of these structures. When the reference count goes to zero, the handle is closed and the PyObject memory is freed.
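A minimal, self-contained sketch of this PyObject layer, assuming a toy handle API underneath. pyref_open, pyref_close, the handles_open counter, and obj_incref/obj_decref (standing in for Py_INCREF/Py_DECREF) are hypothetical names chosen for illustration; PyObject here is the two-field struct proposed above, not CPython's, and the code does not include Python.h.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the handle layer (hypothetical). */
typedef struct pyref {
    void *target;  /* the implementation's internal object */
} pyref_t;

static int handles_open = 0;  /* number of currently open handles */

pyref_t *pyref_open(void *target) {
    pyref_t *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->target = target;
    handles_open++;
    return r;
}

void pyref_close(pyref_t *r) {
    handles_open--;
    free(r);
}

/* The reference-counted layer: a refcount plus the handle it wraps. */
typedef struct {
    size_t refcnt;
    pyref_t *r;
} PyObject;

/* Allocate a new wrapper holding one reference; it takes ownership of r. */
PyObject *PyObject_FromRef(pyref_t *r) {
    PyObject *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcnt = 1;
    o->r = r;
    return o;
}

void obj_incref(PyObject *o) { o->refcnt++; }

/* Drop one reference; at zero, close the handle and free the wrapper. */
void obj_decref(PyObject *o) {
    if (--o->refcnt == 0) {
        pyref_close(o->r);
        free(o);
    }
}
```

Since the wrapper owns the handle, the sequence in the example earlier (one increment followed by two decrements) ends with the handle closed and the wrapper freed.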
To solve the non-opaque PyObject/PyTypeObject issue, I think you could have a source code option that turns on opaque types. PyPy has already implemented (mostly) compatible PyObject and PyTypeObject structures. With the option off, these structures work as they do now. In that case, PyObject_FromRef() has to return a non-opaque PyObject structure, and it needs to have an ob_type slot that points to a non-opaque PyTypeObject structure.
If you turn on the source option for opaque types with something like:
    #define Py_OPAQUE_PYOBJECT 1
    #include <Python.h>
then things could get more efficient when you use your extension with PyPy. For example, PyObject_FromRef() is faster because it doesn't need to fill in the ob_type pointer. Obviously, your extension would not compile if it tries to look inside PyObject structs.
Implementing this handle layer for CPython should be quite easy. PyObject_FromRef() can just be a typecast from pyref_t* to PyObject*. No extra memory needs to be allocated, because the pyref object already has space for the reference count. In debug builds, CPython should check that the handle API is used correctly (e.g. add a field to track whether handles are properly closed).
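A toy model of the CPython side, where the conversion is a plain cast. It assumes pyref_t is an opaque (incomplete) struct type that objects are cast to and from, and that a simple counter stands in for the debug-build check on handles; new_object, obj_decref, pyref_open, pyref_close and debug_open_handles are hypothetical names, and in a real build the counter would exist only in debug builds.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy CPython-style object: the refcount lives in the object itself. */
typedef struct _object {
    size_t ob_refcnt;
    int payload;              /* stand-in for the real object contents */
} PyObject;

/* Handles are opaque to callers: the struct is never defined. */
typedef struct pyref pyref_t;

/* Debug bookkeeping: handles opened but not yet closed or converted. */
static long debug_open_handles = 0;

PyObject *new_object(int payload) {
    PyObject *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->ob_refcnt = 1;
    o->payload = payload;
    return o;
}

void obj_decref(PyObject *o) {
    if (--o->ob_refcnt == 0)
        free(o);
}

/* Opening a handle owns one reference to the object. */
pyref_t *pyref_open(PyObject *o) {
    o->ob_refcnt++;
    debug_open_handles++;
    return (pyref_t *)o;
}

/* Closing a handle releases the reference it owned. */
void pyref_close(pyref_t *r) {
    debug_open_handles--;
    obj_decref((PyObject *)r);
}

/* The conversion is only a cast: no allocation, and the handle's
   reference becomes an ordinary PyObject reference. */
PyObject *PyObject_FromRef(pyref_t *r) {
    debug_open_handles--;     /* the handle has been handed over */
    return (PyObject *)r;
}
```

A handle that is neither closed nor converted leaves debug_open_handles above zero, which is the kind of leak the debug check is meant to catch.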
Extensions can use a mix of the new pyref handle-based API and the old PyObject-based API, with conversion functions between them. Additionally, this approach would work even if we don't support every detail of the old C API.
Regards,
Neil