C-API for extending opaque types
Hello,
When reviewing the PyType_FromMetaclass function proposed in PR 93012,
it occurred to me that it would be good to add better support for
extending types with opaque structs.
Consider the tutorial example of extending list, adapted to PyType_Spec:
typedef struct {
PyListObject list;
int state;
} SubListObject;
static PyType_Spec Sublist_spec = {
.name = "sublist.SubList",
.basicsize = sizeof(SubListObject),
...
};
If the PyListObject struct is not available (or opaque), as in the stable ABI, this approach won't work. The practical issue in PR 93012 is with metaclasses, which extend PyTypeObject or PyHeapTypeObject. Same idea but a bit “too meta” to be a good example. “Binding generator”-type projects that use the limited API resort to hacks in this area: see PySide or the experimental limited-API branch of nanobind.
I propose adding API that treats the subclass-specific data as a separate struct, and the “base” as an opaque blob (manipulated either by accessor functions or, if the struct is available, directly). Concretely:
- PyType_Spec.basicsize can be a negative (or zero) N to request -N bytes of *extra* storage on top of what the base class needs, so that the PyType_Spec can be static & read-only.
- a new void* PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls) returns the pointer to data specific to cls.
- a new Py_ssize_t PyObject_GetTypeataSize(PyTypeObject *cls) returns the size of that data (computed as cls->tp_basicsize - cls->tp_base->tp_basicsize).
- itemsize would get similar treatment, but I'll leave details to the PEP
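The size rules in the list above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyType, toy_resolve_basicsize and toy_type_data_size are hypothetical names, not CPython structures or functions, and alignment padding between the base and the extra data is ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type object; NOT a CPython structure. */
typedef struct ToyType {
    const struct ToyType *base;  /* NULL for the root type */
    ptrdiff_t basicsize;         /* absolute instance size in bytes */
} ToyType;

/* Resolve a spec-style basicsize against a base type.
 * Positive value: absolute size, used as-is.
 * Zero or negative N: request -N extra bytes on top of the base. */
static ptrdiff_t toy_resolve_basicsize(ptrdiff_t spec_basicsize,
                                       const ToyType *base)
{
    if (spec_basicsize > 0) {
        return spec_basicsize;
    }
    assert(base != NULL);  /* a relative size needs a base to extend */
    return base->basicsize + (-spec_basicsize);
}

/* Size of the data specific to cls, computed as in the proposal:
 * cls->basicsize - cls->base->basicsize. */
static ptrdiff_t toy_type_data_size(const ToyType *cls)
{
    ptrdiff_t base_size = cls->base ? cls->base->basicsize : 0;
    return cls->basicsize - base_size;
}
```

One detail worth noting when writing such a spec by hand: sizeof yields an unsigned size_t, so the value should be cast to a signed type before negating, for example -(ptrdiff_t)sizeof(int) in this toy model.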
Intended usage:
typedef struct {
int state;
} SubListObject;
static PyType_Spec Sublist_spec = {
.name = "sublist.SubList",
.basicsize = -sizeof(SubListObject),
...
};
...
sublist_type = PyType_FromSpec(&Sublist_spec);
...
SubListObject *data = PyObject_GetTypeData(instance, sublist_type);
data->state++;
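To make the pointer arithmetic behind the intended usage concrete, here is a minimal sketch that does not depend on Python.h. Everything in it (ToyType, SubListData, toy_alloc, toy_get_type_data) is a hypothetical stand-in, and it assumes the base size is already a multiple of the alignment the subclass data needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a type object; NOT a CPython structure. */
typedef struct ToyType {
    const struct ToyType *base;  /* NULL for the root type */
    size_t basicsize;            /* absolute instance size in bytes */
} ToyType;

/* Subclass-specific data only; the base part stays an opaque blob. */
typedef struct {
    int state;
} SubListData;

/* Allocate a zero-initialized instance of the given toy type. */
static void *toy_alloc(const ToyType *cls)
{
    return calloc(1, cls->basicsize);
}

/* Return the data specific to cls: it starts right after the
 * storage that the base class needs. */
static void *toy_get_type_data(void *obj, const ToyType *cls)
{
    assert(cls->base != NULL);
    return (char *)obj + cls->base->basicsize;
}
```

With a base of 40 bytes and a subclass of 40 + sizeof(SubListData) bytes, toy_get_type_data(instance, &sublist_type) points 40 bytes into the instance, and the caller never needs to see the layout of the base.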
What do you think? Is it a PEP-worthy idea?
Hello,
coincidentally, this is similar to the design that HPy chose. Everything is opaque by design in HPy and so in "pure" HPy mode tp_basicsize is treated as "I need this much native space attached to objects of this type". Internally it does not even have to be laid out in memory after the object, it can be elsewhere, which can have some advantages for alternative Python implementations. There is also API to access the memory: void* HPy_AsStruct(hpy_context, object_handle).
Regarding:
- a new void* PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls) returns the pointer to data specific to cls.
We haven't decided yet in HPy how to deal with multiple user-defined base types with a different struct attached to each of them. I think that passing the class object is reasonable, but note that getting the required PyTypeObject may then require loading it from module state, which means an extra API call and also the need to pass extra argument(s) around. An alternative could be some other "token" that can be safely shared between subinterpreters (some ID, the address of the corresponding PyType_Spec, ...), because that is all the implementation needs: to distinguish which class's data the caller wants. How important this use case is remains an open question. So far in our porting experiments (kiwi, matplotlib, ultrajson, a subset of numpy) we did not need it.
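The "token" alternative could look roughly like the following toy model, where the address of a static spec identifies the class whose data is wanted. ToySpec, ToyType and toy_get_data_by_token are hypothetical names made up for this sketch; nothing here is HPy or CPython API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical static description of a class; its address is the token. */
typedef struct ToySpec {
    const char *name;
    size_t extra_size;
} ToySpec;

/* Hypothetical per-interpreter type object created from a spec. */
typedef struct ToyType {
    const struct ToyType *base;  /* NULL for the root type */
    const ToySpec *token;        /* spec this type was created from, or NULL */
    size_t data_offset;          /* where this class's data starts */
} ToyType;

/* Walk from the object's type up through its bases and return the data
 * belonging to the class identified by token, or NULL if none matches. */
static void *toy_get_data_by_token(void *obj, const ToyType *obj_type,
                                   const ToySpec *token)
{
    assert(token != NULL);
    for (const ToyType *t = obj_type; t != NULL; t = t->base) {
        if (t->token == token) {
            return (char *)obj + t->data_offset;
        }
    }
    return NULL;
}
```

The caller only needs the address of a static spec, which is the same in every subinterpreter, so no type object has to be loaded from module state. The cost in this sketch is a walk along the base chain on each access.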
Note that we also need to deal with alignments if the structs are not embedded in each other, but we want to lay them next to each other in the memory. So far HPy is using a union for this: https://github.com/hpyproject/hpy/blob/master/hpy/devel/include/hpy/runtime/..., but that does not scale.
I think that from the HPy point of view it is a worthwhile idea and something that HPy could piggy back on instead of hacking around the current API.
Stepan
From: Petr Viktorin <encukou@gmail.com> Sent: Tuesday, May 24, 2022 2:28 PM To: CAPI Python Subject: [External] : [capi-sig] C-API for extending opaque types
capi-sig mailing list -- capi-sig@python.org To unsubscribe send an email to capi-sig-leave@python.org https://urldefense.com/v3/__https://mail.python.org/mailman3/lists/capi-sig.... Member address: stepan.sindelar@oracle.com
On 24. 05. 22 15:43, Stepan Sindelar wrote:
Hello,
coincidentally, this is similar to the design that HPy chose. Everything is opaque by design in HPy and so in "pure" HPy mode tp_basicsize is treated as "I need this much native space attached to objects of this type". Internally it does not even have to be laid out in memory after the object, it can be elsewhere, which can have some advantages for alternative Python implementations. There is also API to access the memory: void* HPy_AsStruct(hpy_context, object_handle).
Regarding:
- a new void* PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls) returns the pointer to data specific to cls.
We haven't decided yet in HPy how to deal with multiple user-defined base types with a different struct attached to each of them. I think that passing the class object is reasonable, but note that getting the required PyTypeObject may then require loading it from module state, which means an extra API call and also the need to pass extra argument(s) around. An alternative could be some other "token" that can be safely shared between subinterpreters (some ID, the address of the corresponding PyType_Spec, ...), because that is all the implementation needs: to distinguish which class's data the caller wants. How important this use case is remains an open question. So far in our porting experiments (kiwi, matplotlib, ultrajson, a subset of numpy) we did not need it.
I hope that with PyCMethod's defining_class, getting the class should be easy in sufficiently many cases. Definitely not all of them, of course.
Note that we also need to deal with alignments if the structs are not embedded in each other, but we want to lay them next to each other in the memory. So far HPy is using a union for this: https://github.com/hpyproject/hpy/blob/master/hpy/devel/include/hpy/runtime/..., but that does not scale.
Yeah, that's definitely a detail with a devil inside :)
Since CPython is on C11 now, we can use a compiler-provided version of that union, max_align_t.
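As a rough illustration of the alignment point, the offset of the subclass data could be obtained by rounding the base size up to the alignment of max_align_t. The helper names toy_align_up and toy_type_data_offset are hypothetical; only max_align_t and alignof come from C11.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Round n up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static size_t toy_align_up(size_t n, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Offset at which subclass data could start after an opaque base of
 * base_size bytes, so that any object type can be stored there. */
static size_t toy_type_data_offset(size_t base_size)
{
    return toy_align_up(base_size, alignof(max_align_t));
}
```

Rounding to alignof(max_align_t) is the conservative choice: it may waste a few bytes per class, but it avoids having to know the alignment requirement of each subclass struct.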
I think that from the HPy point of view it is a worthwhile idea and something that HPy could piggy back on instead of hacking around the current API.
Stepan
Hi,
The object layout "contract" of Py_LIMITED_API currently seems too underspecified for it to be useful for binding tools, and so I welcome this step in making it more usable.
Some thoughts:
- It's not just extensions inheriting from core Python types, but also the other way around. For example, if I create a new type with PyType_FromSpec and then extend it within Python, the CPython implementation will assume that it is legal to enlarge __basicsize__ and stick in an extra dictionary at the end.
- Getting this approach to work for tp_itemsize sounds like a non-trivial technical problem to me. It would complicate the indexing implementation for item access in core types like tuples, lists, etc. (where the resulting performance cost is likely not acceptable). My suggestion would be to require that subclasses can't change tp_itemsize.
- Did you mean PyObject_GetTypeSize (you wrote PyObject_GetTypeataSize, which sounded like a typo)?
- The PyType_FromMetaclass PR would go nicely along with these changes. Their combination would legitimize what is already happening in practice (e.g. in PySide).
Thanks, Wenzel
On 24. 05. 22 15:47, wenzel.jakob--- via capi-sig wrote:
Hi,
The object layout "contract" of Py_LIMITED_API currently seems too underspecified for it to be useful for binding tools, and so I welcome this step in making it more usable.
Some thoughts:
- It's not just extensions inheriting from core Python types, but also the other way around. For example, if I create a new type with PyType_FromSpec and then extend it within Python, the CPython implementation will assume that it is legal to enlarge __basicsize__ and stick in an extra dictionary at the end.
- Getting this approach to work for tp_itemsize sounds like a non-trivial technical problem to me. It would complicate the indexing implementation for item access in core types like tuples, lists, etc. (where the resulting performance cost is likely not acceptable). My suggestion would be to require that subclasses can't change tp_itemsize.
Oh, I don't intend to remove the tp_itemsize inheritance caveat:
If the base type has a non-zero tp_itemsize, it is generally not safe to set tp_itemsize to a different non-zero value in a subtype (though this depends on the implementation of the base type).
– https://docs.python.org/3/c-api/typeobj.html?highlight=tp_itemsize#c.PyTypeO...
I do intend to make this safe if all bases use PyObject_GetTypeItemData for accessing per-item data, but I don't intend to make tuple or list do that.
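Why an accessor helps here can be shown with a toy model. It assumes one possible layout, in which the items are stored after all fixed-size data of the concrete type; the thread leaves the real details to the PEP. ToyVarType, toy_instance_size and toy_get_item_data are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable-size type; NOT a CPython structure. */
typedef struct ToyVarType {
    const struct ToyVarType *base;  /* NULL for the root type */
    size_t basicsize;               /* fixed-size part of every instance */
    size_t itemsize;                /* size of one item in bytes */
} ToyVarType;

/* Total allocation size for an instance holding nitems items. */
static size_t toy_instance_size(const ToyVarType *type, size_t nitems)
{
    return type->basicsize + nitems * type->itemsize;
}

/* Start of the per-item data. It depends on the concrete type of the
 * object, so code in a base class cannot hard-code the offset. */
static void *toy_get_item_data(void *obj, const ToyVarType *obj_type)
{
    assert(obj_type->itemsize != 0);
    return (char *)obj + obj_type->basicsize;
}
```

In this model, a subclass that adds 16 bytes of fixed data moves the items 16 bytes further into the instance. A base class that reaches its items through the accessor keeps working; one that indexes from a fixed struct offset, as tuple does, would not.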
- Did you mean PyObject_GetTypeSize (you wrote PyObject_GetTypeataSize, which sounded like a typo)?
PyObject_GetTypeDataSize, missing D
None of the names are final, of course.
- The PyType_FromMetaclass PR would go nicely along with these changes. Their combination would legitimize what is already happening in practice (e.g. in PySide).
Yup.
Thanks, Wenzel
I like this idea. It removes the need to expose class internals in some scenarios and would, longer term, open up opportunities for removing PyObject_HEAD as public API. It should also be easy to implement efficiently, at least when keeping the current restrictions on multiple inheritance.
Ronald —
Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
participants (4)
- Petr Viktorin
- Ronald Oussoren
- Stepan Sindelar
- wenzel.jakob@epfl.ch