Heap-allocated StructSequences
PEP-384 talks about defining a Stable ABI by making PyTypeObject opaque. Thus, banning the use of static PyTypeObjects. Specifically, I’d like to focus on those static PyTypeObjects that are initialized as StructSequences. As of today, there are 14 instances of these types (in timemodule.c, posixmodule.c, etc.) within cpython/Modules. These are all initialized through PyStructSequence_InitType2. This is very problematic when trying to make these types conform to PEP-384 as they are used through static PyTypeObjects. Problems: * PyStructSequence_InitType2 overrides the PyTypeObject: This C-API does a direct memcpy from a “prototype” structure. This effectively overrides anything set within the PyTypeObject. For example, if we were to initialize a heap allocated PyTypeObject and pass it on to this function, the C-API would just get rid of the Py_TPFLAG_HEAPTYPE flag, causing issues with the GC. * PyStructSequence_InitType2 does not work with heap allocated PyTypeObjects: Even if the function is fixed to preserve the state of the PyTypeObject and only overriding the specific slots (i.e. tp_new, tp_repr, etc.), it is expected that PyStructSequence_InitType2 will call PyType_Ready on the object. That means that the incoming object shouldn’t be initialized by a function such as PyType_FromSpec, as that would have already called PyType_Ready on it. Therefore, PyStructSequence_InitType2 will now have the responsibility of setting all the slots and properties of the PyHeapTypeObject, which is not feasible. * PyStructSequence_NewType is non-functional: This function was meant to be used as a way of creating a heap-allocated PyTypeObject that be passed to PyStructSequence_InitType2, effectively returning a heap allocated PyTypeObject. The current implementation doesn’t work in practice. Given that this struct is created in the heap, the GC has control over it. Thus, when the GC tries to traverse the type it complains with: “Error: type_traverse() called for non-heap type”, since it doesn’t have the Py_TPFLAG_HEAPTYPE flag. If we add the flag, we run into bullet point 1, if we are able to preserve the flag then we will still run into the problem of bullet point 2. Extra note: This C-API is not being used anywhere within CPython itself. Solution: * Fix the implementation of PyStructSequence_NewType: The best solution would be to fix the implementation of this function. This can easily be done by dynamically creating a PyType_Spec and calling PyType_FromSpec ``` PyObject* PyStructSequence_NewType(PyStructSequence_Desc *desc) { // … PyType_Spec* spec = PyMem_NEW(PyType_Spec, 1); spec->name = desc->name; spec->basicsize = sizeof(PyStructSequence) - sizeof(PyObject *); spec->itemsize = sizeof(PyObject *); spec->flags = Py_TPFLAGS_DEFAULT; spec->slots = PyMem_NEW(PyType_Slot, 6); spec->slots[0].slot = Py_tp_dealloc; spec->slots[0].pfunc = (destructor)structseq_dealloc; // … bases = PyTuple_Pack(1, &PyTuple_Type); type = PyType_FromSpecWithBases(spec, bases); // … ``` This will cleanly create a heap allocated PyStructSequence which can be used just like any stack allocated PyTypeObject initialized through PyStructSequence_InitType2. This means that any C-Extension should be using PyStructSequence_NewType and only built-in types should be calling PyStructSequence_InitType2. This will enable these types to comply with PEP-384 As an extra, I already have patches for this proposal. They can be found here: Branch: https://github.com/eduardo-elizondo/cpython/tree/heap-structseq Reimplement PyStructSequence_NewType: https://github.com/eduardo-elizondo/cpython/commit/413f8ca5bc008d84b3397ca1c... Patch timemodule with NewType: https://github.com/eduardo-elizondo/cpython/commit/0a35ea263a531cb03c06be9ef... Thoughts?
On 2018-09-04, Eddie Elizondo wrote:
Solution:
* Fix the implementation of PyStructSequence_NewType:
The best solution would be to fix the implementation of this function. This can easily be done by dynamically creating a PyType_Spec and calling PyType_FromSpec
Hello Eddie, Thank you for spending time to look into this. Without studying the details of your patch, your approach sounds correct to me. I think we should be allocating types from the heap and use PyType_FromSpec. Having static type definitions living in the data segment cause too many issues. We have to assess how 3rd party extension modules would be affected by this change. Unless it is too hard to do, they should still compile (perhaps with warnings) after your fix. Do you know if that's the case? Looking at your changes to structseq.c, I can't tell easily. In any case, this should go into Victor's pythoncapi fork. That fork includes all the C-API cleanup we are hoping to make to CPython (assuming we can figure out the backwards and forwards compatibility issues). Here is the project site: https://pythoncapi.readthedocs.io Regards, Neil
On 09/13/18 23:34, Neil Schemenauer wrote:
On 2018-09-04, Eddie Elizondo wrote:
Solution:
* Fix the implementation of PyStructSequence_NewType:
The best solution would be to fix the implementation of this function. This can easily be done by dynamically creating a PyType_Spec and calling PyType_FromSpec
Hello Eddie,
Thank you for spending time to look into this. Without studying the details of your patch, your approach sounds correct to me. I think we should be allocating types from the heap and use PyType_FromSpec. Having static type definitions living in the data segment cause too many issues.
We have to assess how 3rd party extension modules would be affected by this change. Unless it is too hard to do, they should still compile (perhaps with warnings) after your fix. Do you know if that's the case? Looking at your changes to structseq.c, I can't tell easily.
In any case, this should go into Victor's pythoncapi fork. That fork includes all the C-API cleanup we are hoping to make to CPython (assuming we can figure out the backwards and forwards compatibility issues).
Nope, Victor's fork doesn't include all C-API cleanup. There's an older long-term effort (PEP-384, PEP-489, the current contenders 576/579/580, and PEP-573 for the future). Converting things to use PyType_FromSpec falls in there. As long as the old API still works, these changes should go in (but they might need a PEP).
We have to assess how 3rd party extension modules would be affected by this change.
This change is fully compatible with 3rd party extensions. The current change to InitType2 is only refactoring, there is no logic change there so that API remains unchanged. Also, there should not be any instances of PyStructSequence_NewType in use. Any usage of this API causes a crash. A quick Google and Github search show that this is true. Thus, modifying this function should have no conflicts. A more interesting question would be: "Is the migration of PyStructSequence_InitType2 to PyStructSequence_NewType backwards-compatible?" The answer is yes! Using gevent as an example (https://github.com/gevent/gevent). This library has a dependency on StatResultType from cpython/Modules/posixmodule.c. This type gets initialized with PyStructSequence_InitType2. After modifying posixmodule.c to use NewType instead of InitType2 gevent still builds and passes all tests. Example: https://github.com/python/cpython/pull/9665 Thus, this change is backwards-compatible by itself and even after migrating to the NewType C-API.
Converting things to use PyType_FromSpec falls in there. As long as the old API still works, these changes should go in (but they might need a PEP).
I agree that this change is standalone and should go in by itself. Yet, I'm open to whatever people thing might be the right approach to get this in. i.e: Have more people look at it, writing a PEP, etc. - Eddie
participants (3)
-
Eddie Elizondo
-
Neil Schemenauer
-
Petr Viktorin