Heap types (PyType_FromSpec) must fully implement the GC protocol
Hi,

In the Python stdlib, many heap types currently don't "properly" (fully?) implement the GC protocol, which can prevent these types from being destroyed at Python exit. As a side effect, some other Python objects can also remain alive, and so are not destroyed either.

There is an on-going effort to destroy all Python objects at exit (bpo-1635741). This problem gets worse when subinterpreters are involved: it causes failures on the Refleaks buildbots, which prevent spotting other regressions, and so these "leaks" / "GC bugs" must be fixed as soon as possible. In my experience, many leaks spotted by tests using subinterpreters were quite old; it's just that they were ignored previously.

It's a hard problem and I don't see any simple/obvious solution right now, except for *workarounds* that I dislike. Maybe the only good solution is to fix all heap types, one by one.

== Only the Python stdlib should be affected ==

PyType_FromSpec() was added to Python 3.2 by PEP 384 to define "heap types" in C, but I'm not sure if it's popular in practice (ex: Cython doesn't use it, but defines static types). I expect most types to still be defined the old way (static types) in the vast majority of third party extension modules.

To be clear, static types are not affected by this email.

Third party extension modules using the limited C API (to use the stable ABI) and PyType_FromSpec() can be affected (if they don't fully implement the GC protocol).

== Heap type instances now store a strong reference to their type ==

In March 2019, the PyObject_Init() function was modified in bpo-35810 to keep a strong reference (INCREF) to the type if the type is a heap type. The problem being fixed was that heap types could be destroyed before their last instance was destroyed.

== GC and heap types ==

The new problem is that most heap types don't collaborate well with the garbage collector. The garbage collector doesn't know anything about Python objects, types, reference counting or anything.
It only uses the PyGC_Head header and the traverse functions. If an object holds a strong reference to another object but its type does not define a traverse function, the GC cannot guess/infer this reference.

A heap type must respect the following 3 conditions to collaborate with the GC:

* Have the Py_TPFLAGS_HAVE_GC flag;
* Define a traverse function (tp_traverse) which visits the type: Py_VISIT(Py_TYPE(self));
* Instances must be tracked by the GC.

If one of these conditions is not met, the GC can fail to destroy a type during a GC collection. If an instance is kept alive late while a Python interpreter is being deleted, it's possible that the type is never deleted, which can indirectly keep *many* objects alive, so they are not deleted either.

In practice, when a type is not deleted, a test using subinterpreters starts to fail on the Refleaks buildbot since it leaks references. Without subinterpreters, such a leak is simply ignored, whereas there is an on-going effort to delete Python objects at exit (bpo-1635741).

== Boring traverse functions ==

Currently, there is no default traverse implementation which visits the type. For example, I had to implement the following function for _thread.LockType:

    static int
    lock_traverse(lockobject *self, visitproc visit, void *arg)
    {
        Py_VISIT(Py_TYPE(self));
        return 0;
    }

It's a little bit annoying to have to implement the GC protocol when a lock cannot contain other Python objects: it's not a container. It's just a thin wrapper around a C lock. There is exactly one strong reference: to the type.

== Workaround: loop on gc.collect() ==

A workaround is to run gc.collect() in a loop until it returns 0 (no object was collected).

== Traverse automatically? Nope. ==

Pablo Galindo attempted to automatically visit the type in the traverse function:

https://bugs.python.org/issue40217
https://github.com/python/cpython/commit/0169d3003be3d072751dd14a5c84748ab63...
Moreover, What's New in Python 3.9 contains a long section suggesting to implement a traverse function for this problem, but it doesn't suggest to track instances:

https://docs.python.org/dev/whatsnew/3.9.html#changes-in-the-c-api

This solution caused too many troubles, and so instead, traverse functions were defined on heap types to visit the type.

Currently in the master branch, 89 types are defined as heap types out of a total of 206 types (117 types are defined statically). I don't think that these 89 heap types respect the 3 conditions to collaborate with the GC.

== How should we address this issue? ==

I'm not sure what should be done. Working around the issue by triggering multiple GC collections? Emit a warning in development mode if a heap type doesn't collaborate well with the GC?

If core developers miss these bugs and have trouble debugging them, I expect that extension module authors would suffer even more.

== GC+heap type bugs became common ==

I have been fixing such GC issues for 1 year as part of the work on cleaning up Python objects at exit, and also indirectly related to subinterpreters. The behavior is surprising, and it's really hard to dig into GC internals and understand what's going on. I wrote an article on this kind of "GC bugs":

https://vstinner.github.io/subinterpreter-leaks.html

Today, I learnt the hard way that defining a traverse function is *not* enough. The type constructor (tp_new) must also track instances! See my fix for _multibytecodec related to CJK codecs:

https://github.com/python/cpython/commit/11ef53aefbecfac18b63cee518a7184f771...
https://bugs.python.org/issue42866

== Reference cycles are common ==

The GC only serves to break reference cycles. But reference cycles are rare, right? Well...

First of all, most types create reference cycles involving themselves. For example, a type's __mro__ tuple contains the type, which already creates a ref cycle. Type methods can also contain a reference to the type.
=> The GC must break the cycle, otherwise the type cannot be destroyed.

When a function is defined in a Python module, the function's __globals__ is the module namespace (module.__dict__) which... contains the function. Defining a function in a Python module therefore also creates a reference cycle, which prevents deleting the module namespace.

If a function is used as a callback somewhere, the whole module remains "alive" until the reference to the callback is cleared.

Example: os.register_at_fork() and codecs.register() callbacks are cleared really late during Python finalization. Currently, they are basically the last objects cleared at Python exit. After that, there is exactly one final GC collection.

=> The GC

== Debug GC issues ==

* gc.get_referents() and gc.get_referrers() can be used to check traverse functions.
* gc.is_tracked() can be used to check if the GC tracks an object.
* Using the gdb debugger on gc_collect_main() helps to see which objects are collected. See for example the finalize_garbage() function, which calls finalizers on unreachable objects.
* The cause is usually a missing traverse function or a missing Py_VISIT() in an existing traverse function.

== __del__ hack for debugging ==

If you want to play with the issue or if you have to debug a GC issue, you can use an object which logs a message when it's being deleted:

    class VerboseDel:
        def __del__(self):
            print("DELETE OBJECT")

    obj = VerboseDel()

Warning: creating such an object in a module also prevents the module namespace from being destroyed when the last reference to the module is deleted! __del__.__globals__ contains a reference to the module namespace, and obj.__class__ contains a reference to the type... Yeah, ref cycles and GC issues are fun!

== Long email ==

Yeah, I like to put titles in my long emails. Enjoy. Happy hacking!

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
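The debugging ideas in this email (the __del__ hack, gc.is_tracked(), gc.get_referents() and the "loop on gc.collect()" workaround) can be tried from pure Python. The following is only a minimal sketch: collect_until_stable() and the "deleted" list are hypothetical names introduced for illustration, and the demo records names in a list instead of printing.

```python
import gc

deleted = []  # names recorded by __del__, instead of printing


class VerboseDel:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        deleted.append(self.name)


def collect_until_stable(max_rounds=10):
    """Call gc.collect() until it finds no unreachable object (hypothetical helper)."""
    total = 0
    for _ in range(max_rounds):
        found = gc.collect()
        total += found
        if found == 0:
            break
    return total


gc.disable()            # no automatic collection during the demo
collect_until_stable()  # start from a clean state

a = VerboseDel("a")
b = VerboseDel("b")
a.other, b.other = b, a  # reference cycle: a -> b -> a

tracked = gc.is_tracked(a)
# A type is reachable from its own __mro__ tuple: another reference cycle.
type_in_own_mro = any(ref is VerboseDel
                      for ref in gc.get_referents(VerboseDel.__mro__))

del a, b
freed_by_refcount = list(deleted)  # snapshot: the cycle keeps both objects alive
collected = collect_until_stable()
gc.enable()
```

Reference counting alone does not free the two objects; only the cycle collector does, which is why the snapshot is taken before collecting. The max_rounds bound is an arbitrary safety limit chosen for this sketch.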
On Sat, 9 Jan 2021 02:02:17 +0100 Victor Stinner <vstinner@python.org> wrote:
It's a hard problem and I don't see any simple/obvious solution right now, except for *workarounds* that I dislike. Maybe the only good solution is to fix all heap types, one by one.
Ok. Why are we adding heap types to the stdlib exactly? Is the goal to have exactly zero shared objects between subinterpreters? Regards Antoine.
Hi, There are multiple PEPs covering heap types. The latest one refers to other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin. https://www.python.org/dev/peps/pep-0630/#motivation The use case is to embed multiple Python instances (interpreters) in the same application process, or to embed Python with multiple calls to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static types are causing different issues for these use cases. Also, it's not possible to destroy static types at Python exit, which goes against the on-going effort to destroy all Python objects at exit (bpo-1635741). Victor
On 1/11/21 5:26 PM, Victor Stinner wrote:
Hi,
There are multiple PEPs covering heap types. The latest one refers to other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin. https://www.python.org/dev/peps/pep-0630/#motivation
The use case is to embed multiple Python instances (interpreters) in the same application process, or to embed Python with multiple calls to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static types are causing different issues for these use cases.
If a type is immutable and has no references to heap-allocated objects, it could stay as a static type. The issue is that very many types don't fit that. For example, if some method needs to raise a module-specific exception, that's a reference to a heap-allocated type, because custom exceptions generally aren't static.
Also, it's not possible to destroy static types at Python exit, which goes against the on-going effort to destroy all Python objects at exit (bpo-1635741).
I don't see why we would need to destroy immutable static objects. They don't need to be freed.
On Tue, Jan 12, 2021 at 3:28 PM Petr Viktorin <encukou@gmail.com> wrote:
If a type is immutable and has no references to heap-allocated objects, it could stay as a static type. The issue is that very many types don't fit that. For example, if some method needs to raise a module-specific exception, that's a reference to a heap-allocated type, because custom exceptions generally aren't static. (...) I don't see why we would need to destroy immutable static objects. They don't need to be freed.
I'm not sure of your definition of "immutable" here. At the C level, many immutable Python objects are mutable. For example, a str instance *can* be modified at the C level, and computing hash(<my string>) modifies the object as well (the internal cached hash value).

Any type contains at least one Python object: the __mro__ tuple. Most types also contain a __subclass__ dictionary (by default, it's NULL). These objects are created at Python startup, but not destroyed at Python exit. See also tp_bases (tuple) and tp_dict (dict).

I tried once to "finalize" static types, but it didn't go well:

* https://github.com/python/cpython/pull/20763
* https://bugs.python.org/issue1635741#msg371119

It doesn't look safe to clear static types. Many functions rely on the fact that static types are "always there" and are never finalized. Also, only a few static types are cleared by my PR: many static types are left unchanged. For example, static types of the _io module.

It seems like a safer approach is to continue the work on bpo-40077: "Convert static types to PyType_FromSpec()".

Victor
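The point that even a static type owns other Python objects can be observed from Python. This is a small sketch assuming CPython; HeapType is a hypothetical class used only for comparison, and the gc.is_tracked() results reflect CPython's behaviour of not tracking statically allocated types.

```python
import gc


class HeapType:  # created by a class statement, so it is a heap type
    pass


# Objects attached to a type object, even a static one such as int:
int_mro = int.__mro__                  # tuple containing int itself
int_bases = int.__bases__              # tuple of base types
int_subclasses = int.__subclasses__()  # list built from the subclasses registry

# CPython's GC does not track static types, but it tracks heap types.
static_tracked = gc.is_tracked(int)
heap_tracked = gc.is_tracked(HeapType)
```

Since the collector never looks at int, the tuples and the subclasses registry hanging off it can only go away if something clears them explicitly at exit.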
On 1/12/21 4:09 PM, Victor Stinner wrote:
On Tue, Jan 12, 2021 at 3:28 PM Petr Viktorin <encukou@gmail.com> wrote:
If a type is immutable and has no references to heap-allocated objects, it could stay as a static type. The issue is that very many types don't fit that. For example, if some method needs to raise a module-specific exception, that's a reference to a heap-allocated type, because custom exceptions generally aren't static. (...) I don't see why we would need to destroy immutable static objects. They don't need to be freed.
I'm not sure of your definition of "immutable" here. At the C level, many immutable Python objects are mutable. For example, a str instance *can* be modified at the C level, and computing hash(<my string>) modifies the object as well (the internal cached hash value).
Any type contains at least one Python object: the __mro__ tuple. Most types also contain a __subclass__ dictionary (by default, it's NULL). These objects are created at Python startup, but not destroyed at Python exit. See also tp_bases (tuple) and tp_dict (dict).
Ah, right. __subclasses__ is the reason these need to be heap types (if they allow subclassing, which – isn't). If __mro__ is a tuple of static types, it could probably be made static as well; hashes could be protected by a lock.
I tried once to "finalize" static types, but it didn't go well:
* https://github.com/python/cpython/pull/20763 * https://bugs.python.org/issue1635741#msg371119
It doesn't look to be safe to clear static types. Many functions rely on the fact that static types are "always there" and are never finalized. Also, only a few static types are cleared by my PR: many static types are left unchanged. For example, static types of the _io module. It seems like a safer approach is to continue the work on bpo-40077: "Convert static types to PyType_FromSpec()".
Yes, seems so. And perhaps this has enough subtle details to want a PEP?
On 2021-01-12, Victor Stinner wrote:
It seems like a safer approach is to continue the work on bpo-40077: "Convert static types to PyType_FromSpec()".
I agree that trying to convert static types is a good idea. Another possible bonus might be that we can gain some performance by integrating garbage collection with the Python object memory allocator. Static types frustrate that effort.

Could we have something easier to use than PyType_FromSpec(), for the purposes of converting existing code? I was thinking of something like:

    static PyTypeObject Foo_TypeStatic = { };
    static PyTypeObject *Foo_Type;

    PyInit_foo(void)
    {
        Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
    }

The PyType_FromStatic() would return a new heap type, created by copying the static type. The static type could be marked as being unusable (e.g. with a type flag).
One worry that I have in general with this move is the usage of _PyType_GetModuleByDef to get the type object from the module definition. This normally involves getting a TLS in every instance creation, which can notably impact performance for some perf-sensitive types or types that are created a lot.

On Tue, 12 Jan 2021 at 18:21, Neil Schemenauer <nas-python@arctrix.com> wrote:
On 2021-01-12, Victor Stinner wrote:
It seems like a safer approach is to continue the work on bpo-40077: "Convert static types to PyType_FromSpec()".
I agree that trying to convert static types is a good idea. Another possible bonus might be that we can gain some performance by integrating garbage collection with the Python object memory allocator. Static types frustrate that effort.
Could we have something easier to use than PyType_FromSpec(), for the purposes of converting existing code? I was thinking of something like:
static PyTypeObject Foo_TypeStatic = { } static PyTypeObject *Foo_Type;
PyInit_foo(void) { Foo_Type = PyType_FromStatic(&Foo_TypeStatic); }
The PyType_FromStatic() would return a new heap type, created by copying the static type. The static type could be marked as being unusable (e.g. with a type flag). _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RPG2TRQL... Code of Conduct: http://python.org/psf/codeofconduct/
On 1/12/21 7:48 PM, Pablo Galindo Salgado wrote:
One worry that I have in general with this move is the usage of _PyType_GetModuleByDef to get the type object from the module definition. This normally involves getting a TLS in every instance creation,
Not TLS, it's walking the MRO.
which can impact notably performance for some perf-sensitive types or types that are created a lot.
But yes, that's right. _PyType_GetModuleByDef should not be used in perf-sensitive spots, at least not without profiling. There's often an alternative, though. Do you have any specific cases you're concerned about?
On Tue, 12 Jan 2021 18:48:39 +0000 Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
One worry that I have in general with this move is the usage of _PyType_GetModuleByDef to get the type object from the module definition. This normally involves getting a TLS in every instance creation, which can notably impact performance for some perf-sensitive types or types that are created a lot.
If it's inlined C TLS it should be fast (*). If it's Python's emulated TLS then probably not :-) (*) see https://godbolt.org/z/d7eKx7 Regards Antoine.
On 2021-01-12, Pablo Galindo Salgado wrote:
One worry that I have in general with this move is the usage of _PyType_GetModuleByDef to get the type object from the module definition. This normally involves getting a TLS in every instance creation, which can notably impact performance for some perf-sensitive types or types that are created a lot.
I would say _PyType_GetModuleByDef is the problem. Why do we need to use such an ugly approach (walking the MRO) when Python-defined classes don't have the same performance issue? E.g.

    class A:
        def b():
            pass

    A.b.__globals__

IMHO, we should be working to make types and functions defined in extensions more like the pure Python versions.

Related, my "__namespace__" idea[1] might be helpful in reducing the differences between pure Python modules and extension modules. Rather than functions having a __globals__ property, which is a dict, they would have a __namespace__, which is a module object. Basically, functions and methods know which global namespace (module) they have been defined in. For extension modules, when you call a function or method defined in the extension, it could be passed the module instance, by using the __namespace__ property.

Maybe I'm missing some details on why this approach wouldn't work. However, at a high level, I don't see why it shouldn't. Maybe performance would be an issue? Reducing the number of branches in code paths like CALL_FUNCTION should help.

1. https://github.com/nascheme/cpython/tree/frame_no_builtins
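To make the comparison concrete, here is a small sketch of how a pure Python function and a C-implemented module function each reach their module today. It assumes CPython; the "demo" module is a throwaway created only for illustration, and math.sqrt stands in for any function of an extension module.

```python
import math
import types

# Build a throwaway module so that the function's globals are a real
# module namespace, independently of how this snippet itself is run.
demo = types.ModuleType("demo")
exec("def b():\n    pass\n", demo.__dict__)

# A Python function remembers the namespace (a dict) it was defined in...
globals_is_module_dict = demo.b.__globals__ is demo.__dict__
# ...and that namespace contains the function: a reference cycle.
namespace_contains_function = demo.b.__globals__["b"] is demo.b

# A C function defined at module level receives its module as "self".
c_function_owner = math.sqrt.__self__
```

Module-level C functions already get the module object passed in, so the open question in this part of the thread concerns methods of types rather than plain functions.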
On 1/12/21 8:23 PM, Neil Schemenauer wrote:
On 2021-01-12, Pablo Galindo Salgado wrote:
One worry that I have in general with this move is the usage of _PyType_GetModuleByDef to get the type object from the module definition. This normally involves getting a TLS in every instance creation, which can notably impact performance for some perf-sensitive types or types that are created a lot.
I would say _PyType_GetModuleByDef is the problem. Why do we need to use such an ugly approach (walking the MRO) when Python defined classes don't have the same performance issue? E.g.
    class A:
        def b():
            pass

    A.b.__globals__
IMHO, we should be working to make types and functions defined in extensions more like the pure Python versions.
Related, my "__namespace__" idea[1] might be helpful in reducing the differences between pure Python modules and extension modules. Rather than functions having a __globals__ property, which is a dict, they would have a __namespace__, which is a module object. Basically, functions and methods know which global namespace (module) they have been defined in. For extension modules, when you call a function or method defined in the extension, it could be passed the module instance, by using the __namespace__ property.
Maybe I'm missing some details on why this approach wouldn't work. However, at a high level, I don't see why it shouldn't. Maybe performance would be an issue? Reducing the number of branches in code paths like CALL_FUNCTION should help.
The main difference between Python and C functions is that in C, you need type safety. You can't store C state in a mutable dict (or module) accessible from Python, because when users invalidate your C invariants, you get a segfault rather than a nice AttributeError.

Making methods "remember" their context does work though, and has already been implemented -- see PEP 573! It uses the *defining class* instead of __namespace__, but you can get the module from that quite easily.

The only place where it doesn't work is slot methods, which have a fixed C API. For example:

    PyObject *tp_repr(PyObject *self);
    int tp_init(PyObject *self, PyObject *args, PyObject *kwds);

There is no good way to pass the method, module object, globals() or the defining class to such functions.
1. https://github.com/nascheme/cpython/tree/frame_no_builtins
On 1/12/21 7:16 PM, Neil Schemenauer wrote:
On 2021-01-12, Victor Stinner wrote:
It seems like a safer approach is to continue the work on bpo-40077: "Convert static types to PyType_FromSpec()".
I agree that trying to convert static types is a good idea. Another possible bonus might be that we can gain some performance by integrating garbage collection with the Python object memory allocator. Static types frustrate that effort.
Could we have something easier to use than PyType_FromSpec(), for the purposes of converting existing code? I was thinking of something like:
static PyTypeObject Foo_TypeStatic = { } static PyTypeObject *Foo_Type;
PyInit_foo(void) { Foo_Type = PyType_FromStatic(&Foo_TypeStatic); }
The PyType_FromStatic() would return a new heap type, created by copying the static type. The static type could be marked as being unusable (e.g. with a type flag).
Unfortunately, it's not just the creation that needs to be changed. You also need to decref Foo_Type somewhere.

Your example is for "single-phase init" modules (pre-PEP 489). Those don't have a dealloc hook, so they will leak memory (e.g. in multiple Py_Initialize/Py_Finalize cycles).

Multi-phase init (PEP 489) allows multiple module instances of extension modules. Assigning PyType_FromStatic's result to a static pointer would mean that every instance of the module will create a new type, and overwrite any existing one. And the deallocation will either leave a dangling pointer or NULL the pointer for other module instances.

So, you need to make the type part of the module state, so that the module has proper ownership of the type. And that means you need to access the type from the module state any time you need to use it.

At that point, IMO, PyType_FromStatic saves you so little work that it's not worth supporting a third variation of type creation code.
Having used heap types extensively for JPype, I believe that converting all types to heap types would be a great benefit. There are still minor rough spots in which a static type can do things that heap types cannot (for example, a static type can derive from a type which is marked final, such as function, but a heap type cannot). But generally I found heap types to be much more flexible.

I found that heap types were better in concept than static types, but because the majority of the API (and the examples on using the C API) were static, the heap type paths were less exercised. I eventually puzzled out most of the mysteries, but having everything be the same (except for old static types, which should be marked as immortal) likely has a lot of side benefits.

Of course, the other issue that I have with heap types is that they currently lack the concept of metaclasses. Thus there are things that you can do from the Python language that you can't do from the C API. See https://bugs.python.org/issue42617

The downside, of course, is that there are a lot of calls in the C API that assume that a static type has a fixed address. Perhaps those calls could all be macros which evaluate to the address of the heap type.

But that is just my 2 cents.

--Karl

-----Original Message-----
From: Neil Schemenauer <nas-python@arctrix.com>
Sent: Tuesday, January 12, 2021 10:17 AM
To: Victor Stinner <vstinner@python.org>
Cc: Python Dev <python-dev@python.org>
Subject: [Python-Dev] Re: Heap types (PyType_FromSpec) must fully implement the GC protocol

On 2021-01-12, Victor Stinner wrote:
It seems like a safer approach is to continue the work on bpo-40077: "Convert static types to PyType_FromSpec()".
I agree that trying to convert static types is a good idea. Another possible bonus might be that we can gain some performance by integrating garbage collection with the Python object memory allocator. Static types frustrate that effort.

Could we have something easier to use than PyType_FromSpec(), for the purposes of converting existing code? I was thinking of something like:

    static PyTypeObject Foo_TypeStatic = { };
    static PyTypeObject *Foo_Type;

    PyInit_foo(void)
    {
        Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
    }

The PyType_FromStatic() would return a new heap type, created by copying the static type. The static type could be marked as being unusable (e.g. with a type flag).
On Tue, 12 Jan 2021 15:22:36 +0100 Petr Viktorin <encukou@gmail.com> wrote:
On 1/11/21 5:26 PM, Victor Stinner wrote:
Hi,
There are multiple PEPs covering heap types. The latest one refers to other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin. https://www.python.org/dev/peps/pep-0630/#motivation
The use case is to embed multiple Python instances (interpreters) in the same application process, or to embed Python with multiple calls to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static types are causing different issues for these use cases.
If a type is immutable and has no references to heap-allocated objects, it could stay as a static type. The issue is that very many types don't fit that. For example, if some method needs to raise a module-specific exception, that's a reference to a heap-allocated type, because custom exceptions generally aren't static.
Aren't we confusing two different things here?

- a mutable *type*, i.e. a type with mutable state attached to itself (not to instances)
- a mutable *instance*, where the mutable state is per-instance

While it's very common for custom exceptions to have mutable instance state (e.g. a backend-specific error number), I can't think of any custom exception that has mutable state attached to the exception *type*.
Also, it's not possible to destroy static types at Python exit, which goes against the on-going effort to destroy all Python objects at exit (bpo-1635741).
I don't see why we would need to destroy immutable static objects. They don't need to be freed.
Right. Regards Antoine.
On 1/12/21 4:34 PM, Antoine Pitrou wrote:
On Tue, 12 Jan 2021 15:22:36 +0100 Petr Viktorin <encukou@gmail.com> wrote:
On 1/11/21 5:26 PM, Victor Stinner wrote:
Hi,
There are multiple PEPs covering heap types. The latest one refers to other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin. https://www.python.org/dev/peps/pep-0630/#motivation
The use case is to embed multiple Python instances (interpreters) in the same application process, or to embed Python with multiple calls to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static types are causing different issues for these use cases.
If a type is immutable and has no references to heap-allocated objects, it could stay as a static type. The issue is that very many types don't fit that. For example, if some method needs to raise a module-specific exception, that's a reference to a heap-allocated type, because custom exceptions generally aren't static.
Aren't we confusing two different things here?
- a mutable *type*, i.e. a type with mutable state attached to itself (not to instances)
- a mutable *instance*, where the mutable state is per-instance
While it's very common for custom exceptions to have mutable instance state (e.g. a backend-specific error number), I can't think of any custom exception that has mutable state attached to the exception *type*.
You're right, exception types *could* generally be static. However, the most common API for creating them, PyErr_NewException[WithDoc], creates heap types.
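Whether a given exception type is a heap type can be checked from Python through the type flags. The sketch below assumes CPython: the value 1 << 9 is Py_TPFLAGS_HEAPTYPE from CPython's object.h, is_heap_type() and MyError are hypothetical names introduced here, and struct.error is used on the assumption that the _struct module creates it with PyErr_NewException().

```python
import struct

Py_TPFLAGS_HEAPTYPE = 1 << 9  # value taken from CPython's Include/object.h


def is_heap_type(tp):
    """Return True if the type object was allocated on the heap (hypothetical helper)."""
    return bool(tp.__flags__ & Py_TPFLAGS_HEAPTYPE)


class MyError(Exception):  # class statement: always a heap type
    pass


builtin_is_heap = is_heap_type(ValueError)       # statically defined in C
extension_is_heap = is_heap_type(struct.error)   # created at module init
python_is_heap = is_heap_type(MyError)
```

On other implementations, or if a module defines its exception differently, the flags may not follow this pattern.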
Hi Victor,

Thank you for looking into these issues. They are very important to HPy too!

HPy currently only supports heap types for similar reasons to why they are important to sub-interpreters -- their lifecycle can be managed by the Python interpreter and they are not tied to the memory and life cycle of the dynamic library containing the C extension. E.g. with heap types the interpreter can control when a type is created and destroyed and when it can be accessed.

We've run into some minor issues with the limitations in PyType_Slot (https://docs.python.org/3/c-api/type.html#c.PyType_Slot.PyType_Slot.slot) but we are working around them for the moment.

It would be useful to have some sense of where PyType_FromSpec is headed -- e.g. is it a goal to have it support all of the features of static types in the future -- so that we can perhaps help suggest / implement small changes that head in the right direction and also ensure that HPy is aligned with the immediate future of the C API.

Yours sincerely,
Simon Cross
Simon Cross wrote:
We've run into some minor issues with the limitations in PyType_Slot (https://docs.python.org/3/c-api/type.html#c.PyType_Slot.PyType_Slot.slot) but we are working around them for the moment. It would be useful to have some sense of where PyType_FromSpec is headed -- e.g. is it a goal to have it support all of the features of static types in the future -- so that we can perhaps help suggest / implement small changes that head in the right direction and also ensure that HPy is aligned with the immediate future of the C API.
Yes, the goal is to have it support all the features of static types. If you see something that's not in PEP 630 open issues (https://www.python.org/dev/peps/pep-0630/#open-issues), I'd like to know.

I'm using https://github.com/encukou/abi3/issues to collect issues related to the stable ABI. Maybe some of HPy's issues are there already.

And fixes are always welcome, of course :)
participants (8)
- Antoine Pitrou
- encukou@gmail.com
- Neil Schemenauer
- Nelson, Karl E.
- Pablo Galindo Salgado
- Petr Viktorin
- Simon Cross
- Victor Stinner