Object deallocation during the finalization of a Python program

Hi,

Recently I've been facing a really weird bug where a Python program was randomly segfaulting during finalization. The program uses some C extensions via Cython.

While debugging the issue I realized that, during the deallocation of one of the Python objects, the deallocation function was trying to release a pointer that was, surprisingly, NULL. The pointer was held by another Python object that was an attribute of the object with the deallocation function, something like this:

    cdef class Foo:
        cdef my_type *value

    cdef class Bar:
        cdef Foo _foo

        def __cinit__(self):
            self._foo = Foo()
            self._foo.value = initialize()

        def __dealloc__(self):
            destroy(self._foo.value)

It seems that, at random, the Foo instance held by the Bar object was deallocated by the CPython interpreter before the Bar deallocation ran. So after the Foo instance had been deallocated, and its memory zeroed, `destroy(self._foo.value)` was in fact called with a NULL address, raising a segfault.

It was a surprise to me. If I'm not missing something, the deallocation of the Foo instance happened even though there was still an active reference held by the Bar object.

As a kind of double check I changed the program to make an explicit `gc.collect()` call before the last line of the Python program. As a result, I couldn't reproduce the segfault, which theoretically would mean that objects were deallocated "in order".

So my question is: could CPython deallocate objects during the finalization step without considering the dependencies between them?

If this is not the right list for this kind of question, just let me know what the best place would be.

Thanks in advance,
--
--pau

[Pau Freixes <pfreixes@gmail.com>]
> Recently I've been facing a really weird bug where a Python program was randomly segfaulting during finalization. The program uses some C extensions via Cython.
There's nothing general that can be said that would help. These things require excruciating details to resolve.

It's generally the case that these things are caused by code missing some aspect of correct use of Python's C API. Which can be subtle! It's very rare that such misuse is found in the core, but not so rare in extensions.

Here's the most recent. Be aware that it's a very long report, and - indeed - is stuffed to the gills with excruciating details ;-)

https://bugs.python.org/issue38006

In that case a popular C extension wasn't really "playing by the rules", but we changed CPython's cyclic garbage collector to prevent it from segfaulting anyway.

So, one thing you didn't tell us: which version of Python were you using? If not the most recent (3.8.1), try again with that (which contains the patches discussed in the issue report above).
> While debugging the issue I realized that, during the deallocation of one of the Python objects, the deallocation function was trying to release a pointer that was, surprisingly, NULL. The pointer was held by another Python object that was an attribute of the object with the deallocation function, something like this:
...
> So my question is: could CPython deallocate objects during the finalization step without considering the dependencies between them?
No, it could not. BUT. But but but. If C code isn't playing by all the rules, it's possible that CPython doesn't _know_ what the dependencies actually are. CPython can only know about pointers that the user's C API calls tell CPython about. In the report above, the C extension didn't tell CPython about some pointers in cycles at all.

Curiously, for most kinds of garbage collectors, failing to inform the collector about pointers is massively & widely & very reliably catastrophic. If it doesn't see a pointer, it can conclude that a live object is actually trash. But CPython's cyclic collector follows pointers to prove objects _are_ trash, not to prove that they're alive. So when CPython isn't told about a pointer, it usually "fails soft", believing a trash object is actually alive instead. That can cause a memory leak, but usually not a crash.

But not always. In the presence of weakrefs too, believing a trash object is actually alive can cause some kinds of finalization to be done "too late", which can cause a crash when refcounting discovers that the object actually is trash. No, you're not going to get a simple example of that ;-) Dig through the issue report above.
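The "proves objects are trash" direction can be observed from pure Python. What follows is a minimal sketch, not something taken from the issue report: the `Node` class and the `cycle_lifetime` helper are hypothetical, and automatic collection is switched off only to make the timing deterministic. Two objects that point at each other never reach a refcount of zero by themselves, so they survive `del` until the cyclic collector runs and establishes that nothing outside the cycle refers to them.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def cycle_lifetime():
    """Return (alive_after_del, alive_after_collect) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a = Node("a")
        b = Node("b")
        a.other = b
        b.other = a  # a and b now form a reference cycle
        probe = weakref.ref(a)  # observes a without keeping it alive

        del a, b  # refcounts stay above zero because of the cycle
        alive_after_del = probe() is not None

        gc.collect()  # the cyclic collector finds the pair unreachable
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_lifetime())
```

Read against the message above: an extension type that leaves a pointer out of its traversal function hides that reference from this proof, so the collector sees a reference it cannot account for and errs on the side of treating the object as alive.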
> If this is not the right list for this kind of question, just let me know what the best place would be.
Open an issue report instead? Discussions won't solve it, alas.

Hi Pau,

Also, the Cython documentation warns against doing this kind of thing (here, accessing the Python object stored in ``foo``). From https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html:

You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.

Armin
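One way to read this advice, sketched in pure Python rather than Cython so that it can be run as-is: let the object that stores the C pointer be the one that releases it, so that `Bar` never has to reach through `self._foo` while it is being torn down. Everything here is a stand-in and an assumption: `FakeHandle` plays the role of the `my_type *` pointer, `destroy` the role of the C cleanup function from the original post, `demo` is a hypothetical driver, and `weakref.finalize` is swapped in for `__dealloc__`.

```python
import weakref


class FakeHandle:
    """Stand-in for the C resource behind `my_type *` (hypothetical)."""

    def __init__(self):
        self.destroy_calls = 0


def destroy(handle):
    """Stand-in for the C-level destroy() from the original post."""
    handle.destroy_calls += 1


class Foo:
    """Owns the handle and is the only object that releases it."""

    def __init__(self, handle):
        self.value = handle
        # The callback receives the handle directly, never `self`, so it
        # does not depend on the state of a partially destroyed Foo.
        self._finalizer = weakref.finalize(self, destroy, handle)


class Bar:
    """Holds a Foo but performs no cleanup on Foo's behalf."""

    def __init__(self, handle):
        self._foo = Foo(handle)


def demo():
    """Create and drop a Bar, then report how often the handle was released."""
    handle = FakeHandle()
    bar = Bar(handle)
    del bar  # in CPython, dropping Bar drops its Foo, whose finalizer runs
    return handle.destroy_calls


if __name__ == "__main__":
    print(demo())
```

In Cython terms the analogous arrangement would presumably be a `__dealloc__` on `Foo` that frees its own `value`, with `Bar.__dealloc__` touching only C data that `Bar` itself holds.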

Hi,

Thanks for the comments. This one [1] is a really interesting case. I've only skimmed it, but it seems similar to the bug that I finally found in our program.

Basically, the GC was clearing all of the attributes before the deallocation in order to break an indirect reference cycle, which later resulted in access to an invalid address during the deallocation. This is something the Cython documentation already warns about.

[1] https://bugs.python.org/issue38006

On Sat, Jan 11, 2020 at 1:36 PM Armin Rigo <armin.rigo@gmail.com> wrote:
> Hi Pau,
> Also, the Cython documentation warns against doing this kind of thing (here, accessing the Python object stored in ``foo``). From https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html:
> You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.
> Armin
--
--pau
participants (3)
- Armin Rigo
- Pau Freixes
- Tim Peters