Torsten Landschoff, 22.04.2013 13:07:
On 04/22/2013 07:31 AM, Stefan Behnel wrote:
But this did not give me the crucial hint: Currently, any references to Python objects *may* be invalid at the time dealloc is called. This is because Cython generates a tp_clear function for each extension type that calls Py_CLEAR for each object reference. Correct, together with your complete analysis (thanks for writing it up). For objects participating in a reference cycle, the cyclic garbage
collector will try to break the cycle at an arbitrary object, which may or may not be yours. Thus the unpredictable segfaults.
Perhaps I am misreading the CPython source code but it appears to me that each and every object in the reference cycle will get its tp_clear called.
http://hg.python.org/cpython/file/9c0a677dbbc0/Modules/gcmodule.c#l782
Ah, okay, I get it. The loop will terminate when it reaches an object that actually breaks the cycle in tp_clear: The decref will cascade over all objects in the cycle.
In any case, there are no guarantees about which objects within a reference cycle will be cleared or not.
I also agree that setting fields to None is probably worse for the innocent user than just setting them to NULL. IIRC, we chose to set everything to None because that's something users can handle in their code. You can test an attribute for None in your __dealloc__ code, but you can't test it for NULL.
Hmm, okay, that's a good point that I have missed. Wouldn't it be possible to allow the special check "foo is NULL" for any Python object? Of course Cython usually shields the developer from the possibility that foo actually becomes NULL so why bother. :-)
Exactly - an extremely special case. It's an inherent property of the Cython language that user visible references are never NULL. (minus evil C code, obviously.)
That would be a class decorator. Totally makes sense to me. In fact, a decorator to disable GC for a type would also make sense.
That would be a great feature. After all, traversing the Python objects in my example is of no use as they can not create a cycle.
Cython could also adopt a policy of automatically excluding attributes referencing types that are known to be safe, e.g. basic builtin types. No big savings, but it may drop the need for a useless tp_clear() entirely in some cases, and that might have an impact on the GC cleanup time, as the GC might succeed in breaking the cycle more quickly. Reconsidering
One could even think about building a graph of possible object relationships based on the type declaration for the Python attributes. In the example, Slot refers only to Slots and Slots only to Context, so these can't build a cycle.
Interesting. Yes, that might work. And it's even easy to control when we allow users to override this decision explicitly using a class decorator.
a good idea would be to warn if __dealloc__ actually references a Python attribute that tp_clear could have cleared, with a pointer to the class decorator that exempts the attribute/instance from tp_clear.
That's a good idea. Any help with this is appreciated. :) Stefan