[Cython] Surprising behaviour wrt. generated tp_clear and tp_dealloc functions

Torsten Landschoff torsten.landschoff at dynamore.de
Mon Apr 22 13:07:43 CEST 2013


Hi Stefan,

thank you for your extremely quick reply!

On 04/22/2013 07:31 AM, Stefan Behnel wrote:
>> But this did not give me the crucial hint: Currently, any references to
>> Python objects *may* be invalid at the time dealloc is called. This is
>> because Cython generates a tp_clear function for each extension type
>> that calls Py_CLEAR for each object reference.
> Correct, together with your complete analysis (thanks for writing it up).
> For objects participating in a reference cycle, the cyclic garbage
> collector will try to break the cycle at an arbitrary object, which may or
> may not be yours. Thus the unpredictable segfaults.
Perhaps I am misreading the CPython source code but it appears to me
that each and every object in the reference cycle will get its tp_clear
called.

http://hg.python.org/cpython/file/9c0a677dbbc0/Modules/gcmodule.c#l782

Ah, okay, I get it. The loop will terminate when it reaches an object
that actually breaks the cycle in tp_clear: The decref will cascade over
all objects in the cycle.

> Thanks for bringing this up. Your use case is not uncommon, and it's worth
> making it easier to handle.
I am glad that you see it this way. I also think that this will happen
often, but of course I was not sure if you would agree.
> I also agree that setting fields to None is probably worse for the innocent
> user than just setting them to NULL. IIRC, we chose to set everything to
> None because that's something users can handle in their code. You can test
> an attribute for None in your __dealloc__ code, but you can't test it for
> NULL. 
Hmm, okay, that's a good point that I have missed. Wouldn't it be
possible to allow the special check "foo is NULL" for any Python object?
Of course Cython usually shields the developer from the possibility that
foo actually becomes NULL so why bother. :-)
> OTOH, I'm not sure there are many use cases for gracefully handling a
> cleared reference in __dealloc__(). If there really is something to clean
> up, then being able to test for None won't usually help much. It will
> prevent a crash, but also the proper cleanup.
Agreed.
> That would be a class decorator. Totally makes sense to me. In fact, a
> decorator to disable GC for a type would also make sense. 
That would be a great feature. After all, traversing the Python objects
in my example is of no use as they can not create a cycle.
>> *Exempt a single attribute:*
>>
>>     cdef class Slots:
>>
>>         cdef noclear Context context
> I would like to avoid adding a separate modifier here. This could also be
> handled in a class decorator that lists the excluded attributes. My
> intuition tells me that by far the most common use cases are to exclude
> either the entire type or exactly one attribute (as in your case, and the
> lxml package has a similar need).
I believe your intuition matches reality. :-) A class decorator
supporting a list of excluded attributes would be fine!
> Cython could also adopt a policy of automatically excluding attributes
> referencing types that are known to be safe, e.g. basic builtin types. No
> big savings, but it may drop the need for a useless tp_clear() entirely in
> some cases, and that might have an impact on the GC cleanup time, as the GC
> might succeed in breaking the cycle more quickly. Reconsidering
One could even think about building a graph of possible object
relationships based on the type declaration for the Python attributes.
In the example, Slot refers only to Slots and Slots only to Context, so
these can't build a cycle.

> Another possibility (which I think we considered back in the old days)
> would be to automatically exclude fields from tp_clear() that are being
> syntactically referenced in a corresponding __dealloc__() method. However,
> that's obviously error prone and won't catch all cases (e.g. usage in
> subtypes, calling cleanup functions from __dealloc__, "if DEBUG:
> print(self.attr)", ...), so I guess we'd rather not go that route. Explicit
> is definitely better than implicit here.
But a good idea would be to warn if __dealloc__ actually references a
Python attribute that tp_clear could have cleared, with a pointer to the
class decorator that exempts the attribute/instance from tp_clear.

Thanks and greetings, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden
Tel: +49-(0)351-4519587
Fax: +49-(0)351-4519561

mailto:torsten.landschoff at dynamore.de
http://www.dynamore.de

DYNAmore Gesellschaft für FEM Ingenieurdienstleistungen mbH
Registration court: Stuttgart, HRB 733694
Managing director: Prof. Dr. Karl Schweizerhof, Dipl.-Math. Ulrich Franz



More information about the cython-devel mailing list