[Cython] Surprising behaviour wrt. generated tp_clear and tp_dealloc functions

Stefan Behnel stefan_ml at behnel.de
Mon Apr 22 07:31:11 CEST 2013


Hi,

Torsten Landschoff, 21.04.2013 23:57:
> I am using Cython to generate a trivial wrapper for a small subset of
> the (already small) interface of libp11 (see
> https://github.com/OpenSC/libp11 for information about libp11). I know a
> bit about Python extension modules and just wanted to avoid writing all
> that boiler plate code and decided to given Cython a try.
> 
> The wrapper was done in a day and no big deal, but lately I got random
> segmentation faults using it.
> 
> After a day of debugging I found the cause in my use of the __dealloc__
> special method. You may now call me stupid, because it is all in the docs:
> [...]
> But this did not give me the crucial hint: Currently, any references to
> Python objects *may* be invalid at the time dealloc is called. This is
> because Cython generates a tp_clear function for each extension type
> that calls Py_CLEAR for each object reference.

Correct, together with your complete analysis (thanks for writing it up).
For objects participating in a reference cycle, the cyclic garbage
collector will try to break the cycle at an arbitrary object, which may or
may not be yours. Thus the unpredictable segfaults.

Thanks for bringing this up. Your use case is not uncommon, and it's worth
making it easier to handle.

I also agree that setting fields to None is probably worse for the innocent
user than just setting them to NULL. IIRC, we chose to set everything to
None because that's something users can handle in their code. You can test
an attribute for None in your __dealloc__ code, but you can't test it for
NULL. OTOH, I'm not sure there are many use cases for gracefully handling a
cleared reference in __dealloc__(). If there really is something to clean
up, then being able to test for None won't usually help much. It will
prevent a crash, but also the proper cleanup.


> What I would really like to have is a possibility to exempt object
> attributes from being cleared in tp_clear, by adding a modifier to the cdef:
> 
> *Exempt all attributes:*
> 
>     cdef noclear class Slots:
> 
>         cdef Context context

That would be a class decorator. Totally makes sense to me. In fact, a
decorator to disable GC for a type would also make sense.


> *Exempt a single attribute:*
> 
>     cdef class Slots:
> 
>         cdef noclear Context context

I would like to avoid adding a separate modifier here. This could also be
handled in a class decorator that lists the excluded attributes. My
intuition tells me that by far the most common use cases are to exclude
either the entire type or exactly one attribute (as in your case, and the
lxml package has a similar need).

Cython could also adopt a policy of automatically excluding attributes
referencing types that are known to be safe, e.g. basic builtin types. No
big savings, but it may drop the need for a useless tp_clear() entirely in
some cases, and that might have an impact on the GC cleanup time, as the GC
might succeed in breaking the cycle more quickly. Reconsidering
tp_traverse() would also be worth it in this case. In some cases, we could
even drop the type's GC support completely. Think of a type that only has C
typed fields, except for one byte string attribute. That would currently
trigger the GC support for no good reason.

Another possibility (which I think we considered back in the old days)
would be to automatically exclude fields from tp_clear() that are being
syntactically referenced in a corresponding __dealloc__() method. However,
that's obviously error prone and won't catch all cases (e.g. usage in
subtypes, calling cleanup functions from __dealloc__, "if DEBUG:
print(self.attr)", ...), so I guess we'd rather not go that route. Explicit
is definitely better than implicit here.

Stefan



More information about the cython-devel mailing list