[Python-Dev] Re: More fun with Python shutdown

Tue Nov 11 12:47:42 EST 2003

At 12:25 PM 11/11/03 -0500, Jim Fulton wrote:
>Tim Peters wrote:
>>[Jim Fulton, on <http://www.python.org/sf/839548>]
>>
>>>...
>>>The theory is that it occurs when a cycle involving a class is broken
>>>by calling the tp_clear slot on a heap type.  I verified this by
>>>setting a gdb break point in Zope 3 and verifying that type_clear was
>>>called while a type still had a ref count much higher than 1.
>>>From a purely theoretical point of view, the current behavior is
>>>wrong.
>>
>>It is, but a segfault is more than just pure theory <wink>.
>
>I don't know what your point is here.

It's a joke, laugh.  :)

>>>There is clearly an invariant that tp_mro is not None and
>>>type_clear violates this.  The fix (setting the mro to () in
>>>type_clear) is pretty straightforward.
>>
>>The invariant is that tp_mro is not NULL so long as anyone may reference it.
>>tp_clear believes that tp_mro will never be referenced again, but it's
>>demonstrably wrong in that belief.  The real bug lies there:  why is its
>>belief wrong?
>
>I thought that tp_clear was called to break cycles. Surely, if a class is
>in a cycle, there are references to it. Why would one assume that none
>of these references are instances?

Actually, the funny thing here is that it's unlikely that the cycle a type 
is in involves its base classes.  The only way I know of in pure Python to 
have such a cycle is to set an attribute of the base class to refer to the 
subclass, which means that clearing each type's dictionary (and other 
metaclass-defined slots, if any) should be sufficient to break the cycle, 
without touching tp_mro.
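
For concreteness, here is a rough sketch of the kind of cycle I mean, assuming CPython's cycle collector and weak-referenceable classes.  The names Base, Derived, build_cycle and demo are made up for illustration and don't come from Zope or the bug report:

```python
import gc
import weakref


def build_cycle():
    """Create a base class and a subclass that refer to each other."""

    class Base(object):
        pass

    class Derived(Base):
        pass

    # Derived already refers to Base through __bases__ and __mro__.
    # This attribute adds the strong reference in the other direction,
    # which is what puts the base class into the cycle.
    Base.subclass = Derived

    # Hand back only weak references, so nothing outside the cycle
    # keeps either class alive.
    return weakref.ref(Base), weakref.ref(Derived)


def demo():
    """Return (alive_before, alive_after) around an explicit collection."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from racing the check
    try:
        base_ref, derived_ref = build_cycle()
        alive_before = base_ref() is not None and derived_ref() is not None
        gc.collect()  # the cycle collector has to break the cycle
        alive_after = base_ref() is not None or derived_ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    print(demo())
```

Both classes survive until the collector runs, since reference counting alone can't free them.  The sketch only shows that such a cycle exists and gets collected; it says nothing about the order in which the collector clears the individual slots.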

>>You patched it so that tp_mro doesn't become NULL, thus avoiding the
>>immediate segfault, but until we understand *why* the invariant got
>>violated, it's unclear that the patch is "a fix".  Code is still accessing
>>the MRO after tp_clear is called, but now instead of a segfault it's going
>>to see an empty MRO.  That's also (and clearly so, at least to me)
>>incorrect:  code that tries to access a class's MRO should see the MRO the
>>programmer intended, and no sane class has an empty tuple for its MRO.  So I
>>think the "tp_mro <- ()" patch exchanges gross breakage for subtler
>>breakage.
>
>Surely, the original intent is to break something. ;)
>I'd much rather get an attribute error than a segfault or an
>equally fatal C assertion error.

What's baffling me is what code is accessing the class after tp_clear is 
called.  It can't be a __del__ method, or the cycle collector wouldn't be 
calling tp_clear, right?  Or does it run __del__ methods during shutdown?