[Python-Dev] RE: More fun with Python shutdown

Tue Nov 11 12:07:20 EST 2003

[Jim Fulton, on <http://www.python.org/sf/839548>]
> ...
> The theory is that it occurs when a cycle involving a class is broken
> by calling the tp_clear slot on a heap type.  I verified this by
> setting a gdb break point in Zope 3 and verifying that type_clear was
> called while a type still had a ref count much higher than 1.
>
> From a purely theoretical point of view, the current behavior is
> wrong.

It is, but a segfault is more than just pure theory <wink>.

> There is clearly an invariant that tp_mro is not None and
> type_clear violates this.  The fix (setting the mro to () in
> type_clear) is pretty straightforward.

The invariant is that tp_mro is not NULL so long as anyone may reference it.
tp_clear believes that tp_mro will never be referenced again, but it's
demonstrably wrong in that belief.  The real bug lies there:  why is its
belief wrong?

You patched it so that tp_mro doesn't become NULL, thus avoiding the
immediate segfault, but until we understand *why* the invariant got
violated, it's unclear that the patch is "a fix".  Code is still accessing
the MRO after tp_clear is called, but now instead of a segfault it's going
to see an empty MRO.  That's also (and clearly so, at least to me)
incorrect:  code that tries to access a class's MRO should see the MRO the
programmer intended, and no sane class has an empty tuple for its MRO.  So I
think the "tp_mro <- ()" patch exchanges gross breakage for subtler
breakage.
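
To make the "subtler breakage" concrete, here is a minimal Python sketch.  It
is not CPython's actual C code:  it only models class attribute lookup as a
walk along the MRO, so that an empty MRO makes every lookup come up empty.
The names Base, Derived, greet, lookup_along and found are hypothetical,
invented for illustration.

```python
class Base(object):
    def greet(self):
        return "hi"


class Derived(Base):
    pass


def lookup_along(mro, name):
    """Simplified model of class attribute lookup: first hit along the MRO wins."""
    for klass in mro:
        namespace = vars(klass)
        if name in namespace:
            return namespace[name]
    raise AttributeError(name)


def found(mro, name):
    """Return True if the simplified lookup finds `name` along `mro`."""
    try:
        lookup_along(mro, name)
    except AttributeError:
        return False
    return True


# A real MRO starts with the class itself and ends with object.
print(Derived.__mro__[0] is Derived, Derived.__mro__[-1] is object)
# With the real MRO the inherited method is found; with () nothing is.
print(found(Derived.__mro__, "greet"), found((), "greet"))
```

Under this model, code that consults an emptied MRO doesn't crash, it just
quietly fails to find things the programmer put there.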

> My assumption is that it's possible for this to occur at times other
> than shutdown, although, perhaps, wildly unlikely.

In the absence of real understanding, who knows.  If it is possible before
shutdown, then the importance of not exposing user code to a made-up MRO
skyrockets, IMO.

> What's especially poorly understood is how to make it happen in a
> smaller test program.

> ...
> BTW, with a debug build, I get an assertion error rather than a
> segfault.

Which assertion fails then?  That may be a good clue toward truly
understanding what's causing this.

>> """
>> import weakref
>> import os
>>
>> class C(object):
>>     def hi(self, w=os.write):
>>         w(1, 'hi 1\n')
>>         print 'hi 2'
>>
>> def pp(c=C()):
>>     c.hi()
>>
>> import sys
>> exec "import %s as somemodule" % sys.argv[1] in globals()
>> del sys
>>
>> somemodule.c1 = C()
>> somemodule.awr = weakref.ref(somemodule.c1,
>>                              lambda ignore, pp=pp: pp())
>>
>> del C, pp
>> """
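
The mechanism that program leans on can be sketched in isolation.  This is a
minimal example, not the program above:  it uses no module teardown at all,
and the names Thing, calls and on_death are hypothetical.  It only shows that
a weakref callback runs when its referent dies, and that a default argument
keeps the callback's helper alive after the global name is deleted.  It is
written to run on current Pythons as well as 2.x.

```python
import gc
import weakref


class Thing(object):
    pass


calls = []


def on_death(log=calls):
    # The default argument binds the list at definition time, so the
    # callback does not depend on the global name "calls" still existing.
    log.append("callback ran")


t = Thing()
# The lambda captures on_death via a default argument, mirroring pp=pp above.
awr = weakref.ref(t, lambda ignore, on_death=on_death: on_death())

del on_death          # the global name is gone; the lambda still holds the function
del t                 # last strong reference dropped; the referent can now die
gc.collect()          # not needed with refcounting, harmless elsewhere

print(calls)
print(awr() is None)
```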

...

>> C:\Code\python\PCbuild>python temp4.py __builtin__
>> hi 1

...

>> The only one I can't make any sense of is __builtin__:  the weakref
>> callback is certainly invoked then, but its print statement neither
>> produces output nor raises an exception.

> When trying to debug this in Zope 3, I similarly noticed that prints
> in the weakref callback produced no output.

I'm not sure this one's worth pursuing.  Your problem occurred during the
second call to gc in finalization, and the sys module has been gutted by
that point.  In particular, sys.stdout has been cleared, so a print
statement can't work then.  The only mystery to me wrt this is why it didn't
raise an exception, like the

>> Exception exceptions.AttributeError: "'NoneType' object has no attribute
>>     'write'" in <function <lambda> at 0x006B6C70> ignored

raised when calling that little program with "sys" instead of "__builtin__".
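
For reference, the AttributeError in that message is just what you get from
asking None for a write method.  A minimal sketch, assuming only that output
boils down to calling write() on whatever object is the current stream; the
helpers write_line, try_write and Collector are hypothetical, and this does
not reproduce interpreter shutdown or explain the silent __builtin__ case.

```python
class Collector(object):
    """Stand-in stream that records what is written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def write_line(stream, text):
    # All this needs from its target is a write() method.
    stream.write(text + "\n")


def try_write(stream):
    """Return the error message if writing fails, else None."""
    try:
        write_line(stream, "hi 2")
    except AttributeError as exc:
        return str(exc)
    return None


good = Collector()
print(try_write(good), good.chunks)
# A stream that has been replaced by None has no write attribute.
print(try_write(None))
```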