[Python-Dev] Re: More fun with Python shutdown

Wed Nov 12 00:22:26 EST 2003

[Greg Ewing]
> The crux of this seems to be that, now that we have weak references,
> __del__ methods are not the only thing that can trigger execution of
> arbitrary Python code when an object becomes unreferenced.

- "this" needs clarification.  Thomas Heller's bug didn't involve
  cycles, but I think that bug has no real intersection with Jim's
  woes.  Some of the shutdown glitches I've displayed here, as well
  as the ones people have griped about on c.l.py, also weren't related
  to weakref callbacks.  There's more than one (and more than two ...)
  distinct glitches here.

- It is indeed the callbacks -- not weakrefs per se -- that are the
  cause of *most* of these things.

- weakref callbacks are easier to live with than __del__ methods in
  one (and maybe only one) respect:  when the death of X triggers
  a weakref callback C, C isn't passed X, but X.__del__ is.  So a
  weakref callback can't resurrect X, but X.__del__ can.  I'm not
  sure how much comfort to take from that, since a weakref callback
  could presumably resurrect other trash in its dead object's cycle.
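
A minimal sketch of the first half of that point, assuming CPython's
reference counting. The names Node, on_death and seen_dead are made up for
illustration. By the time the callback runs, the weakref it receives is
already dead, so the callback has no handle on X:

```python
import gc
import weakref

class Node:
    """Plain object; instances of user-defined classes can be weakly referenced."""

seen_dead = []

def on_death(ref):
    # The callback is handed the weakref object itself, not the referent.
    # Calling the weakref at this point yields None: X is out of reach.
    seen_dead.append(ref() is None)

x = Node()
r = weakref.ref(x, on_death)
del x          # in CPython the refcount hits zero and the callback fires here
gc.collect()   # no-op for this case; covers implementations without refcounts
```

After this runs, seen_dead should hold a single True: the callback fired
once and could not reach the object.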

> Maybe the GC should also refuse to collect cycles in which any member
> is referenced by a weak reference with an associated callback?

I've been meaning to think about that, but haven't been able to make more
time for it.  It should be possible to construct motivating examples.

> The alternative is to accept that arbitrary Python code can be called
> while the GC is in the midst of breaking a cycle.

Bingo -- that's my fear.  It's hard to say why in advance, but every time
we've found a spot where arbitrary Python code *can* run during gc, we've
eventually been screwed royally on that spot.  Hell, last time we pissed
away most of a week because PyObject_HasAttr (then used to ask whether
__del__ exists; no longer used) indirectly called the object's class's
__getattr__ hook.  As a side effect, that made massive changes to a Zope
database, and all the crap ZODB was doing to materialize ghosts in turn
mutated the Python object graph massively.  gc has to have a patch of
unshifting ground to stand on.

> In that case, it's unacceptable for any object's tp_clear to set
> a Python pointer to NULL, or do anything else that would render the
> object no longer a valid Python object.

I expect it's worse than just that (it always has been in the past, although
nobody has been able to predict exactly how in every case in advance).

> That would be enough to stop segfaults, but it still wouldn't entirely
> solve the problem at hand, because the fact is there's no way to break
> the self-cycle in a class's MRO without rendering it unusable as a
> class object for at least some purposes.

Phil Eby suggested a hack for that specific one (decrement the refcount, and
that's all -- the MRO holds an "illegitimate" self-reference then; wave
hands, pray, and maybe it doesn't break something else).
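
The self-cycle in question can be observed from Python. This is a sketch
that assumes CPython; the class C and the variable names are illustrative
only. A class is the first entry of its own MRO, so dropping the last name
for it does not free it, and only the cycle collector can:

```python
import gc
import weakref

class C:
    pass

# The class appears in its own method resolution order.
self_in_mro = C.__mro__[0] is C

r = weakref.ref(C)
gc.disable()                 # keep an automatic collection from interfering
del C                        # drops the only ordinary reference to the class
alive_after_del = r() is not None      # cycles keep the class alive
gc.collect()                 # the cycle collector breaks the cycles
gc.enable()
alive_after_collect = r() is not None
```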

> Which makes me think that the only safe thing to do is treat a
> weak-ref-with-callback as tantamount to a __del__ method for GC
> purposes.

Quite possibly so.

>> But if that's what's happening, then tricks like the one on the table
>> may not be enough to stop segfaults: replacing tp_mro with an empty
>> tuple only "works" so long as the class object hasn't also been thru
>> its tp_dealloc routine.

> But that can't happen until the object's refcount has dropped to zero,
> in which case it can't be touched any longer by Python code.

Probably so.  It depends not so much on principle as on the parts of the
code where we cheat (e.g., if it were always true that refcount-dropped-to-0
implies can't-be-touched-again-by-Python-code, then what is it that gets
passed to x.__del__()?  x itself -- but we cheat).
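
A small sketch of that cheat as seen from Python, assuming CPython. The
Phoenix class and the saved list are invented for illustration. The object
whose refcount just reached zero is handed to its own __del__, which can
store it somewhere and so bring it back:

```python
import gc

saved = []

class Phoenix:
    def __del__(self):
        # self is the very object whose refcount just dropped to zero;
        # storing it makes the object reachable again.
        saved.append(self)

p = Phoenix()
ident = id(p)
del p          # refcount reaches zero; __del__ runs and resurrects the object
gc.collect()   # no-op for this case; covers implementations without refcounts

same_object = len(saved) == 1 and id(saved[0]) == ident
```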