[Python-Dev] Re: More fun with Python shutdown

Wed Nov 12 23:47:37 EST 2003

[Bernhard Herzog]
> Wouldn't it be possible to call the callbacks of all weakrefs that
> point to a cycle about to be destroyed before that destruction begins?

Yes, but GC couldn't also go on to call tp_clear then -- without deeper
changes, the objects would have to leak.

Suppose objects I and J have (strong) references to each other -- they form
a two-object cycle.  Suppose I also holds a weakref to J, with a callback to
a method of I.
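
To make the setup concrete, here is a minimal Python sketch of that
two-object cycle.  The class name Node, its attributes, and the method name
partner_died are made up for illustration; only weakref.ref and its callback
argument are real API.

```python
import weakref

class Node:
    """Hypothetical stand-in for the objects called I and J in the text."""
    def __init__(self, name):
        self.name = name
        self.other = None   # strong reference to the partner object
        self.watch = None   # weakref to the partner, with a callback

    def partner_died(self, ref):
        # A bound method used as a callback has full access to `self`,
        # and through self.other to the partner as well.
        print(self.name, "was told its partner is going away")

i = Node("I")
j = Node("J")
i.other = j                                # I -> J (strong)
j.other = i                                # J -> I (strong): two-object cycle
i.watch = weakref.ref(j, i.partner_died)   # I also weakly refers to J
```

Dropping the names i and j afterwards leaves the cycle unreachable, and only
cyclic gc can reclaim it.  What the collector then does about the callback is
exactly the question under discussion, so the sketch stops at building the
structure.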

Suppose the cycle becomes unreachable.  GC detects that.  It can also (with
small changes to current code) detect that J has a weakref-associated
callback, and invoke it.

But when the callback returns, GC must stop trying to make progress:  at
that point it knows absolutely nothing anymore about the object graph,
because there's absolutely nothing a callback can't do.  In particular,
because the callback in the example is a method of I, it has full access to
I (via the callback's "self" argument), and because I has a strong reference
to J, it also has full access to J.  The callback can resurrect either or
both the objects, and/or install new weakref callbacks on either or both, or
even break the strong-reference cycle manually so that normal refcounting
completely destroys I before the callback returns (although there's an
obscure technical reason why the callback can't completely destroy J
before it returns -- I and J are different in this one respect).

If GC went on to, for example, execute tp_clear on I or J, tp_clear can
leave behind an accessible (if the callback resurrected it) insane object,
where "insane" means one that a user -- whether in innocence or by hostile
design doesn't matter -- can exploit to crash the interpreter.  For example,
Jim has proven that a new-style class object is insane in this way after its
tp_clear is invoked, and it's extremely easy to provoke one into
segfaulting.

Of course that's right out -- we're trying to repair a current segfault, not
supply subtler ways to create segfaults.

We also have to do this within the boundaries of what can be sold for a
bugfix release, so gross changes in semantics are also right out.  In
particular, we've never said that tp_clear has to leave an object in a
usable state, so it would be a hard sell to start to demand that in a bugfix
release.

Still, I want this to work.  There's a saving grace here that __del__
methods don't have:  if a __del__ method resurrects an object, there's
nothing to stop the __del__ method from getting called again (when the
refcount falls to 0 again).  But weakref callbacks are *already* one-shot
things:  a given weakref callback destroys itself as part of the process of
getting invoked.  So once we've invoked a weakref callback for J, that
callback is history.  Sick code *in* the callback could install *another*
weakref callback on J, so we have to be careful, but J's original callbacks
are gone forever, and in almost all code that leaves J callback-free.
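
The one-shot behavior is easy to see from Python with an ordinary
(non-cyclic) object.  This is only a small sketch; Target, on_death and calls
are illustrative names, and the gc.collect() call is there for
implementations that don't use refcounting.

```python
import gc
import weakref

class Target:
    pass

calls = []

def on_death(ref):
    # The callback receives the weakref object itself.  By the time it
    # runs, the weakref has been cleared, so ref() returns None.
    calls.append(ref() is None)

obj = Target()
watcher = weakref.ref(obj, on_death)

del obj        # under CPython refcounting the callback fires right here
gc.collect()   # harmless in CPython; gives other implementations a chance
```

After this, calls holds a single entry and watcher is dead.  Nothing will
ever invoke on_death through watcher again.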

As above, GC cannot go on to call tp_clear after invoking a callback.
However, after invoking all the callbacks, it *could* start another "mini"
gc cycle, taking the list of cyclic trash as its starting point (as "the
generation" to be collected).  This is the only way it can know what the
post-callback state of the object graph is.  In all sane code, this mini-gc
run will discover that (a) all this stuff is still cyclic trash, and (b)
none of it has weakref-callbacks anymore.  *Then* it's safe to run through
the list calling tp_clear methods.

In sick code (code that resurrects objects via a weakref callback, or
registers new weakref callbacks to dead objects via a weakref callback), the
mini gc run will automatically remove the resurrected objects from current
consideration (they'll move to an older generation as a matter of course).
It may even discover that nothing is trash anymore.  If so, no harm done:
because we haven't called tp_clear on anything, nothing has been damaged.

If there's some trash left with (necessarily) new weakref callbacks, we're
back to where we started.  We *could* proceed the same way then, but I'm
afraid that would give actively hostile code a way to put gc into a
never-ending loop.  Instead I'd simply move those objects into the next
generation, and let gc end then.  Again, because we haven't called tp_clear
on anything, nothing has been damaged in this case either.
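
Here is a toy Python model of the scheme described above, as a sketch rather
than a proposal for the real C code.  Everything in it is an assumption made
for illustration: the Node class, the "external" counter standing in for
references held from outside the graph, and callbacks that are called with
the node itself (real weakref callbacks get the weakref object).  It also
takes the conservative reading that if any leftover trash has new callbacks,
all the leftover trash is promoted and nothing is cleared.

```python
class Node:
    """Toy stand-in for a gc-tracked object (not a CPython type)."""
    def __init__(self, name):
        self.name = name
        self.refs = []        # strong references to other Nodes
        self.external = 0     # strong references held from outside the graph
        self.callbacks = []   # pending weakref callbacks, called as cb(node)
        self.cleared = False  # set once the toy tp_clear has run

def find_trash(candidates):
    """Return the candidates not reachable from an externally held one."""
    stack = [n for n in candidates if n.external > 0]
    reachable = {id(n) for n in stack}
    while stack:
        for m in stack.pop().refs:
            if id(m) not in reachable:
                reachable.add(id(m))
                stack.append(m)
    return [n for n in candidates if id(n) not in reachable]

def collect(generation, older):
    """Collect one generation; return the names of the objects cleared."""
    trash = find_trash(generation)
    trash_ids = {id(n) for n in trash}
    older.extend(n for n in generation if id(n) not in trash_ids)

    # Step 1: detach every pending callback, then invoke them (one-shot).
    pending = [(n, cb) for n in trash for cb in n.callbacks]
    for n in trash:
        n.callbacks = []
    for n, cb in pending:
        cb(n)

    # Step 2: a callback may have done anything, so run a "mini" gc pass
    # over the former trash and promote whatever got resurrected.
    if pending:
        still_trash = find_trash(trash)
        still_ids = {id(n) for n in still_trash}
        older.extend(n for n in trash if id(n) not in still_ids)
        trash = still_trash

    # Step 3: new callbacks showed up.  Don't loop; promote and give up.
    if any(n.callbacks for n in trash):
        older.extend(trash)
        return []

    # Step 4: only now is it safe to run the toy tp_clear.
    for n in trash:
        n.refs.clear()
        n.cleared = True
    return [n.name for n in trash]

# Sane case: a dead I <-> J cycle whose callback only records that it ran.
i, j = Node("I"), Node("J")
i.refs.append(j)
j.refs.append(i)
log = []
j.callbacks.append(lambda n: log.append(n.name))
older = []
cleared = collect([i, j], older)
```

In the sane case both objects end up cleared and nothing is promoted.  A
callback that sets n.external = 1 (a toy resurrection) makes collect() return
an empty list with both objects moved to older, and nothing is cleared then.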

A subtlety:  instead of doing the "mini gc pass", why not just move the
leftover objects into an older generation and let gc return right away then?
The problem:  any weakref callback in any cyclic trash would stop a complete
invocation of gc from removing any trash then.  A perfectly ordinary,
non-hostile program, that happened to create lots of weakref callbacks in
cyclic trash could then get into a state where every time gc runs, it finds
one of these things, and even though the app never does anything sick (like
resurrecting in a callback), gc would never make any progress.  The true
purpose of the "mini gc pass" is to ensure that gc does make progress in
sane code, no matter how quickly and persistently it creates dead cycles
containing weakref callbacks.

Terminology subtlety:  the "mini" in "mini gc pass" refers to the fact that
the generation it starts with is presumably small, not to this pass having
an especially easy time of it.  It still has to do all the work of deducing
liveness and deadness from scratch.  There are no shortcuts it can take
here, simply because there's nothing a callback can't do.  However, this
pass should go quickly:  it starts with what *was* entirely trash in cycles,
and it's probably still entirely trash in cycles.  This is maximally easy
for Python's kind of cyclic gc (it chases all and only the objects in the
dead cycles then -- it doesn't have to visit any objects outside the dead
cycles, *unless* the cycles aren't truly dead anymore).  So for sane
programs, it adds gc time proportional to the number of pointers in the dead
cycles, independent of the total number of objects.
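
The cost claim can be checked on a toy model of the refcount-subtraction
approach.  This is a sketch under stated assumptions: objects are plain
strings, "edges" and "refcount" are hypothetical dicts standing in for
tp_traverse and ob_refcnt, and "visits" counts each pointer followed.

```python
def find_unreachable(candidates, refcount, edges):
    """Return (trash, visits) for the candidate set.

    refcount maps a name to its total strong refcount; edges maps a name
    to the names it references.  A real collector keeps the working count
    in the object header; a dict is used here to keep the toy small.
    """
    in_set = set(candidates)
    gc_refs = {n: refcount[n] for n in candidates}
    visits = 0

    # Subtract references that come from inside the candidate set.
    for n in candidates:
        for m in edges[n]:
            visits += 1
            if m in in_set:
                gc_refs[m] -= 1

    # Anything with a count left over is referenced from outside, and
    # so is everything reachable from it.
    stack = [n for n in candidates if gc_refs[n] > 0]
    reachable = set(stack)
    while stack:
        for m in edges[stack.pop()]:
            visits += 1
            if m in in_set and m not in reachable:
                reachable.add(m)
                stack.append(m)

    trash = [n for n in candidates if n not in reachable]
    return trash, visits

# 1000 live objects in a chain, plus a dead three-object cycle a -> b -> c -> a.
edges = {"live%d" % k: ["live%d" % (k + 1)] for k in range(999)}
edges["live999"] = []
edges.update({"a": ["b"], "b": ["c"], "c": ["a"]})
refcount = {"a": 1, "b": 1, "c": 1}

trash, visits = find_unreachable(["a", "b", "c"], refcount, edges)
```

With the cycle still dead, the pass follows just the three pointers inside
the cycle and never touches the live chain.  If something outside holds an
extra reference to "a" (refcount 2), the same call finds no trash and follows
six pointers, which is the "unless the cycles aren't truly dead anymore"
case.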

All cyclic trash found by all gc invocations consumes a little more time
too, because we have to ask each trash object whether it has an associated
weakref callback.  In most programs, most of the time, the answer will be
"no".
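
From Python, the same question can be approximated with the weakref module.
This is only a sketch of the check, not how gc would do it in C:
has_callback and Thing are made-up names, and the helper deliberately looks
only at weakref.ref objects, ignoring proxies.

```python
import weakref

def has_callback(obj):
    """Return True if some live weakref.ref to obj carries a callback."""
    return any(
        isinstance(r, weakref.ref) and r.__callback__ is not None
        for r in weakref.getweakrefs(obj)
    )

class Thing:
    pass

t = Thing()
plain = weakref.ref(t)                     # weakref without a callback
before = has_callback(t)
armed = weakref.ref(t, lambda ref: None)   # weakref with a callback
after = has_callback(t)
```

Here before is False and after is True.  For an object with no weakrefs at
all, getweakrefs returns an empty list and the answer is "no" immediately.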