[Python-Dev] Making weakref callbacks safe in cyclic gc

Mon Nov 17 17:57:06 EST 2003

[Tim]
>> ... but cycles are getting very easy to create in Python by accident,
>> so I don't really want to settle for that [push cyclic trash with
>> weakref callbacks into gc.garbage]

[Neil Schemenauer]
> Agreed.

Good!  I haven't worked with you for a year -- let's party <wink>.

> It sucks to have to make things a lot more inconvenient just
> because it's theoretically possible for people to make the
> system behave badly.

I don't know how it happened, but sometime over the last few years I've
switched from thinking "well, ya, they could do that, but no real code
would" to "if they can do that, they will -- and especially if they're
hostile".  I didn't even have to take a job at Elemental Security to enjoy
this personality adjustment <wink>.

>> Another scheme is to just run all the weakref callbacks associated
>> with trash cycles, without tp_clear'ing anything first.  Then run
>> gc again to figure out what's still trash, and repeat until no
>> more weakref callbacks in trash cycles exist.

> Repeatedly running the GC sounds like trouble to me.

Me too.

> I think it would be better to move everything reachable from them
> into the youngest generation, finish the GC pass and then run them.
> I haven't been thinking about this as hard as you have though, so
> perhaps I'm missing some subtlety.

That's essentially what my SF patch does, but with a maddeningly wrong idea
for "them" (in "move everything reachable from them").  I think it could be
repaired by computing the objects reachable from the callbacks (instead of
computing the objects reachable from the objects *with* callbacks).  That
gets hairier, though, and there's one more thing ...

> I have to wonder if anyone would care if __del__ methods were
> one-shot as well.  As a user, I would rather have one-shot __del__
> methods and not have to deal with gc.garbage.

Are you sure?  All Java programmers I've heard talk about it say that
finalizers in Java are so bloody useless they don't use them at all.  Maybe
that's a good thing.  Part of the problem is that the order of finalization
isn't defined, and a program that appears to run fine under testing can fail
horribly in real life when the conditions feeding gc change a bit and
provoke a different order of finalization.  That's the primary reason I was
loath to run __del__ methods in an arbitrary order:  horrid order-dependent
bugs can easily escape non-exhaustive testing, and there's no feasible way
for the user to provoke all N! ways of running N finalizers in a cycle even
if they want to get exhaustive.

For that reason, I'm growing increasingly fond of the idea of clearing the
trash weakrefs first.  If no callbacks get invoked, the order they're not
invoked in probably doesn't matter <wink>.  The technical hangup with that
one right now is that clearing a weakref decrefs the callback, which can
make the callback object die, and the callback object can itself have a
weakref (with a different callback) pointing to *it*.  In that case,
arbitrary Python code gets executed during gc, and in an arbitrary order
again.  There must be a hack to worm around that.
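The chain described above can be reproduced at the Python level.  This is
only a sketch, assuming CPython's reference counting: it discards the weakref
object instead of clearing it from inside gc, but the decref of the callback
is the same step.  The names Target, make_callback, and watcher are
hypothetical, made up for illustration.

```python
import weakref

class Target:
    pass

events = []

def make_callback():
    # Returned function is not stored in any namespace, so the caller
    # controls its lifetime.
    def callback(ref):
        events.append("callback ran")
    return callback

def watcher(ref):
    # Runs when the *callback object itself* dies.
    events.append("callback object died")

t = Target()
cb = make_callback()

# A weakref pointing at the callback function, with its own callback.
cb_watch = weakref.ref(cb, watcher)

# A weakref to t; it holds the only other strong reference to cb.
wr = weakref.ref(t, cb)
del cb

before = list(events)

# Dropping the weakref decrefs its callback.  The callback dies, which
# triggers watcher, even though t is still alive and callback never ran.
del wr

after = list(events)
```

Here watcher is arbitrary Python code that runs purely as a side effect of a
weakref going away, which is the situation gc would have to guard against.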

> It would be nice if we could treat both kinds of finalizers consistently.
> Unfortunately I can't think of a way of noting that the __del__ method
> was already run.

One bit in the object would be enough.  Alas, that "one bit" turns out to be
4 bytes, and I've lost count of how many useful one-bit flags we've failed
to add over the years for fear of losing those bytes for the first time.
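For illustration only, here is what a one-shot guard looks like when the
"bit" is kept at the Python level.  The class name and the cleanup_calls
counter are hypothetical, and the flag lives in the instance dict, so this
says nothing about the C-level space cost discussed above.

```python
class OneShotFinalizer:
    """Hypothetical class whose cleanup body runs at most once."""

    cleanup_calls = 0  # counts how many times the real cleanup body ran

    def __init__(self):
        self._finalized = False  # the "one bit"

    def __del__(self):
        if self._finalized:
            return  # already finalized; do nothing the second time
        self._finalized = True
        type(self).cleanup_calls += 1

obj = OneShotFinalizer()
obj.__del__()  # explicit first run
obj.__del__()  # second run is a no-op
calls_after_manual = OneShotFinalizer.cleanup_calls

# When the object is finally destroyed the interpreter calls __del__ again
# (immediately under CPython refcounting); the flag makes that a no-op too.
del obj
```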

> I suppose if __del__ method continued to work the way they do,
> people could just use weakref callbacks to do finalization.

If they can ensure the weakref outlives the object, maybe.  Another barrier
is that the weakref callback doesn't expose the object that died:  it's
presumed to already be trash, and, in the absence of trash cycles, *is*
already trash by the time the callback is invoked.  So getting at "self" is
a puzzle for a weakref callback pointing at self.  A binding for self can be
installed as a default argument for the callback, but then the self stored
in the function object keeps self alive for as long as the callback is
alive!  Then the only way for self to go away is for the whole shebang to
vanish in a trash cycle.
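The default-argument trap can be seen in a few lines.  This is a minimal
sketch, assuming CPython; Resource and attach_finalizer are hypothetical
names invented for the example.

```python
import gc
import weakref

class Resource:
    def __init__(self, name):
        self.name = name

log = []

def attach_finalizer(obj):
    # Binding obj as a default argument stores a strong reference to it
    # in the function's __defaults__ tuple.
    def on_death(ref, self=obj):
        log.append(self.name)
    return weakref.ref(obj, on_death)

r = Resource("r1")
wr = attach_finalizer(r)

del r
gc.collect()

# The weakref holds the callback, and the callback holds the object,
# so the object is still alive and the callback has not run.
alive_after_del = wr() is not None

# Dropping the weakref finally frees the callback and the object, but a
# discarded weakref never invokes its callback.
del wr
gc.collect()
```

The object does go away in the end, but only after the weakref is dropped,
and at that point the callback is never invoked, so log stays empty.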

So finalization of an object isn't what Python's weakref callbacks were
aiming at, and it's a real strain to use them for that.  Weakref callbacks
were designed to let other objects know that a given object went away;
that's what weak dicts need to know, for example.
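A small sketch of that intended use, assuming CPython (the gc.collect() call
is there only for implementations without reference counting).  Node,
registry, and gone are hypothetical names for the example.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name

gone = []

# A weak dict relies on weakref callbacks internally to drop dead entries.
registry = weakref.WeakValueDictionary()

n = Node("a")
registry["a"] = n

# An explicit callback for comparison.  It is handed the dead weakref,
# not the object, so ref() is already None when it runs.
notifier = weakref.ref(n, lambda ref: gone.append(("a", ref() is None)))

present_before = "a" in registry

del n
gc.collect()

present_after = "a" in registry
```

Both the weak dict and the explicit callback learn that the object went
away, and neither gets access to the object itself.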