Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures)

Tim Peters tim_one@email.msn.com
Tue, 15 Apr 2003 00:02:02 -0400


[Greg Ewing]
> ...
> What bothers me, though, is that even with finalizers, the library
> writer *still* can't guarantee eventual reclamation.  The application
> can unwittingly stuff it all up by creating cycles, and there's
> nothing the library writer can do about it.

They're not trying very hard, then -- and, admittedly, most don't.  For
example, every time the library grabs a resource that needs finalization, it
can plant a weakref to it in a singleton private module object with a
__del__ method.  When the module is torn down at shutdown, that object's
__del__ gets called via refcount-falls-from-1-to-0 (it's a private object --
the library author can surely guarantee *it* isn't in a cycle), and free
whichever resources still exist then.  The library could instead register a
cleanup function via atexit().  Or it could avoid weakrefs by setting up a
thread that wakes up every now and again, to scan gc.garbage for instances
of the objects it passed out.  Finding one, it could finalize the resources
held by the object, mark the object as no longer needing resource
finalization, and letting the object leak.  And so on -- Python supplies
lots of ways to get what you want even here.

> It seems to me that giving up on finalization altogether in the
> presence of cycles is too harsh. In most cases, the cycle isn't
> actually going to make any difference.  With a cycle of your
> abovementioned file-descriptor-holding objects, for example, could be
> finalized in an arbitrary order, because the *finalizers* don't depend
> on any other objects in the cycle.

I expect that's usually so, but that detecting that it's so is intractable.
Even if we relied on programmers declaring their beliefs explicitly, Python
still has to be paranoid enough to avoid crashing if the stated beliefs
aren't really true.  For example, if you fight your way through the details
of Java's Byzantine finalization scheme, you'll find that the hairiest parts
of it exist just to ensure that Java's gc internals never end up
dereferencing dangling pointers.  This has the added benefit that most
experienced Java programmers appear to testify that Java's finalizers are
useless <wink>.

> So maybe there should be some way of classifying the references held
> by an object into those that are relied upon by its finalizer, and
> those that aren't.

How?  I believe this is beyond realistic automated analysis for Python
source.

> The algorithm would then be to first go through and clear all the
> references that *aren't* needed by finalizers, and then...
> [assuming there's no problem leads to the conclusion there's no
>  problem <wink>]

You probably need also to detect that the finalizer can't resurrect the
object either, else clearing references that aren't needed specifically for
finalization would leave the resurrected object in a damaged state.