[Python-Dev] Design question: call __del__ for cyclical garbage?

Guido van Rossum guido@python.org
Sun, 05 Mar 2000 08:46:13 -0500


I'm beginning to believe that handing cycles with finalizers to the
user is better than calling __del__ with a different meaning, and I
tentatively withdraw my proposal to change the rules for when __del__
is called (even when __init__ fails; I haven't had any complaints
about that either).

There seem to be two competing suggestions for solutions: (1) call
some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the
object; (2) Tim's proposal of an interface to ask the garbage
collector for a trash cycle with a finalizer (or for an object with a
finalizer in a trash cycle?).

Somehow Tim's version looks less helpful to me, because it *seems*
that whoever gets to handle the cycle (the main code of the program?)
isn't necessarily responsible for creating it (some library you didn't
even know was used under the covers of some other library you called).

Of course, it's also possible that a trash cycle is created by code
outside the responsibility of the finalizer.

But still, I have a hard time understanding how Tim's version would be
used.  Greg's or Marc-Andre's version I understand.

What keeps nagging me though is what to do when there's a finalizer
but no cleanup method.  I guess the trash cycle remains alive.  Is
this acceptable?  (I guess so, because we've given the programmer a
way to resolve the trash: provide a cleanup method.)

If we detect individual cycles (the current algorithm doesn't do that
yet, though it seems easy enough to do another scan), could we
special-case cycles with only one finalizer and no cleaner-upper?
(I'm tempted to call the finalizer because it seems little harm can be
done -- but then of course there's the problem of the finalizer being
called again when the refcount really goes to zero. :-( )

> Exactly.  The *programmer* may know the right thing to do, but the Python
> implementation can't possibly know.  Facing both facts squarely constrains
> the possibilities to the only ones that are all of understandable,
> predictable and useful.  Cycles with finalizers must be a Magic-Free Zone
> else you lose at least one of those three:  even Guido's kung fu isn't
> strong enough to outguess this.
> 
> [a nice implementation sketch, of what seems an overly elaborate scheme,
>  if you believe cycles with finalizers are rare in intelligently designed
>  code]
> 
> Provided Guido stays interested in this, he'll make his own fun.  I'm just
> inviting him to move in a sane direction <0.9 wink>.

My current tendency is to go with the basic __cleanup__ and nothing
more, calling each instance's __cleanup__ before clobbering
dictionaries and lists -- which should break all cycles safely.
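
A minimal sketch of that ordering, assuming a hypothetical __cleanup__
hook on instances.  The names Node and break_trash_cycles are made up
for illustration, and the "trash" list stands in for whatever set of
unreachable objects a cycle detector would produce; none of this is an
existing interpreter API.

```python
class Node:
    """Hypothetical instance type that takes part in a reference cycle."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.partner = None

    def __cleanup__(self):
        # Hypothetical hook: record the call, then drop outgoing references.
        self.log.append(self.name)
        self.partner = None


def break_trash_cycles(trash):
    """Illustrative two-pass scheme, not an actual collector."""
    # Pass 1: instances first, while their attributes are still intact.
    for obj in trash:
        hook = getattr(obj, "__cleanup__", None)
        if hook is not None:
            hook()
    # Pass 2: clobber the plain containers.
    for obj in trash:
        if isinstance(obj, (dict, list)):
            obj.clear()


log = []
a, b = Node("a", log), Node("b", log)
a.partner, b.partner = b, a      # a <-> b cycle
shared = [a, b]                  # a list that also holds both nodes
break_trash_cycles([a, b, shared])
```

After the call, neither node refers to the other and the list is empty,
so plain reference counting can reclaim everything.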

> One caution:
> 
> > ...
> > If the careful-cleaning algorithm hits the end of the careful set of
> > objects and the set is non-empty, then throw an exception:
> > GCImpossibleError.
> 
> Since gc "can happen at any time", this is very severe (c.f. Guido's
> objection to making resurrection illegal).

Not quite.  Cycle detection is presumably only called every once in a
while on memory allocation, and memory *allocation* (as opposed to
deallocation) is allowed to fail.  Of course, this will probably run
into various coding bugs where allocation failure isn't dealt with
properly, because in practice this happens so rarely...

> Hand a trash cycle back to the
> programmer instead, via callback or request or whatever, and it's all
> explicit without more cruft in the implementation.  It's alive again when
> they get it back, and they can do anything they want with it (including
> resurrecting it, or dropping it again, or breaking cycles --
> anything).

That was the idea with calling the finalizer too: it would be called
between INCREF/DECREF, so the object would be considered alive for the
duration of the finalizer call.
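
For comparison, a rough illustration of the "hand the trash back to the
programmer" style.  It uses the gc module as found in present-day
CPython (gc.DEBUG_SAVEALL and gc.garbage), which is an assumption about
the reader's interpreter and not an interface settled in this thread.
The Node class is hypothetical.

```python
import gc


class Node:
    """Hypothetical object that ends up in a trash cycle."""

    def __init__(self):
        self.other = None


gc.collect()                      # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)    # keep unreachable objects in gc.garbage

a, b = Node(), Node()
a.other, b.other = b, a           # build the cycle
del a, b                          # now it is unreachable trash
gc.collect()

# The program gets the trash back and the objects are alive again.
handed_back = [obj for obj in gc.garbage if isinstance(obj, Node)]

# The programmer decides what to do; here the cycle is simply broken.
for obj in handed_back:
    obj.other = None

gc.set_debug(0)
del gc.garbage[:]
```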

Here's another way of looking at my error: for dicts and lists, I
would call a special *clear* function; but for instances, I would call
*dealloc*, however intending it to perform a *clear*.

I wish we didn't have to special-case finalizers on class instances
(since each dealloc function is potentially a combination of a
finalizer and a deallocation routine), but the truth is that they
*are* special -- __del__ has no responsibility for deallocating
memory, only for deallocating external resources (such as temp files).

And even if we introduced a tp_clean protocol that would clear dicts
and lists and call __cleanup__ for instances, we'd still want to call
it first for instances, because an instance depends on its __dict__
for its __cleanup__ to succeed (but the __dict__ doesn't depend on the
instance for its cleanup).  Greg's 3-phase tp_clean protocol seems
indeed overly elaborate but I guess it deals with such dependencies in
the most general fashion.
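
The dependency can be seen in a small sketch, again assuming a
hypothetical __cleanup__ method; TempFileOwner and its attributes are
invented for the example.  Clearing the instance's __dict__ first
leaves __cleanup__ without the state it needs.

```python
class TempFileOwner:
    """Hypothetical instance whose cleanup needs its own attributes."""

    def __init__(self, path):
        self.path = path
        self.released = []

    def __cleanup__(self):
        # Reads self.path and self.released, both stored in __dict__.
        self.released.append(self.path)


# Instance first, then its __dict__: cleanup succeeds.
good = TempFileOwner("a.tmp")
released = good.released          # keep a handle on the list
good.__cleanup__()
good.__dict__.clear()

# __dict__ first, then the instance: cleanup fails.
bad = TempFileOwner("b.tmp")
bad.__dict__.clear()
try:
    bad.__cleanup__()
    cleanup_failed = False
except AttributeError:
    cleanup_failed = True
```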

> I'd focus on the cycles themselves, not on the types of objects
> involved.  I'm not pretending to address the "order of finalization
> at shutdown" question, though (although I'd agree they're deeply
> related: how do you follow a topological sort when there *isn't*
> one?  well, you don't, because you can't).

In theory, you just delete the last root (a C global pointing to
sys.modules) and you run the garbage collector.  It might be more
complicated in practice to track down all roots.  Another practical
consideration is that now there are cycles of the form

<function object> <=> <module dict>

which suggests that we should make function objects traceable.  Also,
modules can cross-reference, so module objects should be made
traceable.  I don't think that this will grow the sets of traced
objects by too much (since the dicts involved are already traced, and
a typical program has way fewer functions and modules than it has
class instances).  On the other hand, we may also have to trace
(un)bound method objects, and these may be tricky because they are
allocated and deallocated at high rates (once per typical method
call).
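
Both observations can be checked from Python itself.  The sketch below
assumes a current CPython, where a function's globals are exposed as
__globals__; the module name "demo" and the class C are made up for the
example.

```python
import types

# Build a throwaway module and define a function inside its namespace.
mod = types.ModuleType("demo")
exec("def f():\n    return 1\n", mod.__dict__)

f = mod.f
# The function points at the module dict ...
function_sees_dict = f.__globals__ is mod.__dict__
# ... and the module dict points back at the function.
dict_sees_function = mod.__dict__["f"] is f


class C:
    def m(self):
        pass


c = C()
# Each attribute lookup builds a fresh bound method object.
m1 = c.m
m2 = c.m
fresh_each_time = m1 is not m2
```

The last check is what makes tracing bound methods potentially costly:
a new one is created for each lookup of the method on an instance.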

Back to the drawing board...

--Guido van Rossum (home page: http://www.python.org/~guido/)