[Python-Dev] Design question: call __del__ for cyclical garbage?

Greg Stein gstein@lyra.org
Fri, 3 Mar 2000 18:59:26 -0800 (PST)

On Fri, 3 Mar 2000, Tim Peters wrote:
> Note, though, that there is NO good answer to finalizers in cycles!  The

"Note" ?? Not just a note, but I'd say an axiom :-)

By definition, you have two objects referring to each other in some way.
How can you *definitely* know how to break the link between them? Do you
call A's finalizer or B's first? If they're instances, do you just whack
their __dict__ and hope for the best?
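To make the ambiguity concrete, here is a small sketch (the `Node` class and names are mine, not from the thread) of a two-object cycle with finalizers. Plain reference counting can never reclaim it, and a collector has no principled way to pick which __del__ runs first. (Modern CPython, 3.4+, does finalize such cycles, but in an unspecified order; at the time of this post they were simply uncollectable.)

```python
import gc

log = []

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None
    def __del__(self):
        log.append(self.name)

a, b = Node('A'), Node('B')
a.peer, b.peer = b, a   # A and B now refer to each other
del a, b                # the cycle keeps both refcounts above zero

# Neither finalizer has run yet; only the cycle collector can reclaim
# these objects, and it must pick an order for the two __del__ calls.
gc.collect()
```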

> So here's what I'd consider doing:  explicit is better than implicit, and in
> the face of ambiguity refuse the temptation to guess.  If a trash cycle
> contains a finalizer (my, but that has to be rare. in practice, in
> well-designed code!), don't guess, but make it available to the user.  A
> gc.guardian() call could expose such beasts, or perhaps a callback could be
> registered, invoked when gc finds one of these things.  Anyone crazy enough
> to create cyclic trash with finalizers then has to take responsibility for
> breaking the cycle themself.  This puts the burden on the person creating
> the problem, and they can solve it in the way most appropriate to *their*
> specific needs.  IOW, the only people who lose under this scheme are the
> ones begging to lose, and their "loss" consists of taking responsibility.

I'm not sure if Tim is saying the same thing, but I'll write down a
concrete idea for cleaning garbage cycles.


First, a couple observations:

* Some objects can always be reliably "cleaned": lists, dicts, tuples.
  They just drop their contents, with no invocations against any of them.

  Note that an instance without a __del__ has no opinion on how it is
  cleaned (this is related to Tim's point about whether a cycle has a
  finalizer).

* The other objects may need to *use* their referenced objects in some way
  to clean out cycles.

Since the second set of objects (possibly) need more care during their
cleanup, we must concentrate on how to solve their problem.

Back up a step: to determine where an object falls, let's define a
tp_clean type slot. It returns an integer and takes one parameter: an
operation integer.

    Py_TPCLEAN_CARE_CHECK      /* check whether care is needed */
    Py_TPCLEAN_CARE_EXEC       /* perform the careful cleaning */
    Py_TPCLEAN_EXEC            /* perform a non-careful cleaning */
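As a rough illustration (not the actual C slot, and the dispatch rules for instances and containers are just my reading of the proposal), the three operations could be modeled in Python like this:

```python
# Hypothetical Python model of the proposed tp_clean slot; the constant
# names mirror the C ones above, everything else is illustrative.
PY_TPCLEAN_CARE_CHECK = 0   # check whether care is needed
PY_TPCLEAN_CARE_EXEC  = 1   # perform the careful cleaning
PY_TPCLEAN_EXEC       = 2   # perform a non-careful cleaning

def tp_clean(obj, op):
    if op == PY_TPCLEAN_CARE_CHECK:
        # Instances with a finalizer need care; plain containers never do.
        return hasattr(type(obj), '__del__')
    if op == PY_TPCLEAN_CARE_EXEC:
        clean = getattr(obj, '__clean__', None)
        if clean is None:
            return False        # FALSE: could not clean this object
        clean()
        return True
    if op == PY_TPCLEAN_EXEC:
        # Containers just drop their contents, with no calls into them.
        if isinstance(obj, list):
            del obj[:]
        elif isinstance(obj, dict):
            obj.clear()
        return True
    raise ValueError(op)
```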

Given a set of objects that require special cleaning mechanisms, there is
no way to tell where to start first. So... just pick the first one. Call
its tp_clean type slot with CARE_EXEC. For instances, this maps to
__clean__. If the instance does not have a __clean__, then tp_clean
returns FALSE meaning that it could not clean this object. The algorithm
moves on to the next object in the set.

If tp_clean returns TRUE, then the object has been "cleaned" and is moved
to the "no special care needed" list of objects, where it waits for its
reference count to hit zero.

Note that objects in the "care" and "no care" lists may disappear during
the careful-cleaning process.

If the careful-cleaning algorithm hits the end of the careful set of
objects and the set is non-empty, then throw an exception:
GCImpossibleError. The objects in this set each said they could not be
cleaned carefully AND they were not dealloc'd during other objects'
cleaning.

[ it could be possible to define a *dynamic* CARE_EXEC that will succeed
  if you call it during a second pass; I'm not sure this is a Good Thing
  to allow, however. ]
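Putting the careful pass together, it might look roughly like this in Python (`clean_cycles`, `GCImpossibleError` as a Python class, and the `__clean__` lookup are my sketch of the proposal, not real CPython code):

```python
class GCImpossibleError(Exception):
    """No object remaining in the 'care' set could clean itself."""

def clean_cycles(care, no_care):
    # Pass over the "care" set once; each object whose __clean__ succeeds
    # counts as cleaned and moves to the "no care" set.
    for obj in list(care):
        clean = getattr(obj, '__clean__', None)
        if clean is None:
            continue            # tp_clean returned FALSE: try the next one
        clean()                 # the object breaks its own internal links
        care.remove(obj)
        no_care.append(obj)
    if care:
        # Every remaining object said it could not be cleaned carefully.
        raise GCImpossibleError(care)
    # The "no care" objects then just drop their references.
    for obj in no_care:
        if isinstance(obj, list):
            del obj[:]
        elif isinstance(obj, dict):
            obj.clear()
```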

This also implies that a developer should almost *always* consider writing
a __clean__ method whenever they write a __del__ method. That method MAY
be called when cycles need to be broken; the object should delete any
non-essential variables in such a way that integrity is retained (e.g. it
fails gracefully when methods are called and __del__ won't raise an
error). For example, __clean__ could call self.close() to shut down its
operation. Whatever... you get the idea.
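A class following that advice might look like this (the `Connection` class and its attributes are purely illustrative; `__clean__` is the hypothetical hook from this proposal):

```python
class Connection:
    """Illustrative resource-holding object that may sit in a cycle."""
    def __init__(self, pool):
        self.pool = pool      # back-reference that may form a cycle
        self.open = True
    def close(self):
        self.open = False
    def __clean__(self):
        # Hypothetical hook: break the cycle gracefully. Shut down the
        # object's operation and drop non-essential references, leaving it
        # in a state where later method calls and __del__ won't raise.
        self.close()
        self.pool = None
    def __del__(self):
        self.close()          # must not fail, even after __clean__ ran
```

A collector hitting this object in a cycle would call __clean__ first; when the reference count later reaches zero, __del__ runs against the already-shut-down object without error.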

At the end of the iteration of the "care" set, then you may have objects
remaining in the "no care" set. By definition, these objects don't care
about their internal references to other objects (they don't need them
during deallocation). We iterate over this set, calling tp_clean(EXEC).
For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out
the references to other objects (but does not dealloc the object!). Again:
objects in the "no care" set will go away during this process. By the end
of the iteration over the "no care" set, it should be empty.
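For example, breaking a simple list/dict cycle under this scheme amounts to nothing more than:

```python
inner = []
outer = {'inner': inner}
inner.append(outer)   # list and dict refer to each other: a cycle

# tp_clean(EXEC) for containers: drop the references, with no calls into
# the contained objects and no dealloc of the container itself.
del inner[:]
outer.clear()
# The cycle is broken; ordinary reference counting can reclaim both.
```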

[ note: the iterations over these sets should probably INCREF/DECREF
  across the calls; otherwise, the object could be dealloc'd during the
  tp_clean call. ]

[ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible
  references to other objects; not sure what this means. is it an error?
  maybe you just force a tp_dealloc on the remaining objects. ]

Note that the tp_clean mechanism could probably be used during the Python
finalization, where Python does a bunch of special-casing to clean up
modules. Specifically: a module does not care about its contents during
its deallocation, so it is a "no care" object; it responds to
tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they
can clear their dict (which contains a module reference which usually
causes a loop) during tp_clean(EXEC). Module cleanup is easy once objects
with CARE_CHECK have been handled -- all that funny logic in there is to
deal with "care" objects.


Greg Stein, http://www.lyra.org/