[Python-Dev] Design question: call __del__ for cyclical garbage?
Guido van Rossum
guido@python.org
Fri, 03 Mar 2000 11:50:08 -0500
We now have two implementations of Eric Tiedemann's idea: Neil and I
both implemented it. It's too soon to post the patch sets (both are
pretty rough) but I've got another design question.
Once we've identified a bunch of objects that are only referring to
each other (i.e., one or more cycles) we have to dispose of them.
The question is, how? We can't just call free on each of the objects;
some may not be allocated with malloc, and some may contain pointers
to other malloc'ed memory that also needs to be freed.
So we have to get their destructors involved. But how? Calling
ob->ob_type->tp_dealloc(ob) for an object whose reference count is
still nonzero is unsafe -- this will destroy the object while there are still
references to it! Those references are all coming from other objects
that are part of the same cycle; those objects will also be
deallocated and they will reference the deallocated objects (if only
to DECREF them).
Neil uses the same solution that I use when finalizing the Python
interpreter -- find the dictionaries and call PyDict_Clear() on them.
(In his unpublished patch, he also clears the lists using
PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized it
so that *every* object can define a tp_clear function in its type
object.)
As long as every cycle contains at least one dictionary or list
object, this will break cycles reliably and get rid of all the
garbage. (If you wonder why: clearing the dict DECREFs the next
object(s) in the cycle; if the last dict referencing a particular
object is cleared, the last DECREF will deallocate that object, which
will in turn DECREF the objects it references, and so forth. Since
none of the objects in the cycle has incoming references from outside
the cycle, we can prove that this will delete all objects as long as
there's a dict or list in each cycle.)
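The chain of DECREFs described above can be sketched with a toy model. This is only an illustration in Python, not the C code from either patch: the names Obj, add_ref, clear, decref and demo are hypothetical, and refcount here is a plain integer maintained by hand.

```python
# Toy model of reference counting; NOT CPython's real implementation.
# All names here (Obj, add_ref, clear, decref, demo) are hypothetical.

class Obj:
    def __init__(self, name, log):
        self.name = name
        self.refcount = 0
        self.refs = []       # outgoing references held by this object
        self.freed = False
        self.log = log       # shared list recording deallocation order

    def add_ref(self, other):
        self.refs.append(other)
        other.refcount += 1

    def clear(self):
        # Plays the role of PyDict_Clear(): drop every outgoing reference.
        refs, self.refs = self.refs, []
        for target in refs:
            decref(target)


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0 and not obj.freed:
        obj.freed = True
        obj.log.append(obj.name)
        obj.clear()          # deallocation DECREFs whatever it references


def demo():
    log = []
    a, b, c = Obj("dict_a", log), Obj("inst_b", log), Obj("inst_c", log)
    # Build the cycle a -> b -> c -> a with no references from outside.
    a.add_ref(b)
    b.add_ref(c)
    c.add_ref(a)
    a.clear()                # clearing the one "dict" breaks the cycle
    return log, [o.refcount for o in (a, b, c)]
```

Clearing the single container is enough here: the DECREF of inst_b cascades around the cycle and ends by freeing dict_a itself.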
However, there's a snag. It's the same snag that finalizing the
Python interpreter runs into -- it has to do with __del__ methods and
the undefined order in which the dictionaries are cleared.
For example, it's quite possible that the first dictionary we clear is
the __dict__ of an instance, so this zaps all its instance variables.
Suppose this breaks the cycle, so then the instance itself gets
DECREFed to zero. Its deallocator will be called. If it's got a
__del__, this __del__ will be called -- but all the instance variables
have already been zapped, so it will fail miserably!
It's also possible that the __dict__ of a class involved in a cycle
gets cleared first, in which case the __del__ no longer "exists", and
again the cleanup is skipped.
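The first failure mode can be reproduced in miniature. The sketch below is an assumption-laden stand-in: TempResource, finalize and finalize_after_clearing are hypothetical names, and an ordinary method is used in place of __del__ so that the failure is visible as a normal exception.

```python
# Hypothetical illustration: finalize() stands in for __del__.

class TempResource:
    def __init__(self, path):
        self.path = path

    def finalize(self):
        # Needs the instance variable in order to do its cleanup.
        return "would remove " + self.path


def finalize_after_clearing(obj):
    vars(obj).clear()        # analogous to clearing the instance __dict__ first
    try:
        return obj.finalize()
    except AttributeError as exc:
        # The instance variables are gone, so the cleanup cannot run.
        return "failed: " + type(exc).__name__
```

With the instance dictionary emptied first, the finalizer has nothing left to work with, which is the situation described above.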
So the question is: What to *do*?
My solution is to make an extra pass over all the garbage objects
*before* we clear dicts and lists, and for those that are instances
and have __del__ methods, call their __del__ ("by magic", as Tim calls
it in another post). The code in instance_dealloc() already does the
right thing here: it calls __del__, then discovers that the reference
count is > 0 ("I'm not dead yet" :-), and returns without freeing the
object. (This is also why I want to introduce a flag ensuring that
__del__ gets called by instance_dealloc at most once: later when the
instance gets DECREFed to 0, instance_dealloc is called again and will
correctly free the object; but we don't want __del__ called again.)
[Note for Neil: somehow I forgot to add this logic to the code;
in_del_called isn't used! The change is obvious though.]
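A rough sketch of the two-pass idea with an "at most once" flag, again as a Python toy rather than the C change itself. Instance, attrs, del_called, run_del and collect are all hypothetical names; attrs plays the role of the instance __dict__ and del_called plays the role of the flag.

```python
# Toy model of the proposed two-pass collection; names are hypothetical.

class Instance:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.attrs = {}           # stands in for the instance __dict__
        self.del_called = False   # flag: the finalizer has already run

    def run_del(self):
        # Run the finalizer at most once; report whether it ran this time.
        if self.del_called:
            return False
        self.del_called = True
        self.log.append((self.name, sorted(self.attrs)))
        return True


def collect(garbage):
    # Pass 1: call every finalizer while instance variables are intact.
    for obj in garbage:
        obj.run_del()
    # Pass 2: break the cycles. The second run_del() models the
    # deallocator being entered again; the flag turns it into a no-op.
    for obj in garbage:
        obj.attrs.clear()
        obj.run_del()


def demo():
    log = []
    a, b = Instance("a", log), Instance("b", log)
    a.attrs["peer"] = b
    b.attrs["peer"] = a
    collect([a, b])
    return log, a.attrs, b.attrs
```

Each finalizer is recorded exactly once, and each one still sees its "peer" attribute because it runs before anything is cleared.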
This still leaves a problem for the user: if two class instances
reference each other and both have a __del__, we can't predict whose
__del__ is called first when they are called as part of cycle
collection. The solution is to write each __del__ so that it doesn't
depend on the other __del__.
Someone (Tim?) in the past suggested a different solution (probably
found in another language): for objects that are collected as part of
a cycle, the destructor isn't called at all. The memory is freed
(since it's no longer reachable), but the destructor is not called --
it is as if the object lives on forever.
This is theoretically superior, but not practical: when I have an
object that creates a temp file, I want to be able to reliably delete
the temp file in my destructor, even when I'm part of a cycle!
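The temp file case can be tried directly. The sketch below assumes CPython 3.4 or later, where the cycle collector runs __del__ on objects found in unreachable cycles; it shows that present-day behavior and is not the patch under discussion here. TempFileOwner and demo are hypothetical names.

```python
# Assumes CPython 3.4+ semantics; names are hypothetical.
import gc
import os
import tempfile


class TempFileOwner:
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        self.partner = None

    def __del__(self):
        # Depends only on this object's own state, not on the partner's
        # __del__ having run (or not run) first.
        try:
            os.remove(self.path)
        except OSError:
            pass


def demo():
    a, b = TempFileOwner(), TempFileOwner()
    a.partner, b.partner = b, a      # two instances referencing each other
    paths = [a.path, b.path]
    del a, b                         # only the cycle keeps them alive now
    gc.collect()                     # cycle collection runs both __del__s
    return [os.path.exists(p) for p in paths]
```

Each __del__ touches only its own path, so the order in which the two are called does not matter.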
--Guido van Rossum (home page: http://www.python.org/~guido/)