[Python-Dev] PEP 442: Safe object finalization

Armin Rigo arigo at tunes.org
Sat May 18 15:24:08 CEST 2013


Hi Antoine,

On Sat, May 18, 2013 at 10:59 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Cyclic isolate (CI)
>     A reference cycle in which no object is referenced from outside the
>     cycle *and* whose objects are still in a usable, non-broken state:
>     they can access each other from their respective finalizers.

Does this definition include more complicated cases?  For example:

    A -> B -> A    and   A -> C -> A

Neither cycle is isolated.  If there is no reference from outside,
then the set of all three objects is isolated, but isn't strictly a
cycle.  I think the term is "strongly connected component".

> 1. Weakrefs to CI objects are cleared, and their callbacks called. At
>    this point, the objects are still safe to use.
>
> 2. **The finalizers of all CI objects are called.**

You need to be very careful about what each call to a finalizer can do
to the object graph.  It may already be what you're doing, but the
most careful solution is to collect in "1." the complete list of
objects with finalizers that are in cycles; then incref them all; then
call the finalizer of each of them; then decref them all.  Such a
solution gives new cases to think about, which are slightly unexpected
for CPython's model: for example, if you have a cycle A -> B -> A,
let's say the GC calls A.__del__ first; it might cause it to store a
reference to B somewhere else, e.g. in some global; but then the GC
calls B.__del__ anyway.  This is probably fine but should be
considered.

> 3. **The CI is traversed again to determine if it is still isolated.

How is this done?  I don't see a clear way to determine it by looking
only at the objects in the CI, given that arbitrary modifications of
the object graph may have occurred.  The solution I can think of
doesn't seem robust against minor changes done by the finalizer.  Take
the example "A -> lst -> B -> A", where the reference from A to B is
via a list (e.g. there is an attribute "A.attr = [B]").  If A.__del__
does the seemingly innocent change of replacing the list with a copy
of itself, e.g. "A.attr = A.attr[:]", then after the finalizers are
called, "lst" is gone and we're left with "A -> lst2 -> B -> A".
Checking that this cycle is still isolated requires a possibly large
number of checks, as far as I can tell.  This can lead to O(n**2)
behavior if there are n objects in total and O(n) cycles.

The solution seems to be to simply wait for the next GC execution.
Assuming that a finalizer is only called once, this only delays a bit
freeing objects with finalizers in cycles (but your PEP still works to
call finalizers and eventually collect the objects).  Alternatively,
this might be done immediately: in the point "3." above we can forget
everything we found so far, and redo the tracking on all objects (this
time ignoring finalizers that were already called).  In fact, it may
be necessary anyway: anything found before might be invalid after the
finalizers are called, so forgetting it all and redoing the tracking
from scratch seems to be the only way.

> Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods
> are bound.  Generators are also modified to use this slot, rather than
> ``tp_del``.  At the C level, a ``tp_finalize`` function is a normal
> function which will be called with a regular, alive object as its only
> argument.  It should not attempt to revive or collect the object.

Do you mean the opposite in the latest sentence?  ``tp_finalize`` can
do anything...


A bientôt,

Armin.


More information about the Python-Dev mailing list