[Python-Dev] PEP 442: Safe object finalization

Sat May 18 15:45:52 CEST 2013

Hi Armin,

On Sat, 18 May 2013 15:24:08 +0200
Armin Rigo <arigo at tunes.org> wrote:
> Hi Antoine,
> 
> On Sat, May 18, 2013 at 10:59 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > Cyclic isolate (CI)
> >     A reference cycle in which no object is referenced from outside the
> >     cycle *and* whose objects are still in a usable, non-broken state:
> >     they can access each other from their respective finalizers.
> 
> Does this definition include more complicated cases?  For example:
> 
>     A -> B -> A    and   A -> C -> A
> 
> Neither cycle is isolated.  If there is no reference from outside,
> then the set of all three objects is isolated, but isn't strictly a
> cycle.  I think the term is "strongly connected component".

Yes, I should fix this definition to be more exact.
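
To make the example concrete, here is a minimal sketch of that
three-object case. The Node class and its "links" attribute are
hypothetical names chosen for illustration. It assumes CPython, where
gc.collect() runs a full collection, and only shows that the three
objects are reclaimed as one group even though they form two
overlapping cycles.

```python
import gc
import weakref


class Node:
    """Hypothetical container object used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.links = []


# Build A -> B -> A and A -> C -> A: two cycles sharing the object A.
a, b, c = Node("A"), Node("B"), Node("C")
a.links += [b, c]
b.links.append(a)
c.links.append(a)

# Weak references let us observe whether each object is still alive
# without keeping it alive ourselves.
refs = [weakref.ref(node) for node in (a, b, c)]

# Drop every outside reference: the whole set {A, B, C} is now isolated.
del a, b, c
gc.collect()

# True when all three objects were reclaimed together.
all_gone = all(r() is None for r in refs)
```

Neither two-object cycle could be reclaimed on its own, since A is
referenced from the other one; only the set as a whole is isolated.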

> > 1. Weakrefs to CI objects are cleared, and their callbacks called. At
> >    this point, the objects are still safe to use.
> >
> > 2. **The finalizers of all CI objects are called.**
> 
> You need to be very careful about what each call to a finalizer can do
> to the object graph.  It may already be what you're doing, but the
> most careful solution is to collect in "1." the complete list of
> objects with finalizers that are in cycles; then incref them all; then
> call the finalizer of each of them; then decref them all.  Such a
> solution gives new cases to think about, which are slightly unexpected
> for CPython's model: for example, if you have a cycle A -> B -> A,
> let's say the GC calls A.__del__ first; it might cause it to store a
> reference to B somewhere else, e.g. in some global; but then the GC
> calls B.__del__ anyway.  This is probably fine but should be
> considered.

Yes, I know this is possible. My opinion is that it is fine to call B's
finalizer anyway. Calling all finalizers regardless of interim changes
in the object graph also makes things a bit more deterministic:
otherwise, which finalizers are called would depend on the call order,
which is undefined.
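
The scenario can be sketched as follows. This is only an illustration,
assuming an interpreter that implements the behaviour described here
(CPython 3.4 or later); Node, "calls" and "survivors" are hypothetical
names. A's finalizer stores B in a global list, and B's finalizer is
expected to run all the same.

```python
import gc

calls = []      # names of the objects whose finalizer has run
survivors = []  # hypothetical global that A's finalizer stores B into


class Node:
    """Hypothetical object with a finalizer, used only for illustration."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        calls.append(self.name)
        # Only A resurrects its peer by storing it somewhere reachable.
        if self.name == "A":
            survivors.append(self.other)


# Build the cycle A -> B -> A and drop all outside references.
a, b = Node("A"), Node("B")
a.other, b.other = b, a
del a, b

gc.collect()

# The call order is undefined, so compare the sorted list of names.
finalized = sorted(calls)
```

After the collection, "finalized" should contain both names whichever
finalizer ran first, and B (and through it A) remains reachable from
"survivors" because the cycle was not cleared.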

> > 3. **The CI is traversed again to determine if it is still isolated.
> 
> How is this done?  I don't see a clear way to determine it by looking
> only at the objects in the CI, given that arbitrary modifications of
> the object graph may have occurred.

The same way a generation is traversed, but restricted to the CI.

First the gc_refs field of each CI object is initialized to its
ob_refcnt (again).

Then, tp_traverse is called on each CI object, and each visited
CI object has its gc_refs decremented. This subtracts CI-internal
references from the gc_refs fields.

At the end of the traversal, if all CI objects have their gc_refs equal
to 0, then the CI has no external reference to it and can be cleared.
If at least one CI object has non-zero gc_refs, the CI cannot be
cleared.
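
The steps above can be modelled in pure Python. This is a toy sketch
and not the C implementation: Obj, link() and still_isolated() are
hypothetical names, "refcnt" stands in for ob_refcnt, and the "refs"
list stands in for what tp_traverse would visit.

```python
class Obj:
    """Toy stand-in for a GC-tracked object (not CPython's real layout)."""

    def __init__(self, name):
        self.name = name
        self.refs = []    # outgoing references, as tp_traverse would visit
        self.refcnt = 0   # models ob_refcnt
        self.gc_refs = 0  # scratch field used during the traversal


def link(src, dst):
    """Make src reference dst, bumping dst's modelled refcount."""
    src.refs.append(dst)
    dst.refcnt += 1


def still_isolated(ci):
    """Return True if no object in ci is referenced from outside ci."""
    members = {id(o) for o in ci}
    # Step 1: initialize gc_refs to the reference count.
    for o in ci:
        o.gc_refs = o.refcnt
    # Step 2: traverse each CI object and decrement the gc_refs of every
    # visited object that belongs to the CI.
    for o in ci:
        for target in o.refs:
            if id(target) in members:
                target.gc_refs -= 1
    # Step 3: any remaining count is a reference from outside the CI.
    return all(o.gc_refs == 0 for o in ci)


a, b, root = Obj("A"), Obj("B"), Obj("root")
link(a, b)
link(b, a)
isolated_before = still_isolated([a, b])  # only CI-internal references

link(root, b)  # models a finalizer storing B somewhere outside the CI
isolated_after = still_isolated([a, b])   # B keeps gc_refs == 1
```

The cost of the check depends only on the size of the CI and the
number of references its objects hold, not on the size of the
generation.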

> Alternatively,
> this might be done immediately: in the point "3." above we can forget
> everything we found so far, and redo the tracking on all objects (this
> time ignoring finalizers that were already called).

This would also be more costly, performance-wise. A CI should
generally be quite small, but a whole generation can be arbitrarily
large.

> > Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods
> > are bound.  Generators are also modified to use this slot, rather than
> > ``tp_del``.  At the C level, a ``tp_finalize`` function is a normal
> > function which will be called with a regular, alive object as its only
> > argument.  It should not attempt to revive or collect the object.
> 
> Do you mean the opposite in the latest sentence?  ``tp_finalize`` can
> do anything...

Not exactly, but I worded it poorly. What I meant is that the C code in
tp_finalize shouldn't *manually* revive the object, since the object it
is called with already has a strictly positive refcount.

Regards

Antoine.