[Python-Dev] PEP 442: Safe object finalization
Antoine Pitrou
solipsis at pitrou.net
Sat May 18 15:45:52 CEST 2013
Hi Armin,
On Sat, 18 May 2013 15:24:08 +0200
Armin Rigo <arigo at tunes.org> wrote:
> Hi Antoine,
>
> On Sat, May 18, 2013 at 10:59 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > Cyclic isolate (CI)
> > A reference cycle in which no object is referenced from outside the
> > cycle *and* whose objects are still in a usable, non-broken state:
> > they can access each other from their respective finalizers.
>
> Does this definition include more complicated cases? For example:
>
> A -> B -> A and A -> C -> A
>
> Neither cycle is isolated. If there is no reference from outside,
> then the set of all three objects is isolated, but isn't strictly a
> cycle. I think the term is "strongly connected component".
Yes, I should fix this definition to be more exact.
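To make Armin's terminology concrete, here is a small sketch of Tarjan's strongly-connected-components algorithm applied to his example graph (A -> B -> A and A -> C -> A). It only illustrates the term: the function name is hypothetical, and CPython's collector does not compute SCCs (it finds unreachable objects by subtracting internal references). Neither two-node cycle forms a component on its own; all three objects land in a single one.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each node to a list of successors."""
    index = {}          # discovery order of each node
    lowlink = {}        # smallest index reachable from the node's DFS subtree
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop everything above it.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print(strongly_connected_components(graph))  # one component containing A, B and C
```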
> > 1. Weakrefs to CI objects are cleared, and their callbacks called. At
> > this point, the objects are still safe to use.
> >
> > 2. **The finalizers of all CI objects are called.**
>
> You need to be very careful about what each call to a finalizer can do
> to the object graph. It may already be what you're doing, but the
> most careful solution is to collect in "1." the complete list of
> objects with finalizers that are in cycles; then incref them all; then
> call the finalizer of each of them; then decref them all. Such a
> solution gives new cases to think about, which are slightly unexpected
> for CPython's model: for example, if you have a cycle A -> B -> A,
> let's say the GC calls A.__del__ first; it might cause it to store a
> reference to B somewhere else, e.g. in some global; but then the GC
> calls B.__del__ anyway. This is probably fine but should be
> considered.
Yes, I know this is possible. My opinion is that it is fine to call B's
finalizer anyway. Calling all finalizers regardless of interim changes
in the object graph also makes things a bit more deterministic:
otherwise, which finalizers are called would depend on the call order,
which is undefined.
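The "call B's finalizer anyway" behaviour can be observed from pure Python on CPython 3.4 or later, where this PEP landed. In the sketch below (`Node`, `finalized`, `saved` and `demo` are illustrative names), A's `__del__` stores B in a global, which resurrects the A <-> B cycle; B's finalizer still runs, and neither finalizer runs a second time when the cycle later becomes garbage again. The order in which the two finalizers are called is unspecified, so only the sorted list is shown.

```python
import gc

finalized = []   # names of objects whose __del__ ran
saved = []       # a global "escape hatch" used by A's finalizer


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)
        if self.name == "A":
            # Store a reference to B somewhere else: this resurrects
            # B and, through B.other, A as well.
            saved.append(self.other)


def demo():
    a, b = Node("A"), Node("B")
    a.other, b.other = b, a   # build the cycle A -> B -> A
    del a, b                  # only the cycle's internal references remain
    gc.collect()


demo()
print(sorted(finalized))   # ['A', 'B']: B's finalizer ran despite A's escape
resurrected = saved[0].name
print(resurrected)         # 'B'
saved.clear()              # the cycle is unreachable again
gc.collect()
print(sorted(finalized))   # ['A', 'B']: neither finalizer ran twice
```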
> > 3. **The CI is traversed again to determine if it is still isolated.**
>
> How is this done? I don't see a clear way to determine it by looking
> only at the objects in the CI, given that arbitrary modifications of
> the object graph may have occurred.
The same way a generation is traversed, but restricted to the CI.
First the gc_refs field of each CI object is initialized to its
ob_refcnt (again).
Then, tp_traverse is called on each CI object, and each visited
CI object has its gc_refs decremented. This subtracts CI-internal
references from the gc_refs fields.
At the end of the traversal, if all CI objects have their gc_refs equal
to 0, then the CI has no external reference to it and can be cleared.
If at least one CI object has non-zero gc_refs, the CI cannot be
cleared.
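The recount can be modelled in a few lines of Python. This is a toy sketch, not CPython's C code: `Obj`, `link` and `still_isolated` are hypothetical names, `refcnt` stands in for ob_refcnt, and `referents` stands in for what tp_traverse would visit. After a finalizer stores B in a global (simulated by bumping B's refcount), B keeps a non-zero gc_refs and the CI is no longer isolated.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so Obj can go in a set
class Obj:
    name: str
    refcnt: int = 0                                 # stands in for ob_refcnt
    referents: list = field(default_factory=list)   # what tp_traverse would visit
    gc_refs: int = 0


def link(src, dst):
    """Record a strong reference src -> dst."""
    src.referents.append(dst)
    dst.refcnt += 1


def still_isolated(ci):
    """Recount restricted to the CI: True if no CI object is referenced from outside."""
    members = set(ci)
    for obj in ci:
        obj.gc_refs = obj.refcnt          # (re)initialize from the refcount
    for obj in ci:
        for ref in obj.referents:         # "tp_traverse" over each CI object
            if ref in members:
                ref.gc_refs -= 1          # subtract CI-internal references
    return all(obj.gc_refs == 0 for obj in ci)


a, b = Obj("A"), Obj("B")
link(a, b)
link(b, a)
ci = [a, b]
print(still_isolated(ci))                   # True
b.refcnt += 1   # simulate A's finalizer storing B in some global
print(still_isolated(ci))                   # False
print([(o.name, o.gc_refs) for o in ci])    # [('A', 0), ('B', 1)]
```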
> Alternatively,
> this might be done immediately: in the point "3." above we can forget
> everything we found so far, and redo the tracking on all objects (this
> time ignoring finalizers that were already called).
This would also be more costly, performance-wise: a CI should
generally be quite small, but a whole generation can be arbitrarily big.
> > Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods
> > are bound. Generators are also modified to use this slot, rather than
> > ``tp_del``. At the C level, a ``tp_finalize`` function is a normal
> > function which will be called with a regular, alive object as its only
> > argument. It should not attempt to revive or collect the object.
>
> Do you mean the opposite in the latest sentence? ``tp_finalize`` can
> do anything...
Not exactly, but I worded it poorly. What I meant is that the C code in
tp_finalize shouldn't *manually* revive the object, since the object it
receives already has a strictly positive refcount.
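One visible effect of moving generators from tp_del to tp_finalize can be checked from Python, assuming a CPython 3.4+ interpreter: a suspended generator whose try/finally is caught in a reference cycle is now finalized by the collector (its `finally` block runs) instead of being left uncollectable in `gc.garbage`, as happened in earlier versions. `Holder`, `worker` and `make_cycle` are illustrative names.

```python
import gc

log = []


class Holder:
    """A plain object used only to close the reference cycle."""


def worker(holder):
    try:
        yield 1
    finally:
        # Runs when the suspended generator is closed via GeneratorExit.
        log.append("finally ran")


def make_cycle():
    h = Holder()
    h.gen = worker(h)   # h -> generator -> suspended frame -> h
    next(h.gen)         # suspend inside the try block
    # h goes out of scope here; only the cycle keeps it alive


make_cycle()
gc.collect()
print(log)          # ['finally ran']
print(gc.garbage)   # []
```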
Regards
Antoine.