[Python-Dev] Making weakref callbacks safe in cyclic gc

Mon Nov 17 21:45:23 EST 2003

[Greg Ewing]
> How often does a finalizer really *need* access to the entire object
> that triggered the finalization, and not just some part of its state?
>
> I remember reading once about the finalization scheme used in a
> particular Smalltalk implementation (I think it was ParcPlace) in
> which an object requiring finalization registers another object to be
> notified after it has died.
>
> This seems to be more or less equivalent to what we have with weakref
> callbacks. It might be worth studying how they deal with reference
> cycles in their system, since the same solution may well apply to us.

It appears that most (not all) Smalltalks pass a shallow copy of "self" to
the method registered for finalization, and the original self is truly
unreachable then.

If you've got time for research, go for it.  My experience is that you
generally can't get answers to such obscure questions without studying
source code, and then the precise answers depend on accidental
implementation details.  Takes a long time, and I don't have it.

Here are "The Rules" for Dolphin Smalltalk, which has "real" finalization
and weak references.  Good luck <wink>:

The Rules

The co-ordination and initiation of Finalization and the murdering of Weak
References are the responsibility of the memory manager, and are performed
during a garbage collection (GC) cycle, according to the following rules
(you may want to skip this advanced topic):

Any objects which are directly reachable down a chain of strong references
from the "roots of the world" will survive the GC, and will NOT be queued
for finalization.

Any objects which are NOT directly reachable by following a chain of strong
references from one of the roots of the world, are candidates for
finalization during a particular GC cycle.

Any weakly referencing objects which contain finalization candidates
identified as above, are candidates for a bereavement notification following
the GC cycle, and will have their pointers to those candidates changed to
pointers to the corpse object during this GC cycle, regardless of whether
those objects are actually queued for finalization during this GC cycle.

Any weakling which has suffered one or more bereavements during a GC cycle
which is also a member of a class marked with the mourning special behaviour
bit (termed a mourning weakling), will receive an #elementsExpired: message
telling them how many of such losses the garbage collector inflicted on
them.

A bereavement notification candidate will only actually be queued for such a
notification if it is a member of a class bearing the mourning special
behaviour mark (applied by sending the class #makeMourner).

Mourning weaklings queued for bereavement notifications will receive an
#elementsExpired: message before any of the objects they previously
referenced has actually been finalized. This is ordering is necessary in
order that when objects are queued for finalization, they do not have any
non-circular references, strong or weak, because a pre-condition for
finalization is that an object must be about to expire.

A mouring weakling which has suffered bereavements during a GC cyle, but
which would otherwise be garbage collected itself, is rescued until after it
has been sent an #elementsExpired: message. If such object still have no
references after processing the #elementsExpired: message, then they will be
garbage collected as normal.

A finalization candidate will only actually be queued for finalization if it
bears the finalization mark (applied by sending #beFinalizable).

Should a finalization candidate contain other finalizable objects, then even
if those contained finalizable objects are only strongly referenced from the
original finalization candidate, then they will not be finalized during the
current GC cycle, but will instead survive until at least the completion of
the containers #finalize (and probably until the next full GC cycle is
complete, should they be circularly referenced). This guarantees that when
an object is finalized, any objects which it "owns" (directly or indirectly)
will not yet have been finalized, and should therefore be in a valid state.

Where a finalizable object, call it A, references another finalizable
object, call it B, then B is guaranteed to be finalized before A. Indeed A
cannot be finalized until B has been finalized.

Where a circular reference exists between two finalizable objects, then the
order in which those objects are actually finalized is undefined (though
they will not be finalized in the same cycle). An example of where such a
situation might arise is where there is a finalizable parent which strongly
references all its children, and those children are finalizable and have a
back pointer to the parent. Although conceptually their is a parent-child
relationship, there is no way for the memory manager to determine which
should be finalized first (indeed it is not necessarily clear). Where this
is the case, #finalize methods must coded defensively, and not depend on
ordering.

Any object in the finalization queue which is not actively being finalized
will have no other references in the image.

You may be wondering why these complex rules are necessary, why not just
finalize every candidate marked as requiring finalization? Well, the rules
are designed to ensure that objects queued for finalization remain valid
until their finalization is complete. If we simply queued every candidate
for finalization, then we could not guarantee that constituent objects had
not already been finalized. This would make coding #finalize methods
horribly complicated and confusing.

Bereavement notifications are not sent to all weaklings by default, because
the necessity of rescuing GC'able weak objects to receive the notification
can potentially extend the lifetime of large groups of weak objects
referenced by other weak objects (e.g. weak tree nodes) due to a "cascading
rescue" effect. Cascading rescues significantly degrade the operation of the
system because they may prevent garbage weaklings from being collected for
many many GC cycles.

The memory manager must ensure that an object does not receive a #finalize
message until there are no strong references to it (which are not circular),
and we need to take account of strong references from objects which are
queued for finalization in the same garbage collection cycle. Even if an
object to be finalized is only referenced from another object to be
finalized in the same cycle, we must delay its finalization until the next
cycle, so that parents are finalized before children, otherwise the parent
may not be in a valid state during its finalize. It is not acceptable to
have the order of finalization depend purely on the ordering the objects are
visited during garbage collection.

Where a finalizable object is circularly referenced (perhaps indirectly), we
must ensure that it can be garbage collected - so this precludes simply
marking any candidates for finalization, and then only actually finalising
those which are unreferenced, because this would mean that circularly
referencing finalizable objects (phew!) would never be garbage collected. In
fact it is possible that an indirect circular reference could exist between
two finalizable objects, and where this is the case there is no general
mechanism for deciding which to finalize first, since there is no notion of
ownership.

This complexity is probably one of the reasons that some other Smalltalks do
not support finalization of objects directly. They have only weak references
and implement finalization with it: Any object which is not directly
reachable through a strong pointer chain is garbage collected, and any weak
references are "nilled". The weakly referencing objects which suffer
bereavements, are informed, and it is up to them to perform finalization
actions on behalf of the objects that they have lost. This is typically
achieved by having a copy of the finalizable objects, and using them as
'executors'. This approach makes garbage collection simpler, but is
inefficient and requries more complex Smalltalk classes to support it.
Furthermore, it does not address the finalization ordering problem. If you
want to implement such finalization in Dolphin, you do so quite easily using
mourning weak objects, because the Dolphin facilities are a superset.