garbage collection / cyclic references

John Nagle nagle at animats.com
Sat Mar 21 14:04:42 EDT 2009


Aaron Brady wrote:
> Hello,
> 
> I was reading and Googling about garbage collection, reference
> counting, and the problem of cyclic references.
> 
> Python's garbage collection module claims to be able to detect and
> break cyclic garbage.  Some other languages merely prohibit it.  Is
> this the place to ask about its technique?
> 
> I understand that the disadvantage is a non-deterministic order of
> deletion/finalization.  

     Garbage collection and destructors or "finalizers" don't play well
together.  It's a fundamental problem.  Calling finalizers from the
garbage collector is painful.  It introduces concurrency where the
user may not have expected it.  Consider what happens if a finalizer
tries to lock something.  What if GC runs while that lock is locked?
This can create a deadlock situation.  Calling finalizers from the
garbage collector can result in intermittent, very hard to find bugs.

     C++ takes destructors seriously; objects are supposed to be destructed
exactly once, and if they're of "auto" scope (a local object in the
Python sense) they will reliably be cleaned up at block exit.
Microsoft's "Managed C++" broke those rules; in Managed C++,
destructors can be called more than once.  (Look up "re-animation"
in Microsoft Managed C++ literature.  It's not pretty.)

     Python actually has reference counting backed up by garbage collection.
Most objects are destroyed as soon as all references to them disappear.
Garbage collection is only needed to deal with cycles.

     Python has "weak references", which won't keep an object around
once all the regular references are deleted.  These are useful in
some situations.  In a tree, for example, pointers towards the leaves
should be strong pointers, while back-pointers towards the root should
be weak pointers.

    I once modified BeautifulSoup, the HTML parser, to use weak pointers
that way.  BeautifulSoup trees are big and don't go away immediately when
no longer needed, because they have backpointers.  They hang around until
the next GC cycle.  With the version that uses weak pointers, they go away
as soon as they're no longer needed.  We've found this useful in a web
crawler; the data space used drops and actual GC runs are no longer
necessary.

    Personally, I'd argue that the right answer is to prohibit cycles of
strong pointers.  That should be considered a programming error, and
detected at run time, at least by debugging tools.  With weak pointers,
you don't really need cycles of strong pointers.

    The advantage of this is a clean order of destruction.  This is useful
in window widget systems, where you have objects with pointers going in many
directions, yet object destruction has substantial side effects.

    Python originally had only reference counting, and didn't have weak pointers.
If weak pointers had gone in before the garbage collector, Python might have
gone in this direction.

				John Nagle



More information about the Python-list mailing list