[Python-Dev] Cyclic GC issues

Sun Oct 10 21:39:58 CEST 2004

Since the spring of 2003, I have been developing Crux, which is a computer
program for phylogenetic inferencing (bioinformatics) research.  In March
of 2004, I switched Crux to using Python from having used a different
embeddable interpreter.  For the most part, I have been very happy with
Python, but Python's garbage collector has been a major source of
frustration.

Below, I describe my trials and tribulations with Python's GC.  I also
offer some suggestions for changes to Python; if any of the proposed
changes receive favorable feedback, I am willing to help develop patches to
Python.  Naturally, if I am somehow abusing Python, and there are better
ways to do things, I'd be happy to hear how to improve Crux.

The important aspect of Crux is that it deals with trees.  These trees are
unrooted (there is no up or down), and multifurcating (nodes can have an
arbitrary number of neighboring nodes).  Thus, the trees are
self-referential, and without the cyclic GC capabilities of Python, there
would be little hope of making these trees integrate well with Python.
Following is a diagram that illustrates the linkage between various objects
for a simple tree.  Crux exposes all of the components of the trees as
Python objects.  All lines in the diagram represent bi-directional
references (except for the T-->N line).  Every object refers to the tree
object; those lines are left out in order to reduce clutter.

  T: Tree       N                     N
  N: Node        \                   /
  E: Edge         R                 R
  R: Ring          \               /
                    E             E
                     \           /
                      R---------R
                     / \       / \
                    /   \     /   \
                   |     \   /     |
                   |      \ /      |
                   |  T--->N       |
                   |       |       |
                    \      |      /
                     \     |     /
                      \----R----/
                           |
                           E
                           |
                           R
                           |
                           N

At the C (not Python object) level, the R-E-R construct is actually a set
of structures that are allocated/deallocated as a single unit.  Edges are
*always* connected to two rings, so there's no point in allocating these
separately.

Also, lone ring objects actually are rings with one member; they refer to
themselves (prev and next pointers).

That should be enough information to understand the problems I encountered.

1) I originally had lone rings refer to themselves (ob_refcnt started out
   at 3; 2 self-references and one reference held by the associated edge).
   This didn't work.  It appears that the cyclic GC does not actually
   calculate the number of live references to objects (references that are
   reachable by traversing all objects accessible from the root set);
   instead it assumes that if tp_clear() doesn't drop the reference count
   to a number that equals the number of references from live objects,
   there must still be references from live objects.  Unfortunately,
   visit()ing self does not work, so there is no way to convince Python
   that all references are from unreachable objects.

   Working around this in Crux requires a lot of extra reference counting
   complexity, because there are three different cases for reference
   counts, depending on how many members there are in a ring (1, 2, or 3+
   members).

2) This issue is really just a different manifestation of issue (1).

   At the C (not Python object) level, each node only actually stores a
   pointer to a single member of the associated ring.  Given a single ring
   member, it is possible to traverse the ring and reach all other ring
   members.  As mentioned in issue (1), the cyclic GC expects tp_traverse()
   to call visit() once for each reference held.  It is not enough for a
   node to visit() one ring member; it must visit() all ring members, in
   order for the GC to count how many references are from unreachable
   objects, versus reachable from the root set.

   In summary, issues (1) and (2) are due to how the cyclic GC does the
   "marking" phase of mark/sweep GC.  My expectation of mark/sweep GC is
   that it should be sufficient to assure that all objects reachable from
   the root set are visit()ed at least once; it should not be important how
   many times each unreachable object is visit()ed.

   I don't have a deep enough understanding of the Python interpreter to
   give a detailed suggestion for improvement.  I suspect that determining
   the root set is not an easy operation; if this is the case, then I think
   we're stuck with the current design.  If determining the root set *is*
   easy (or even possible), then I would suggest using a standard
   mark/sweep collector, where unreachable objects are scheduled for
   destruction.  tp_traverse(), tp_clear(), and tp_dealloc() would retain
   the same structure; the only difference would be the logic that
   determines which objects can be destroyed.

3) A strange thing can happen when tp_clear() is called.  Suppose that an
   edge object is being cleared, and it drops its references to the
   associated rings.  If ob_refcnt of one of the rings drops to 0 as a
   result, Python will tp_dealloc() the ring *right* *now*, without ever
   calling tp_clear() on the ring.  That means that I have to keep track of
   whether tp_clear() has been called on objects, if it is at all important
   that tp_clear() be called, so that I can manually do so in tp_dealloc(),
   if necessary.  It is in my opinion reasonable to have cleanup code in
   tp_clear(), with the assumption that it will be called precisely once,
   but Python makes me do extra work to make sure that this happens.

   This should be pretty easy to change.  A single bit per object is needed
   to keep track of whether tp_clear() has been called.  I think this only
   needs to be done for object types that support cyclic GC.

4) There is no way to control the order in which objects are
   tp_dealloc()ed.  This is a problem for the R-E-R construct, since at a
   low level, these objects are always linked together.  What I would like
   to do is defer tp_dealloc() on the edge until after both rings have been
   deleted.  Instead, I am forced to use a reference-counted deletion
   function.

   Not calling self->ob_type->tp_free() on the edge in tp_dealloc() until
   later is not a reasonable option, because this defers deletion of the
   edge until a later round of garbage collection.

   This could be addressed in the Python interpreter by paying heed to the
   return value of tp_dealloc().  If the return value is non-zero, move the
   object to the end of the list of objects to be destroyed, so that
   destruction is tried later.  This allows the module to impose its own
   destruction ordering.

I look forward to feedback.

Thank you,
Jason Evans