# [Python-Dev] Cyclic GC issues

Jason Evans jasone at canonware.com
Sun Oct 10 21:39:58 CEST 2004

```Since the spring of 2003, I have been developing Crux, which is a computer
program for phylogenetic inferencing (bioinformatics) research.  In March
of 2004, I switched Crux to using Python from having used a different
embeddable interpreter.  For the most part, I have been very happy with
Python, but Python's garbage collector has been a major source of
frustration.

Below, I describe my trials and tribulations with Python's GC.  I also
offer some suggestions for changes to Python; if any of the proposed
changes receive favorable feedback, I am willing to help develop patches to
Python.  Naturally, if I am somehow abusing Python, and there are better
ways to do things, I'd be happy to hear how to improve Crux.

The important aspect of Crux is that it deals with trees.  These trees are
unrooted (there is no up or down), and multifurcating (nodes can have an
arbitrary number of neighboring nodes).  Thus, the trees are
self-referential, and without the cyclic GC capabilities of Python, there
would be little hope of making these trees integrate well with Python.
Following is a diagram that illustrates the linkage between various objects
for a simple tree.  Crux exposes all of the components of the trees as
Python objects.  All lines in the diagram represent bi-directional
references (except for the T-->N line).  Every object refers to the tree
object; those lines are left out in order to reduce clutter.

T: Tree       N                     N
N: Node        \                   /
E: Edge         R                 R
R: Ring          \               /
E             E
\           /
R---------R
/ \       / \
/   \     /   \
|     \   /     |
|      \ /      |
|  T--->N       |
|       |       |
\      |      /
\     |     /
\----R----/
|
E
|
R
|
N

At the C (not Python object) level, the R-E-R construct is actually a set
of structures that are allocated/deallocated as a single unit.  Edges are
*always* connected to two rings, so there's no point in allocating these
separately.

Also, lone ring objects actually are rings with one member; they refer to
themselves (prev and next pointers).

That should be enough information to understand the problems I encountered.

1) I originally had lone rings refer to themselves (ob_refcnt started out
at 3; 2 self-references and one reference held by the associated edge).
This didn't work.  It appears that the cyclic GC does not actually
calculate the number of live references to objects (references that are
reachable by traversing all objects accessible from the root set);
instead it assumes that if tp_clear() doesn't drop the reference count
to a number that equals the number of references from live objects,
there must still be references from live objects.  Unfortunately,
visit()ing self does not work, so there is no way to convince Python
that all references are from unreachable objects.

Working around this in Crux requires a lot of extra reference counting
complexity, because there are three different cases for reference
counts, depending on how many members there are in a ring (1, 2, or 3+
members).

2) This issue is really just a different manifestation of issue (1).

At the C (not Python object) level, each node only actually stores a
pointer to a single member of the associated ring.  Given a single ring
member, it is possible to traverse the ring and reach all other ring
members.  As mentioned in issue (1), the cyclic GC expects tp_traverse()
to call visit() once for each reference held.  It is not enough for a
node to visit() one ring member; it must visit() all ring members, in
order for the GC to count how many references are from unreachable
objects, versus reachable from the root set.

In summary, issues (1) and (2) are due to how the cyclic GC does the
"marking" phase of mark/sweep GC.  My expectation of mark/sweep GC is
that it should be sufficient to assure that all objects reachable from
the root set are visit()ed at least once; it should not be important how
many times each unreachable object is visit()ed.

I don't have a deep enough understanding of the Python interpreter to
give a detailed suggestion for improvement.  I suspect that determining
the root set is not an easy operation; if this is the case, then I think
we're stuck with the current design.  If determining the root set *is*
easy (or even possible), then I would suggest using a standard
mark/sweep collector, where unreachable objects are scheduled for
destruction.  tp_traverse(), tp_clear(), and tp_dealloc() would retain
the same structure; the only difference would be the logic that
determines which objects can be destroyed.

3) A strange thing can happen when tp_clear() is called.  Suppose that an
edge object is being cleared, and it drops its references to the
associated rings.  If ob_refcnt of one of the rings drops to 0 as a
result, Python will tp_dealloc() the ring *right* *now*, without ever
calling tp_clear() on the ring.  That means that I have to keep track of
whether tp_clear() has been called on objects, if it is at all important
that tp_clear() be called, so that I can manually do so in tp_dealloc(),
if necessary.  It is in my opinion reasonable to have cleanup code in
tp_clear(), with the assumption that it will be called precisely once,
but Python makes me do extra work to make sure that this happens.

This should be pretty easy to change.  A single bit per object is needed
to keep track of whether tp_clear() has been called.  I think this only
needs to be done for object types that support cyclic GC.

4) There is no way to control the order in which objects are
tp_dealloc()ed.  This is a problem for the R-E-R construct, since at a
low level, these objects are always linked together.  What I would like
to do is defer tp_dealloc() on the edge until after both rings have been
deleted.  Instead, I am forced to use a reference-counted deletion
function.

Not calling self->ob_type->tp_free() on the edge in tp_dealloc() until
later is not a reasonable option, because this defers deletion of the
edge until a later round of garbage collection.

This could be addressed in the Python interpreter by paying heed to the
return value of tp_dealloc().  If the return value is non-zero, move the
object to the end of the list of objects to be destroyed, so that
destruction is tried later.  This allows the module to impose its own
destruction ordering.

I look forward to feedback.

Thank you,
Jason Evans
```