[Python-3000] Transitional GC?

Sun Sep 24 20:55:47 CEST 2006

I wonder if there is a way to create an API for extension modules that 
would allow a gradual phase-out of reference counting, towards a 'pure' GC.

(Let's leave aside the merits of reference counting vs. non-reference 
counting for another thread - please.)

Most of the discussion up to this point has assumed that there's a sharp 
line between the two GC schemes - in other words, once you switch over, 
you have to migrate every extension module all at once.

I've been wondering, however, if there isn't some way for both schemes 
to coexist within the same interpreter, for some transitional period. 
You would have some modules that use the RC API, while other modules 
would use the 'tracing' API. Modules could gradually be ported to the 
new API until there were none left, at which point you could throw the 
switch and remove the RC support entirely.

I'm assuming two things here:
   1) That such a transitional scheme would have to be as efficient (or 
nearly so) as the existing scheme in terms of memory and speed.
   2) That we're talking source-level compatibility only - there's no 
expectation that you would be able to link with modules compiled under 
the old API.

I see two basic approaches to this. The first is to have 
reference-counting modules live in a predominately trace-based world; 
The other is to allow tracing modules to live in a predominately 
reference-counted world.

The first approach is relatively straightforward - you simply add any 
object with a non-zero refcount to the root set. Objects whose refcounts 
fall to zero are not immediately deleted, but instead get placed into 
the youngest generation to be traced and collected.

The second approach requires that an object be able to manage refcounts 
via its trace function.

Consider what an extension module looks like under a tracing regime. 
Each extension class is required to provide a 'trace' function that 
iterates through all references held by an object.

The 'trace' function need not know the purpose of the trace - in other 
words, it need not know *why* the references are being iterated, its 
only concern is to provide access to each references. This is most 
easily accomplished by passing a callback function to the trace 
function. The trace function iterates through the object's references 
and calls the callback once for each one.

Because the extension modules doesn't know why the references are being 
traced, this gives us the freedom to redefine what a 'trace' means at 
various points in the transition.

So one scheme would allow a 'traced' object to exist in a 
reference-counted world by using the trace function to release 
references. When an object is destroyed, the trace function is called, 
and the callback releases the reference.

Dealing with mutation of references is trickier - there's a couple of 
approaches I've thought of by none are particularly efficient. I guess 
the traced object will have to call the old refcounting functions, but 
via macros which can be no-op'd later.

-- Talin