[Python-3000] Transitional GC?
talin at acm.org
Sun Sep 24 20:55:47 CEST 2006
I wonder if there is a way to create an API for extension modules that
would allow a gradual phase-out of reference counting, towards a 'pure' GC.
(Let's leave aside the merits of reference counting vs. non-reference
counting for another thread - please.)
Most of the discussion up to this point has assumed that there's a sharp
line between the two GC schemes - in other words, once you switch over,
you have to migrate every extension module all at once.
I've been wondering, however, if there isn't some way for both schemes
to coexist within the same interpreter, for some transitional period.
You would have some modules that use the RC API, while other modules
would use the 'tracing' API. Modules could gradually be ported to the
new API until there were none left, at which point you could throw the
switch and remove the RC support entirely.
I'm assuming two things here:
1) That such a transitional scheme would have to be as efficient (or
nearly so) as the existing scheme in terms of memory and speed.
2) That we're talking source-level compatibility only - there's no
expectation that you would be able to link with modules compiled under
the old API.
I see two basic approaches to this. The first is to have
reference-counting modules live in a predominately trace-based world;
The other is to allow tracing modules to live in a predominately
The first approach is relatively straightforward - you simply add any
object with a non-zero refcount to the root set. Objects whose refcounts
fall to zero are not immediately deleted, but instead get placed into
the youngest generation to be traced and collected.
The second approach requires that an object be able to manage refcounts
via its trace function.
Consider what an extension module looks like under a tracing regime.
Each extension class is required to provide a 'trace' function that
iterates through all references held by an object.
The 'trace' function need not know the purpose of the trace - in other
words, it need not know *why* the references are being iterated, its
only concern is to provide access to each references. This is most
easily accomplished by passing a callback function to the trace
function. The trace function iterates through the object's references
and calls the callback once for each one.
Because the extension modules doesn't know why the references are being
traced, this gives us the freedom to redefine what a 'trace' means at
various points in the transition.
So one scheme would allow a 'traced' object to exist in a
reference-counted world by using the trace function to release
references. When an object is destroyed, the trace function is called,
and the callback releases the reference.
Dealing with mutation of references is trickier - there's a couple of
approaches I've thought of by none are particularly efficient. I guess
the traced object will have to call the old refcounting functions, but
via macros which can be no-op'd later.
More information about the Python-3000