On 4/15/07, Talin <talin@acm.org> wrote:
Greg Ewing wrote:
> 2) Objects owned by a thread
> Python code creates and destroys temporary objects
> at a high rate -- stack frames, argument tuples,
> intermediate results, etc. If the code is executed
> by a thread, those objects are rarely if ever seen
> outside of that thread. It would be beneficial if
> refcount operations on such objects could be carried
> out by the thread that created them without locking.

While we are on the topic of reference counting, I'd like to direct your
attention to Recycler, an IBM research project:

"The Recycler is a concurrent multiprocessor garbage collector with
extremely low pause times (maximum of 6 milliseconds over eight
benchmarks) while remaining competitive with the best throughput-
oriented collectors in end-to-end execution times. This paper describes
the overall architecture of the Recycler, including its use of reference
counting and concurrent cycle collection, and presents extensive
measurements of the system comparing it to a parallel, stop-the-world
mark-and-sweep collector."

There are a bunch of research papers describing Recycler which can be
found at the following link:


I'd start with the papers entitled "Java without the Coffee Breaks: A
Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle
Collection in Reference Counted Systems".

Let me describe a bit about how the Recycler works and how it relates to
what you've proposed.

The basic idea is that for each thread, there is a set of thread local
data (TLD) that contains a pair of "refcount buffers", one buffer for
increfs and one buffer for decrefs. Each refcount buffer is a flat array
of pointers which starts empty and gradually fills up.

The "incref" operation does not actually touch the reference count field
of the object. Instead, an "incref" appends a pointer to the object to
the end of the incref buffer for that thread. Similarly, a decref
operation appends a pointer to the object to the decref buffer. Since
the refcount buffers are thread-local, there is no need for locking or

When one of the buffers gets full, both buffers are swapped out for new
ones, and the old buffers are placed on a queue which is processed by
the collector thread. The collector thread is the only thread which is
allowed to actually touch the reference counts of the individual
objects, and its the only thread which is allowed to delete objects.

Processing the buffers is relatively simple: First, the incref buffer is
processed. The collector thread scans through each pointer in the
buffer, and increments the refcount of each object. Then the decref
buffer is processed in a similar way, decrementing the refcount.

However, it also needs to process the buffers for the other threads
before the memory can be reclaimed. Recycler defines a series of
"epochs" (i.e. intervals between collections). Within a refcount buffer,
each epoch is represented as a contiguous range of values within the
array -- all of the increfs and decrefs which occurred during that
epoch. An auxiliary array records the high water mark for each epoch.

Huh, interesting idea.  I downloaded the papers and plan to give them a read.  The one isssue I can see with this is that because of these epochs and using a buffer instead of actually manipulating the refcount means a delay.  And I know some people love the (mostly) instantaneous garbage collection when the refcount should be at 0.

Anyway, I will give the paper a read before I make any more ignorant statements about the design.  =)