
On 4/15/07, Talin <talin@acm.org> wrote:
Greg Ewing wrote:
2) Objects owned by a thread
Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.
While we are on the topic of reference counting, I'd like to direct your attention to Recycler, an IBM research project:
"The Recycler is a concurrent multiprocessor garbage collector with extremely low pause times (maximum of 6 milliseconds over eight benchmarks) while remaining competitive with the best throughput- oriented collectors in end-to-end execution times. This paper describes the overall architecture of the Recycler, including its use of reference counting and concurrent cycle collection, and presents extensive measurements of the system comparing it to a parallel, stop-the-world mark-and-sweep collector."
There are a bunch of research papers describing Recycler which can be found at the following link:
http://www.research.ibm.com/people/d/dfb/publications.html
I'd start with the papers entitled "Java without the Coffee Breaks: A Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle Collection in Reference Counted Systems".
Let me describe a bit about how the Recycler works and how it relates to what you've proposed.
The basic idea is that for each thread, there is a set of thread local data (TLD) that contains a pair of "refcount buffers", one buffer for increfs and one buffer for decrefs. Each refcount buffer is a flat array of pointers which starts empty and gradually fills up.
The "incref" operation does not actually touch the reference count field of the object. Instead, an "incref" appends a pointer to the object to the end of the incref buffer for that thread. Similarly, a decref operation appends a pointer to the object to the decref buffer. Since the refcount buffers are thread-local, there is no need for locking or synchronization.
When one of the buffers gets full, both buffers are swapped out for new ones, and the old buffers are placed on a queue which is processed by the collector thread. The collector thread is the only thread which is allowed to actually touch the reference counts of the individual objects, and its the only thread which is allowed to delete objects.
Processing the buffers is relatively simple: First, the incref buffer is processed. The collector thread scans through each pointer in the buffer, and increments the refcount of each object. Then the decref buffer is processed in a similar way, decrementing the refcount.
However, it also needs to process the buffers for the other threads before the memory can be reclaimed. Recycler defines a series of "epochs" (i.e. intervals between collections). Within a refcount buffer, each epoch is represented as a contiguous range of values within the array -- all of the increfs and decrefs which occurred during that epoch. An auxiliary array records the high water mark for each epoch.
Huh, interesting idea. I downloaded the papers and plan to give them a read. The one isssue I can see with this is that because of these epochs and using a buffer instead of actually manipulating the refcount means a delay. And I know some people love the (mostly) instantaneous garbage collection when the refcount should be at 0. Anyway, I will give the paper a read before I make any more ignorant statements about the design. =) -Brett