[Python-ideas] Multi-core reference count garbage collection

Jonathan Fine jfine2358 at gmail.com
Fri Jul 20 06:50:00 EDT 2018


In an earlier post, I defined BUFFERED multi-core reference count
garbage collection. The basic idea is that each worker process stores,
in buffers, the INCR and DECR operations it would otherwise perform on
the reference count of the object with a given ID. There is then a
garbage collection (GC) process that does the right thing with those
buffers.
https://mail.python.org/pipermail/python-ideas/2018-July/052054.html
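
To make this concrete, here is a minimal sketch of the worker side.
It is only an illustration: the names (IncrDecrBuffer, record_incr and
so on) and the queue-based hand-off are my assumptions, not any
existing API.

    # Hypothetical sketch of worker-side buffering (names invented).
    from multiprocessing import Queue

    class IncrDecrBuffer:
        """Collect INCR/DECR operations, by object ID, for the GC process."""

        def __init__(self, to_gc, capacity=1024):
            self.to_gc = to_gc      # a Queue shared with the GC process
            self.capacity = capacity
            self.incr = []          # IDs whose count should go up by one
            self.decr = []          # IDs whose count should go down by one

        def record_incr(self, obj_id):
            self.incr.append(obj_id)
            self._maybe_flush()

        def record_decr(self, obj_id):
            self.decr.append(obj_id)
            self._maybe_flush()

        def _maybe_flush(self):
            # One hand-off per full buffer, instead of one coordination
            # step per INCR or DECR.
            if len(self.incr) + len(self.decr) >= self.capacity:
                self.to_gc.put((self.incr, self.decr))
                self.incr, self.decr = [], []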

An important aside. I'm very grateful for all the comments made on
the first part; they are very helpful. It's really nice to know that
mine is not a new idea, and to learn about some of the features and
quirks (or optimisations, if you like) of the CPython GC system.

Here, I describe the easy part of doing the right thing. And make a
start on the hard part.

EASY PART
The GC process is responsible for keeping the running totals of the
reference counts. It does this by processing the INCR and DECR buffers
that are passed to it. (Once these buffers are processed they can, of
course, be passed back to the system for reuse.)

Where to store the totals of the reference counts is a design
decision. The main constraint is that ONLY the GC process should be
changing these counts. Buffering is roughly equivalent to queueing
the worker processes' changes for the GC process to perform. But
buffering also allows the 'queue' to be reordered. This can
substantially reduce both the number of system calls required to
coordinate the various processes and the amount of time a process is
locked out from running.
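
For instance, a worker is free to coalesce its buffer before the
hand-off, so that many operations on the same ID cost the GC process
a single update. A minimal sketch, again with invented names:

    # Hypothetical coalescing: reduce the buffered operations to one
    # net delta per object ID before handing the buffer over.
    from collections import Counter

    def coalesce(incr_ids, decr_ids):
        deltas = Counter(incr_ids)
        deltas.subtract(Counter(decr_ids))
        return {obj_id: d for obj_id, d in deltas.items() if d != 0}

    # coalesce([1, 1, 2], [2, 3]) == {1: 2, 3: -1}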

We could, following CPython, store the reference count as a system
field in the object itself. This is probably the most economical use
of memory. However, if the GC process has some high-speed memory
associated with it, it will be quicker, if possible, to store the
reference count totals in that memory. So storing the counts
separately will probably make better use of process-specific
high-speed memory.

Either way the algorithm is easy.

HARD PART
The do-nothing garbage collection avoids the hard part, which is to
know when a piece of memory can be reclaimed. With single-core
reference counting, this would be when the count becomes zero. But is
the same true for multi-core garbage collection?

The answer is both YES and NO.

First for the NO. The buffered scheme described here is asynchronous,
and in particular it reorders operations. So there might be INCR
operations pending that would take the global reference count from
zero to one. (And it can go the other way: the object might be
inaccessible, but not all of its DECR operations have been processed
yet.)
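
Here is the first situation as a tiny runnable trace (object ID 42 is
invented for illustration):

    # Object 42 starts with one counted reference. Worker A buffers an
    # INCR; worker B buffers a DECR, and B's buffer reaches the GC first.
    totals = {42: 1}

    totals[42] -= 1                 # worker B's DECR arrives
    assert totals[42] == 0          # zero, yet 42 is still reachable

    totals[42] += 1                 # worker A's INCR arrives later
    assert totals[42] == 1          # the INCR was pending all along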

So, and this is important, we cannot rely on the reference count
totals owned by the GC process to tell us whether or not there is a
reference to the object in the worker processes.

Now for the YES. The operations of the processes can be put into a
definite temporal order (i.e. serialized). Once this is done, then
(assuming every process is behaving as it should) the following
statement is true:

Suppose at some point T in time the quantity
* the GC process reference count for the object with some ID
* plus any pending INCR operations in worker process buffers
* minus any similarly pending DECR operations
is zero. Call this the adjusted reference count.

If that is true at T, then after time T the object with the given ID
is inaccessible to the worker processes. And so the piece of memory
can be reclaimed.

We can, of course, compute the adjusted reference count by pausing
all worker processes. In other words, provide and use a global
interpreter lock.
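
As a sketch (with invented names, and assuming the buffers from the
earlier sketches): pause every worker, then fold each worker's
unflushed buffer into the GC total before testing for zero.

    # Hypothetical stop-the-world computation of the adjusted count.
    # `totals` is the GC process's running totals; `worker_buffers`
    # holds each paused worker's unflushed (incr_ids, decr_ids) pair.
    def adjusted_refcount(obj_id, totals, worker_buffers):
        # Safe only while every worker is paused, so that no buffer
        # changes underneath us.
        count = totals.get(obj_id, 0)
        for incr_ids, decr_ids in worker_buffers:
            count += incr_ids.count(obj_id)     # pending INCRs
            count -= decr_ids.count(obj_id)     # pending DECRs
        return count

    # If this returns zero at time T, the object is inaccessible to
    # the workers after T, and its memory can be reclaimed.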

The remainder of the hard part is to get hold of adjusted reference
counts without having to lock the worker processes.

As before, remarks and comments very welcome. The previous ones were
much appreciated.

-- 
Jonathan

