In an earlier post, I defined BUFFERED multi-core reference count garbage collection. The basic idea is that each worker process stores in buffers the INCR and DECR operations it wants performed on the reference count for a piece of memory with a given ID. There is then a garbage collection process that does the right thing with these buffers.
https://mail.python.org/pipermail/python-ideas/2018-July/052054.html

An important aside. I'm very grateful for all the comments made on the first part, which are very helpful. It's really nice to know that mine is not a new idea, and to learn about some of the features and quirks (or optimisations, if you like) of the CPython GC system.

Here, I describe the easy part of doing the right thing, and make a start on the hard part.

EASY PART

The GC process is responsible for keeping the running totals of the reference counts. It does this by processing the INCR and DECR buffers that are passed to it. (Once these buffers are processed they can, of course, be passed back to the system for reuse.)

Where to store the totals of the reference counts is a design decision. The main constraint is that ONLY the GC process should be changing these counts. The buffering is roughly equivalent to queuing the changes made by the worker processes for the GC process to perform. But buffering also allows the 'queue' to be reordered. This can substantially reduce the number of system calls required to coordinate the various processes, and the amount of time a process is locked out from running.

We could, following CPython, store the reference counts as a system field in the object being stored. This is probably the most economical use of memory. However, if the GC process has some high-speed memory of its own, it will be quicker, if possible, to store the reference count totals in that high-speed memory. And so storing the counts separately will probably make better use of process-specific high-speed memory.

Either way, the algorithm is easy.
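As a minimal sketch of the easy part, suppose each buffer is simply a list of object IDs. (The names process_buffers and counts below are mine, chosen for illustration; they are not part of any proposal.)

```python
from collections import Counter

def process_buffers(counts, incr_buffers, decr_buffers):
    """Apply buffered INCR and DECR operations to the GC process's
    running totals.  Only the GC process touches `counts`, so no
    locking is needed on the totals themselves."""
    for buf in incr_buffers:
        for obj_id in buf:
            counts[obj_id] += 1
    for buf in decr_buffers:
        for obj_id in buf:
            counts[obj_id] -= 1
    # Processed buffers can now be handed back to the system for reuse.
    return counts

counts = Counter()
process_buffers(counts, incr_buffers=[["a", "a", "b"]], decr_buffers=[["a"]])
# counts is now Counter({"a": 1, "b": 1})
```

Note that the totals need only be correct after all passed-in buffers are processed; within that, the order of application is free, which is what permits the reordering mentioned above.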
HARD PART

The do-nothing garbage collection avoids the hard part, which is to know when a piece of memory can be reclaimed. With single-core reference counting, this would be when the count becomes zero. But is the same true for multi-core garbage collection? The answer is both YES and NO.

First for the NO. The buffered scheme described here is asynchronous, and in particular reordering. So there might be INCR operations pending that would take the global reference count from zero to one. (And it can go the other way. The object might be inaccessible, but not all of its DECR operations processed yet.) So, and this is important, we cannot rely on the reference count totals owned by the GC process to tell us whether there is or is not a reference to the object in the worker processes.

Now for the YES. The operations of the processes can be put in a definite temporal order (i.e. serialized). When this is done, then (assuming every process is behaving as it should) the following statement is true:

Suppose at some point T in time the quantity
* the GC process reference count for the object with some ID
* plus any pending INCR operations in worker process buffers
* minus any similarly pending DECR operations
is zero. Call this the adjusted reference count.

If that is true at T, then after time T the object with the given ID is inaccessible to the worker processes. And so the piece of memory can be reclaimed.

We can, of course, compute the adjusted reference count by pausing all worker processes. In other words, provide and use a global interpreter lock. The remainder of the hard part is to get hold of adjusted reference counts without having to lock the worker processes.

As before, remarks and comments very welcome. The previous ones were much appreciated.

-- Jonathan
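The adjusted reference count can be sketched as follows, assuming (for illustration only) that we somehow have a consistent snapshot of every worker's pending buffers. Obtaining such a snapshot without pausing the workers is exactly the open problem; the function name and data shapes here are mine.

```python
def adjusted_refcount(obj_id, gc_counts, worker_buffers):
    """Adjusted reference count at one instant T: the GC process's
    running total, plus pending INCRs, minus pending DECRs, summed
    over all worker buffers.  If this is zero at T, the object is
    inaccessible to the workers after T and can be reclaimed."""
    total = gc_counts.get(obj_id, 0)
    for incr_buf, decr_buf in worker_buffers:
        total += incr_buf.count(obj_id)   # pending INCRs for obj_id
        total -= decr_buf.count(obj_id)   # pending DECRs for obj_id
    return total

# Example: the GC total for "x" is 1, and one worker holds a pending
# DECR for "x" with no pending INCRs.
gc_counts = {"x": 1}
workers = [([], ["x"])]
adjusted_refcount("x", gc_counts, workers)  # -> 0, so "x" can be reclaimed
```

The NO case above corresponds to gc_counts saying zero while some incr_buf still mentions the ID: the plain GC total is misleading, but the adjusted count is not.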