[Python-ideas] Multi-core reference count garbage collection
Jonathan Fine
jfine2358 at gmail.com
Thu Jul 19 02:33:13 EDT 2018
Hi
Based on other people's work (including in particular talks by Larry
Hastings) and my own thinking, I've come up with a scheme for multi-core
reference count garbage collection. I think it works, and much or all of it
comes from others. It might even be in the literature. Of course, if it's
wrong then the debit goes to me.
I'll explain it in instalments. Please let me know what you think, as we go
along.
The basic ideas of reference counting garbage collection are:
1. Each allocated piece of memory is given an ID (which is often its
address in memory).
2. For each ID, the system keeps a count of how many references (to that
piece of memory).
3. The count starts at ONE.
4. Processes increment and decrement the count, to keep the count correct.
5. When the count reaches zero, the piece of memory is garbage collected.
6. The previous step might result in further reference count decrements.
The simplest possible garbage collection scheme is to do nothing (and let
the garbage build up). In other words, allocate on a stack, and never free
memory. This has very good performance, until memory is exhausted. It is
also conceptually useful. I'll call it the do-nothing garbage collection
scheme.
Suppose for definiteness we have a 6-core CPU, with 5 worker processes and
one garbage collection (GC) process. The worker processes will do as little
as possible, consistent with not running out of memory. To achieve this:
1. Each worker process with have two GC buffers, one for increment and the
other for decrement.
2. For increment, the process will append the ID to the increment buffer
(for the process). And similarly for decrement.
3. If the buffer to be appended to is full, the worker process will
(a) set a buffer-full flag
(b) pause until the buffer-full flag has been cleared
(c) and then do what it was previously blocked from doing
I call any such scheme BUFFERED multi-core reference count garbage
collection. The worker processes know nothing about how garbage collection
is managed. Instead, they pass over to the GC process sufficient
information to allow it to manage the garbage.
This is now a good place to pause. We've specified what it is that the
worker processes should do, regarding garbage collection. And we've passed
the problem on to the garbage collection process. If that process does
nothing, we have refactored the do-nothing garbage collection scheme.
Expect another instalment tomorrow.
with best regards
Jonathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20180719/bfbf2f7f/attachment.html>
More information about the Python-ideas
mailing list