[Python-Dev] CPython optimization: storing reference counters outside of objects

Tue May 24 13:31:38 CEST 2011

On Sun, May 22, 2011 at 1:57 AM, Artur Siekielski
<artur.siekielski at gmail.com> wrote:
> Hi.
> The problem with reference counters is that they are very often
> incremented/decremented, even for read-only algorithms (like traversal
> of a list). It has two drawbacks:
> 1. CPU cache lines (64 bytes on X86) containing a beginning of a
> PyObject are very often invalidated, resulting in loosing many chances
> to use the CPU caches

Not sure what scenario exactly are you discussing here, but storing
reference counts outside of objects has (at least on a single
processor) worse cache locality than inside objects.

>
> However the drawback is that such design introduces a new level of
> indirection which is a pointer inside a PyObject instead of a direct
> value. Also it seems that the "block" with refcounts would have to be
> a non-trivial data structure.

That would almost certainly be slower for most use cases, except for
the copy-on-write fork. I guess recycler papers might be an
interesting read:
http://www.research.ibm.com/people/d/dfb/recycler.html

This is the best reference-counting GC I'm aware of.

>
> I'm not a compiler/profiling expert so the main question is if such
> design can work, and maybe someone was thinking about something
> similar? And if CPython was profiled for CPU cache usage?

CPython was not designed for CPU cache usage as far as I'm aware.

>From my (heavily biased) point of view, PyPy is a way better platform
to perform such experiments (and PyPy has been profiled for CPU cache
usage). The main advantage is that you can code your GC without the
need to modify the interpreter. On the other hand you obviously don't
get benefits on CPython, but maybe it's worth experimenting.

Cheers,
fijal