[Python-Dev] "immortal" objects and how they would help per-interpreter GIL

Dec. 14, 2021

Most of the work toward interpreter isolation and a per-interpreter
GIL involves moving static global variables to _PyRuntimeState or
PyInterpreterState (or module state).  Through the effort of quite a
few people, we've made good progress.  However, many globals still
remain, with the majority being objects and most of those being static
strings (e.g. _Py_Identifier), static types (incl. exceptions), and
singletons.
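
To make the kind of change concrete, here is a minimal sketch in plain C.
It is a toy, not CPython code: toy_interp_state, global_call_count, and
both bump functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Before: one process-wide static, implicitly shared by every interpreter. */
static size_t global_call_count = 0;

static size_t bump_global(void)
{
    return ++global_call_count;
}

/* After: the same counter lives in per-interpreter state, so each
   interpreter owns an independent copy.  (Hypothetical struct, standing
   in for something like PyInterpreterState.) */
typedef struct {
    size_t call_count;
} toy_interp_state;

static size_t bump_interp(toy_interp_state *interp)
{
    assert(interp != NULL);
    return ++interp->call_count;
}
```

With the second form, two interpreters never touch the same memory for
this counter, which is what a per-interpreter GIL would need.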

On top of that, a number of those objects are exposed in the public
C-API and even in the limited API. :(  Dealing with this specifically
is probably the trickiest thing I've had to work through in this
project.

There is one solution that would help both of the above in a nice way:
"immortal" objects.

The idea of objects that never get deallocated isn't new and has been
explored here several times.  Not that long ago I tried it out by
setting the refcount really high.  That worked.  Around the same time
Eddie Elizondo at Facebook did something similar but modified
Py_INCREF() and Py_DECREF() to keep the refcount from changing.  Our
solutions were similar but with different goals in mind.  (Facebook
wants to avoid copy-on-write in their pre-fork model.)

A while back I concluded that neither approach would work for us.  The
approach I had taken would have significant cache performance
penalties in a per-interpreter GIL world.  The approach that modifies
Py_INCREF() has a significant performance penalty due to the extra
branch on such a frequent operation.
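
For illustration, the two approaches could look roughly like the sketch
below.  This is a toy in plain C, not the actual CPython macros or the
Facebook patch; toy_object, TOY_IMMORTAL_REFCNT, and the toy_* functions
are all names I made up, and the sentinel value is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t refcnt;
} toy_object;

/* Hypothetical sentinel: any refcount at or above this means "immortal". */
#define TOY_IMMORTAL_REFCNT (INT64_MAX / 2)

/* Approach 1: give the object a huge initial refcount and leave the
   refcount operations alone.  Every incref/decref still writes to the
   object, which is where the cache cost comes from. */
static void toy_incref_plain(toy_object *op)
{
    op->refcnt++;
}

static void toy_decref_plain(toy_object *op)
{
    assert(op->refcnt > 0);
    op->refcnt--;
}

/* Approach 2: check for immortality first and skip the write.  This is
   the extra branch on every refcount operation. */
static int toy_is_immortal(const toy_object *op)
{
    return op->refcnt >= TOY_IMMORTAL_REFCNT;
}

static void toy_incref(toy_object *op)
{
    if (toy_is_immortal(op)) {
        return;  /* never modify an immortal object */
    }
    op->refcnt++;
}

/* Returns 1 when the caller should deallocate the object, else 0. */
static int toy_decref(toy_object *op)
{
    if (toy_is_immortal(op)) {
        return 0;
    }
    assert(op->refcnt > 0);
    return --op->refcnt == 0;
}
```

In the first approach the object's memory is still modified on every
operation; in the second it is left untouched, at the price of the branch.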

Recently I've come back to the idea of immortal objects because it's
much simpler than the alternate (working) solution I found.  So how do
we get around that performance penalty?  Let's say it makes CPython 5%
slower.  We have some options:

* live with the full penalty
* make other changes to reduce the penalty to something more
acceptable than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
* abandon all hope

Mark Shannon suggested some things we can do.  Also, from a
recent conversation with Dino Viehland it sounds like Eddie was able
to get to performance-neutral with a few techniques.  So here are some
things we can do to reduce or eliminate that penalty:

* reduce refcount operations on high-activity objects (e.g. None, True, False)
* reduce refcount operations in general
* walk the heap at the end of runtime initialization and mark all
objects as immortal
* mark all global objects as immortal (statics or those in
_PyRuntimeState; not needed for those in PyInterpreterState)
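
As a rough sketch of the "walk and mark" idea, again as a toy in plain C:
the linked list of all objects, toy_object, TOY_IMMORTAL_REFCNT, and
toy_immortalize_all() are assumptions made for illustration.  How the
real runtime would enumerate its objects is not something this sketch
tries to answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel value marking an object as immortal. */
#define TOY_IMMORTAL_REFCNT (INT64_MAX / 2)

typedef struct toy_object {
    int64_t refcnt;
    struct toy_object *next;  /* hypothetical "all live objects" link */
} toy_object;

/* Walk every object created so far and mark it immortal.
   Returns the number of objects marked. */
static size_t toy_immortalize_all(toy_object *head)
{
    size_t marked = 0;
    for (toy_object *op = head; op != NULL; op = op->next) {
        assert(op->refcnt > 0);  /* only live objects are expected here */
        op->refcnt = TOY_IMMORTAL_REFCNT;
        marked++;
    }
    return marked;
}
```

Run once at the end of runtime initialization, something like this would
cover the objects created during startup without tagging each one by hand.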

What do you think?  Does this sound realistic?  Are there additional
things we can do to counter that penalty?

-eric

Eric Snow