
On Tue, Dec 14, 2021 at 10:23 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Most of the work toward interpreter isolation and a per-interpreter GIL involves moving static global variables to _PyRuntimeState or PyInterpreterState (or module state). Through the effort of quite a few people, we've made good progress. However, many globals still remain, with the majority being objects and most of those being static strings (e.g. _Py_Identifier), static types (incl. exceptions), and singletons.
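To make the kind of change concrete, here is a rough illustration (not actual CPython code; the module and names are made up) of moving a C-level static global into per-module state, in the spirit of PEP 3121 / PEP 573:

    /* Before: one copy shared by every interpreter in the process. */
    static PyObject *cached_greeting = NULL;

    /* After: the object lives in per-module state, so each interpreter
       (and each instance of the module) gets its own copy. */
    typedef struct {
        PyObject *cached_greeting;
    } examplemodule_state;      /* hypothetical module */

    static PyObject *
    example_get_greeting(PyObject *module, PyObject *Py_UNUSED(args))
    {
        examplemodule_state *state = PyModule_GetState(module);
        if (state->cached_greeting == NULL) {
            state->cached_greeting = PyUnicode_FromString("hello");
            if (state->cached_greeting == NULL) {
                return NULL;
            }
        }
        return Py_NewRef(state->cached_greeting);
    }

The same pattern applies to interpreter-wide state, except the field ends up on PyInterpreterState (or _PyRuntimeState for truly process-global state) instead of in a module struct.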
On top of that, a number of those objects are exposed in the public C-API and even in the limited API. :( Dealing with this specifically is probably the trickiest thing I've had to work through in this project.
There is one solution that would help both of the above in a nice way: "immortal" objects.
The idea of objects that never get deallocated isn't new and has been explored here several times. Not that long ago I tried it out by setting the refcount really high. That worked. Around the same time Eddie Elizondo at Facebook did something similar but modified Py_INCREF() and Py_DECREF() to keep the refcount from changing. Our solutions were similar but with different goals in mind. (Facebook wants to avoid copy-on-write in their pre-fork model.)
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
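For reference, here's a minimal sketch of what the modified-Py_INCREF() approach looks like (illustrative only; the sentinel value and names are made up, not what either of us actually used):

    /* Hypothetical sentinel: any refcount at or above this value marks
       the object as immortal. */
    #define _Py_IMMORTAL_REFCNT_SKETCH (1LL << 62)

    static inline int
    is_immortal(PyObject *op)
    {
        return op->ob_refcnt >= _Py_IMMORTAL_REFCNT_SKETCH;
    }

    static inline void
    incref_sketch(PyObject *op)
    {
        if (is_immortal(op)) {
            return;    /* the extra branch on every refcount operation */
        }
        op->ob_refcnt++;
    }

    static inline void
    decref_sketch(PyObject *op)
    {
        if (is_immortal(op)) {
            return;
        }
        if (--op->ob_refcnt == 0) {
            _Py_Dealloc(op);
        }
    }

The check is cheap in isolation, but it lands on the hottest operations in the interpreter, which is where the penalty comes from.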
Recently I've come back to the idea of immortal objects because it's much simpler than the alternate (working) solution I found. So how do we get around that performance penalty? Let's say it makes CPython 5% slower. We have some options:
* live with the full penalty
* make other changes to reduce the penalty to a more acceptable threshold than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
* abandon all hope
Mark Shannon has suggested some possible mitigations, and from a recent conversation with Dino Viehland it sounds like Eddie was able to get to performance-neutral with a few techniques. So here are some things we can do to reduce or eliminate that penalty:
* reduce refcount operations on high-activity objects (e.g. None, True, False)
* reduce refcount operations in general
* walk the heap at the end of runtime initialization and mark all objects as immortal
* mark all global objects as immortal (statics or in _PyRuntimeState; not needed for objects in PyInterpreterState; see the rough sketch below)
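As a rough sketch of the last two items (again illustrative, reusing the hypothetical helpers from the earlier sketch), marking the runtime-global singletons at the end of runtime initialization would boil down to something like:

    /* Hypothetical helper building on the earlier sketch. */
    static void
    set_immortal_sketch(PyObject *op)
    {
        op->ob_refcnt = _Py_IMMORTAL_REFCNT_SKETCH;
    }

    /* Called once, at the end of runtime initialization.  A full heap
       walk would additionally visit every object tracked by the GC at
       that point and mark it the same way. */
    static void
    immortalize_global_singletons(void)
    {
        set_immortal_sketch(Py_None);
        set_immortal_sketch(Py_True);
        set_immortal_sketch(Py_False);
        set_immortal_sketch(Py_Ellipsis);
        set_immortal_sketch(Py_NotImplemented);
        /* ... plus the static strings, static types, and exceptions
           mentioned above. */
    }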
What do you think? Does this sound realistic? Are there additional things we can do to counter that penalty?
There's also the concern of memory usage if these immortal objects are never collected. But *which* objects are immortal? You only listed None, True, and False. Otherwise, assume/remember I'm management and provide a list and/or link of what would get marked as immortal, so we can have an idea of the memory impact.