
Most of the work toward interpreter isolation and a per-interpreter GIL involves moving static global variables to _PyRuntimeState or PyInterpreterState (or module state). Through the effort of quite a few people, we've made good progress. However, many globals still remain. Most of them are objects, and most of those are static strings (e.g. _Py_Identifier), static types (including exceptions), and singletons. On top of that, a number of those objects are exposed in the public C-API and even in the limited API. :( Dealing with this specifically is probably the trickiest thing I've had to work through in this project.

There is one solution that would help with both of the above (the remaining global objects and their exposure in the C-API) in a nice way: "immortal" objects.

The idea of objects that never get deallocated isn't new and has been explored here several times. Not that long ago I tried it out by setting the refcount really high. That worked. Around the same time, Eddie Elizondo at Facebook did something similar but modified Py_INCREF() and Py_DECREF() to keep the refcount from changing. Our solutions were similar but had different goals in mind. (Facebook wants to avoid copy-on-write in their pre-fork model.)

A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world, since every interpreter would still be writing to the refcounts of the shared objects. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.

Recently I've come back to the idea of immortal objects because it's much simpler than the alternative (working) solution I found. So how do we get around that performance penalty? Let's say it makes CPython 5% slower. We have some options:

* live with the full penalty
* make other changes to reduce the penalty to something more acceptable than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
* abandon all hope

Mark Shannon suggested some things we can do.
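A toy sketch may make the trade-off between the two approaches more concrete. It uses a hypothetical `toy_object` struct rather than CPython's real `PyObject`, and a hypothetical `TOY_IMMORTAL_REFCNT` sentinel; none of these names are CPython APIs. The "plain" functions correspond to setting the refcount really high (immortals are still written to on every call), while the "branch" functions correspond to modifying Py_INCREF()/Py_DECREF() so an immortal object's refcount never changes, at the cost of one extra comparison per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy object header; NOT CPython's real PyObject layout. */
typedef struct toy_object {
    intptr_t refcnt;
    void (*dealloc)(struct toy_object *);
} toy_object;

/* Hypothetical sentinel: any refcount >= this value means "immortal". */
#define TOY_IMMORTAL_REFCNT (INTPTR_MAX / 2)

/* Approach 1: plain refcounting. An immortal object simply starts with a
   huge refcount, so it never reaches zero in practice, but every
   incref/decref still writes to the object's memory. */
static inline void toy_incref_plain(toy_object *op) { op->refcnt++; }

static inline void toy_decref_plain(toy_object *op) {
    if (--op->refcnt == 0) {
        op->dealloc(op);
    }
}

/* Approach 2: an extra branch skips the write for immortal objects. */
static inline bool toy_is_immortal(const toy_object *op) {
    return op->refcnt >= TOY_IMMORTAL_REFCNT;
}

static inline void toy_incref_branch(toy_object *op) {
    if (toy_is_immortal(op)) {
        return; /* no write, so the object's memory is left untouched */
    }
    op->refcnt++;
}

static inline void toy_decref_branch(toy_object *op) {
    if (toy_is_immortal(op)) {
        return;
    }
    if (--op->refcnt == 0) {
        op->dealloc(op);
    }
}

/* Mark an existing object immortal, e.g. during a (hypothetical) heap walk
   at the end of runtime initialization. */
static inline void toy_make_immortal(toy_object *op) {
    op->refcnt = TOY_IMMORTAL_REFCNT;
}

/* Immortal objects should never be deallocated. */
static void toy_never_dealloc(toy_object *op) {
    (void)op;
    assert(0 && "immortal object deallocated");
}

/* A statically allocated singleton (think None) can be immortal from the start. */
static toy_object toy_none = { TOY_IMMORTAL_REFCNT, toy_never_dealloc };

/* Deallocator for ordinary (mortal) test objects: just counts calls. */
static int toy_dealloc_count = 0;
static void toy_counting_dealloc(toy_object *op) {
    (void)op;
    toy_dealloc_count++;
}
```

The plain variant keeps incref/decref branch-free but still dirties the object on every call, which is what causes the copy-on-write and cache costs described above. The branched variant avoids those writes but pays for the check on every call. `toy_make_immortal()` and the statically initialized `toy_none` are where ideas like marking all globals immortal would plug in.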
Also, from a recent conversation with Dino Viehland, it sounds like Eddie was able to reach performance neutrality with a few techniques. So here are some things we can do to reduce or eliminate that penalty:

* reduce refcount operations on high-activity objects (e.g. None, True, False)
* reduce refcount operations in general
* walk the heap at the end of runtime initialization and mark all objects as immortal
* mark all global objects as immortal (statics or those in _PyRuntimeState; not needed for PyInterpreterState)

What do you think? Does this sound realistic? Are there additional things we can do to counter that penalty?

-eric