There isn't really any contention for these memory locations in CPython as it stands, because only one interpreter thread can run at a time. The only time a cache handoff is needed is when a thread switch schedules the new thread on a different core, which is pretty rare (at CPU timescales). Adding checks to every incref/decref would probably cost more time than it would save.

Something that might help performance a bit, and wouldn't hurt it, would be to drop explicit calls of Py_{INC,DEC}REF(Py_{None,False,True,...}), such as the ones in Py_RETURN_{NONE,FALSE,TRUE}, making these objects' refcounts into free-floating, meaningless values. The refcounts would initially be set to a value far from zero, and on the rare occasions that they hit zero, the dealloc functions would just set them back to the initial value. Whether this would save enough time to be worth it, I don't know.

(To avoid the undefined behavior of signed wraparound, you'd have to either change the refcount type from Py_ssize_t to size_t, or else keep the DECREF calls and set the initial value to something like PY_SSIZE_T_MAX/2.)
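The reset-on-zero idea can be sketched outside CPython roughly as follows. This is only a toy model, not CPython code: `toy_object`, `toy_none`, `toy_incref`, `toy_decref`, `toy_return_none`, and `TOY_IMMORTAL_INIT` are all hypothetical names, and `ptrdiff_t` stands in for `Py_ssize_t` so that the sketch compiles without `Python.h`. It models the second variant from the parenthetical: the DECREF path is kept, and the count starts at half the signed maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Py_ssize_t and PY_SSIZE_T_MAX. */
typedef ptrdiff_t toy_ssize_t;
#define TOY_SSIZE_T_MAX PTRDIFF_MAX

/* Initial refcount: far from zero and far from the signed maximum. */
#define TOY_IMMORTAL_INIT (TOY_SSIZE_T_MAX / 2)

typedef struct toy_object {
    toy_ssize_t refcnt;
    void (*dealloc)(struct toy_object *);
    unsigned long resets;            /* demo bookkeeping only */
} toy_object;

/* "Dealloc" for a singleton: frees nothing, just restores the count. */
static inline void toy_singleton_dealloc(toy_object *op)
{
    op->refcnt = TOY_IMMORTAL_INIT;
    op->resets++;
}

/* Toy analogue of a singleton such as None. */
static toy_object toy_none = { TOY_IMMORTAL_INIT, toy_singleton_dealloc, 0 };

/* Generic incref: unchanged, no special case for singletons. */
static inline void toy_incref(toy_object *op)
{
    op->refcnt++;
}

/* Generic decref: unchanged, calls dealloc when the count reaches zero. */
static inline void toy_decref(toy_object *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0)
        op->dealloc(op);
}

/* Analogue of Py_RETURN_NONE with the explicit INCREF dropped. */
static inline toy_object *toy_return_none(void)
{
    return &toy_none;
}

/* A caller that still treats the result as an owned reference. */
static inline void toy_call_and_discard(void)
{
    toy_object *result = toy_return_none();
    toy_decref(result);              /* count drifts downward by one */
}
```

In this model each `toy_call_and_discard()` lowers the count by one, because the matching increment was skipped. To watch a reset without billions of iterations, one could set `toy_none.refcnt` to a small value first: after that many calls the count returns to `TOY_IMMORTAL_INIT` and `resets` goes up by one. The generic incref/decref paths carry no extra checks, which is the point of the suggestion.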