Oops, I responded to Issue 36710 but I should have posted my comment here. Discussion is better done on this list since we are talking about an entirely new C-API.
===========================================================================
I think there are two questions to answer. First, do we want to support multiple runtimes per process? Second, if we do, what is the best way to do that? Some people would argue that multiple runtimes are not needed or are too hard to do. Maybe they are correct, I'm not sure. We should try to get a consensus on that first question.
If we do decide to do it, then we need to answer the second question. Passing a "context" argument around seems the best solution. That is how the Java JNI does it. It sounds like that's how Javascript VMs do it too. We don't need to get creative. Look at what other VMs do and copy the best idea.
If we do decide to do it, evolving the codebase and all extension modules is going to be a massive task. I would imagine that we can have a backwards compatible API layer that uses TSS. The layer that passes context explicitly would still have to maintain the TSS. There could be a build option that turns that backwards compatibility on or off. If off, you would gain some performance advantage because TSS does not have to be kept up-to-date.
My feeling right now that even though this is a massive job, it is the correct thing to do. CPUs continue to gain cores. Improving CPython's ability to do multi-threading and multi-processing should be a priority for CPython core developers.