On Wed, May 6, 2020 at 1:14 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
06.05.20 00:46, Victor Stinner wrote:
> Subinterpreters and multiprocessing have basically the same speed on
> this benchmark.

It does not look like subinterpreters have any advantages
over multiprocessing.

There is not an implementation worthy of comparison at this point, no.  I don't believe meaningful conclusions of that comparative nature can be drawn from the current work.  We shouldn't block any decision about reducing our existing tech debt around subinterpreters on the existence of a viable multi-core solution.  There are benchmarks I could propose that I predict would show a different result even today, but I'm refraining because I believe such things to be a distraction.

I am wondering how much 3.9 will be slower than 3.8 in single-thread
single-interpreter mode after getting rid of all process-wide singletons
and caches (Py_None, Py_True, Py_NotImplemented, small integers,
strings, tuples, _Py_IDENTIFIER, _PyArg_Parser, etc). Not mentioning
breaking binary compatibility.

I'm not worried, because it won't happen in 3.9.  :)  Nobody is seriously proposing that that be done in that manner.

The existing example work Victor did here (thanks!) was a rapid prototype where the easiest approach to getting _something_ running in parallel as a demo was just to disable a bunch of shared global things instead of also doing the much larger work of making those per-interpreter.

That isn't how we'd likely ever actually land this kind of change.

Longer term we need to aim to get rid of process global state by moving that into per-interpreter state.  No matter what.  This isn't something only needed by subinterpreters.  Corralling everything into a per-interpreter state with proper initialization and finalization everywhere allows other nice things like multiple independent interpreters in a process.  Even sequentially (spin up, tear down, spin up, tear down, repeat...).  We cannot reliably do that today without side effects such as duplicate initializations and resulting resource leaks or worse.  Even if such per-interpreter state instead of per-process state isolation is never used for parallel execution, I still want to see it happen.
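
To make the spin up / tear down point concrete, here is a toy sketch in plain Python.  It is only an analogy: GlobalRuntime and InterpreterState are hypothetical names invented for this illustration, not CPython APIs, and the "handles" list merely stands in for whatever resources an initializer acquires.

```python
# Toy model only: these classes are hypothetical and do not exist in CPython.

class GlobalRuntime:
    """State lives at process level (a class attribute), outside any instance."""
    handles = []  # shared by every "interpreter" created in this process

    def initialize(self):
        # Each init cycle acquires a resource again (duplicate initialization).
        GlobalRuntime.handles.append(object())

    def finalize(self):
        # No instance owns the shared list, so nothing is released here.
        pass


class InterpreterState:
    """State is owned by one instance and is torn down together with it."""

    def __init__(self):
        self.handles = [object()]

    def finalize(self):
        self.handles.clear()


def leaked_with_global_state(cycles):
    """Run init/finalize repeatedly; return how many handles are still held."""
    for _ in range(cycles):
        runtime = GlobalRuntime()
        runtime.initialize()
        runtime.finalize()
    return len(GlobalRuntime.handles)


def leaked_with_per_interpreter_state(cycles):
    """Same loop, but each cycle's state is released by its own finalize()."""
    remaining = 0
    for _ in range(cycles):
        interp = InterpreterState()
        interp.finalize()
        remaining += len(interp.handles)
    return remaining
```

In the first loop every cycle leaves one more handle behind in process-wide state; in the second, nothing outlives finalize().  The real work is of course in C and far larger, but that is the shape of the problem.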

Python already loses out to Lua because of this.  Lua is easily embedded in a self-contained fashion.  CPython never has been.  This kind of work helps open up that world instead of relegating us to the single, long-lived, life-of-the-process language VM uses that are all we can serve today.

-gps