[python-committers] Re: Experimental isolated subinterpreters

7 May 2020

Hi Mark,
On Thu, 7 May 2020 at 15:53, Mark Shannon <mark@hotpy.org> wrote:
...
I say no. Why the sudden urgency?
My urgency was to be able to quickly see if per-interpreter GIL would
be doable for Python 3.9 or not. The answer is no: there is too much
work to be done to write a "correct" implementation. I'm talking about
fixing subinterpreter issues properly, not about the idea of a
special build.
...
I think that you are making too many assumptions about how
inter-interpreter communication is going to work, and about sharing of
objects.
I'm not sure what you mean here. My changes rely on the assumption
that objects are not shared between two interpreters.
I didn't work at all on the inter-interpreter communication.
...
On 06/05/2020 6:49 pm, Victor Stinner wrote:
...
It's a practical solution to be able to experiment quickly with a
per-interpreter GIL without having to fix all issues at once. For
example, it disables Unicode interned strings, which are unsafe with
multiple interpreters running in parallel.
If the interning is done on a per-interpreter basis, then it is safe.
Sure. My idea was to quickly list the areas of the code that should be
reworked to be "compatible" with subinterpreters. As I wrote, the long
term plan is to fix these issues. For example, I made small integer
singletons per interpreter. The idea would be the same for interned
strings. It's not hard to do, but I didn't want to make this change
right now, since we are so close to the 3.9 release.
I think that I will remove my changes from the future 3.9 branch and
only keep them in the master branch, to clarify that 3.9 is now out of
scope.
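
To illustrate the per-interpreter idea discussed above, here is a toy model in Python. It is only a sketch, not CPython's implementation: the ToyInterpreter class and its intern() method are hypothetical names invented for this example. The point it shows is that if each interpreter owns its own intern table, an interned object is never handed to two interpreters.

```python
class ToyInterpreter:
    """Hypothetical stand-in for an interpreter that owns its intern table."""

    def __init__(self, name):
        self.name = name
        self._interned = {}  # per-interpreter table, never shared

    def intern(self, s):
        # Return this interpreter's canonical object for the value of s,
        # registering s itself the first time the value is seen.
        return self._interned.setdefault(s, s)


if __name__ == "__main__":
    interp_a = ToyInterpreter("a")
    interp_b = ToyInterpreter("b")

    # Build two equal strings at runtime.
    s1 = "".join(["spam", "-eggs"])
    s2 = "".join(["spam", "-eggs"])

    # Within one interpreter, equal strings map to a single canonical object.
    print(interp_a.intern(s1) is interp_a.intern(s2))

    # The other interpreter has its own table, so it registers its own object
    # instead of receiving the one owned by interp_a.
    print(interp_b.intern(s2) is s2)
```

The same shape would apply to the small integer singletons: one table per interpreter instead of one process-wide table.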
...
...
I added this #ifdef to encourage other core developers to work on this
project, and let early adopters test this experimental feature to give
us their feedback.
You say this is to let other core developers work on this, but has
anyone actually asked for these changes?
I do want these changes, as does Eric Snow. And another core dev also asked
me how to contribute to this project.
Most of the past changes cleaned up Python internals, for example by
properly releasing resources at exit. That fixes old issues with
Py_Initialize()/Py_Finalize() being called multiple times when Python is
embedded. It's not only about subinterpreters. See for example:
https://bugs.python.org/issue1635741
...
...
Currently, the special build changes:

* Per-interpreter GIL
* Store the current Python thread state in a TLS
* Disable dict, frame, tuple and list free lists
* Disable type method cache
* Disable pymalloc: force usage of libc malloc
* Disable the GC in subinterpreters
* _xxsubinterpreters.run_string() releases the GIL

These changes are going to have such a large impact on robustness and
performance as to make any comparisons meaningless.
Oh sure, my benchmark on per-interpreter GIL was run on the same
Python binary where all these caches were disabled.
I'm aware that "regular" Python has all these caches. But I didn't
care about the absolute timing. I only wanted to check whether
subinterpreters actually "scale" with the number of CPUs. My PoC
benchmark says that yes, they do. A CPU-bound workload is faster in
subinterpreters than with threads. It also shows that subinterpreters
have basically the same speed as multiprocessing, which is an
interesting data point.
As I wrote in my email, they are only temporary changes using #ifdef,
but I plan to fix all these issues (make the code compatible with
subinterpreters).
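
For readers who want to try a comparison of this kind, the following is a minimal sketch, not the PoC benchmark from this thread. It only compares threads against processes on a regular Python build, using the standard concurrent.futures executors as a stand-in since it does not use subinterpreters at all. The function names cpu_bound() and timed_run(), the worker count and the loop size are all assumptions made for the example.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python loop that keeps one CPU busy (and holds the GIL)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_cls, workers, n):
    """Run `workers` copies of cpu_bound(n) concurrently.

    Returns (elapsed_seconds, list_of_results).
    """
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(cpu_bound, [n] * workers))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    workers, n = 4, 2_000_000  # arbitrary sizes chosen for illustration
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        seconds, _ = timed_run(cls, workers, n)
        print(f"{cls.__name__}: {seconds:.3f}s")
```

Only the relative timing between the two executors is meaningful here, in the same spirit as the remark above about not caring about absolute timing. With a very small n, process startup cost dominates and hides the effect.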
...
...
Most changes are easy to write, but some other changes are
non-trivial. For example, I modified _PyThreadState_GET() and
_PyThreadState_Swap() to use a Thread Local Storage (TLS) to get and
set the current Python thread state.
Does that mean you are going to remove all the PyThreadState *tstate
parameters that have been added lately?
I don't plan to make tstate implicit again soon. I like to see where
Python states are coming from in functions. But it's an open question.
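
As an analogy for the TLS-based access to the current thread state mentioned above, here is a small Python sketch built on threading.local. It is not the C implementation: get_current_state() and swap_current_state() are hypothetical names that only mimic the get/swap shape of _PyThreadState_GET() and _PyThreadState_Swap(), and the "state" is just an arbitrary Python object.

```python
import threading

# Each thread sees its own independent set of attributes on this object.
_tls = threading.local()


def get_current_state():
    """Return the state stored for the calling thread, or None if unset."""
    return getattr(_tls, "state", None)


def swap_current_state(new_state):
    """Install new_state for the calling thread and return the previous one."""
    old_state = getattr(_tls, "state", None)
    _tls.state = new_state
    return old_state


if __name__ == "__main__":
    swap_current_state("state of the main thread")

    def worker(label):
        # A new thread starts with no state, whatever other threads stored.
        before = get_current_state()
        swap_current_state(f"state of {label}")
        print(label, "started with", before, "and now has", get_current_state())

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The main thread's state is untouched by what the workers stored.
    print("main thread still has:", get_current_state())
```

No lock is needed around the get and swap calls because no thread can reach another thread's slot, which is the property that matters once there is no longer a single global "current thread state" variable.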
...
Rather than sprinkling #ifdefs everywhere, could you continue
consolidating all "global" objects into a single data structure?
Most of my work in 3.9 was to move things from _PyRuntimeState to
PyInterpreterState.
Moving global variables into these structures is the simple solution,
but I am trying to find a way to avoid declaring all structures in
PyInterpreterState.
#include "pycore_interp.h" pulls in condition variables, which include
<windows.h> on Windows. It also pulls in tons of other things,
since PyInterpreterState has become quite large.
Maybe there is a way to have "per-interpreter" variables, something
similar to thread local storage (TLS). But I'm not sure what the API
would look like.
Victor