
On 16/03/2020 6:21 pm, Victor Stinner wrote:
There were quick discussions about using thread local storage (TLS) to get and set the current Python thread state ("tstate"), instead of reading/setting an atomic variable (_PyRuntime.gilstate.tstate_current).
In fact, TLS already exists as "PyGILState", and PyGILState_GetThisThreadState() can already return the current thread state. But this API currently doesn't work with subinterpreters:
Just to be clear, this is what I mean by a thread local variable: https://godbolt.org/z/dpSo-Q
* https://bugs.python.org/issue10915
* https://bugs.python.org/issue15751
It's unclear to me whether fixing this issue would require adding a lock, or whether it would make PyGILState_GetThisThreadState() or _PyThreadState_GET() slower.
It doesn't require a lock, and it is only two instructions (it's 5 instructions on Windows, but that's still cheap).
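To make the thread-local approach concrete, here is a minimal sketch, assuming a C11 compiler that supports `_Thread_local`. The `demo_tstate` type and the `demo_tstate_get()` / `demo_tstate_swap()` helpers are hypothetical stand-ins for illustration only; they are not CPython's API and say nothing about how CPython would actually implement this.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; only for illustration. */
typedef struct demo_tstate {
    int interp_id;  /* which interpreter this state belongs to */
} demo_tstate;

/* One slot per native thread. Each thread sees only its own copy,
 * so reading or writing it needs no lock and no atomic operation. */
static _Thread_local demo_tstate *demo_current_tstate = NULL;

/* Return the calling thread's current state (NULL if none is set). */
static demo_tstate *demo_tstate_get(void)
{
    return demo_current_tstate;
}

/* Install new_state for the calling thread and return the previous one. */
static demo_tstate *demo_tstate_swap(demo_tstate *new_state)
{
    demo_tstate *old = demo_current_tstate;
    demo_current_tstate = new_state;
    return old;
}
```

With this shape, a function that needs the current state calls `demo_tstate_get()` instead of taking a `tstate` parameter; the swap helper mirrors the save/restore pattern of PyThreadState_Swap() on a per-thread slot.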
Moreover, currently, it's possible to have two Python thread states for the same native thread. That's needed for the transition from the main interpreter to a new subinterpreter. Example:
---
mainstate = PyThreadState_Get();

PyThreadState_Swap(NULL);

substate = Py_NewInterpreter();
r = PyRun_SimpleString(code);
Py_EndInterpreter(substate);

PyThreadState_Swap(mainstate);
---
The visibility of a thread local variable is a strict superset of that of a function local variable. Anything you can do with a function local variable, you can do with a thread local variable.
where Py_NewInterpreter() creates a new Python thread state using PyThreadState_New() and sets it as the current Python thread state.
Maybe the subinterpreter API can evolve to never "attach" two different interpreters to the same thread: one Python thread state would only belong to a single native thread.
I would be interested to explore the option of using TLS everywhere, but first we need to solve all these tricky issues.
Again, so far, passing tstate explicitly is only done internally, so if we switch to another solution which doesn't require passing tstate explicitly, we can do that without breaking the public C API.
--
I hope that you now have a better overview of the current state of the Python implementation and how it has evolved over the last two years.
What worries me is the idea that passing the thread state around is the only way to isolate sub-interpreters. It isn't. Using thread-local variables is a better way. There may be even better approaches, but I'm not aware of them.
Victor
On Mon, 16 Mar 2020 at 16:04, Victor Stinner <vstinner@python.org> wrote:
Hi,
Changes on this scale merit a PEP and proper discussion, rather than being added piecemeal without proper review.
Last November, I asked explicitly on python-dev if we should "Pass the Python thread state to internal C functions": https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGV...
In short, the answer is yes.
There is no PEP, only scattered documents. I wrote a short article to elaborate on the context of this work: https://vstinner.github.io/cpython-pass-tstate.html
One motivation is to ease the implementation of subinterpreters (PEP 554). But PEP 554 describes the public API more than the implementation.
--
In the meantime, I modified "small integer singletons" to make them "per-interpreter". So tstate->interp is now used to get small integers in longobject.c.
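As a rough illustration of what "per-interpreter" small integers reached through tstate->interp could look like, here is a simplified sketch. All names (`sketch_interp`, `sketch_tstate`, `sketch_get_small_int()`) are hypothetical and the structures are stripped down to plain C; this is not the code in longobject.c. The cached range of -5 to 256 is an assumption made for the example.

```c
#include <stddef.h>

#define SKETCH_NSMALLNEGINTS 5    /* cache -5 .. -1 */
#define SKETCH_NSMALLPOSINTS 257  /* cache  0 .. 256 */
#define SKETCH_NSMALLINTS (SKETCH_NSMALLNEGINTS + SKETCH_NSMALLPOSINTS)

/* Hypothetical stand-in for an integer object. */
typedef struct sketch_int {
    long value;
} sketch_int;

/* Each interpreter owns its own cache instead of sharing one global array. */
typedef struct sketch_interp {
    sketch_int small_ints[SKETCH_NSMALLINTS];
} sketch_interp;

/* Hypothetical stand-in for a thread state: it points at its interpreter. */
typedef struct sketch_tstate {
    sketch_interp *interp;
} sketch_tstate;

/* Fill the cache of one interpreter. */
static void sketch_interp_init(sketch_interp *interp)
{
    for (int i = 0; i < SKETCH_NSMALLINTS; i++) {
        interp->small_ints[i].value = (long)i - SKETCH_NSMALLNEGINTS;
    }
}

/* Look up a cached integer through tstate->interp.
 * Returns NULL when the value is outside the cached range. */
static sketch_int *sketch_get_small_int(sketch_tstate *tstate, long value)
{
    if (value < -SKETCH_NSMALLNEGINTS || value >= SKETCH_NSMALLPOSINTS) {
        return NULL;
    }
    return &tstate->interp->small_ints[value + SKETCH_NSMALLNEGINTS];
}
```

Two thread states attached to different interpreters get distinct objects for the same value, which is the isolation property being discussed.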
I also opened a discussion on other singletons (None, True, False, ...): https://bugs.python.org/issue39511
The long-term goal is to be able to run multiple isolated interpreters in parallel.
On Mon, 16 Mar 2020 at 15:16, Mark Shannon <mark@hotpy.org> wrote:
There seems to be a proliferation of `PyThreadState *tstate` arguments being added to API and internal functions.
So far, tstate should only be passed to *internal* C functions. I don't think that the public C API has been modified to pass tstate.
These changes are listed under https://bugs.python.org/issue38644.
There was also https://bugs.python.org/issue36710
I think that these changes are misguided. The desired results can be achieved more reliably and more simply in other ways.
Would you mind elaborating?
The changes add bulk to the C-API and may hurt performance.
Did you notice that in benchmarks? I would be curious to see the overhead.
These changes are also causing a lot of churn and merge conflicts (for me at least).
Sorry about that :-/ A lot of Python internals need to be modified to implement subinterpreters.
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.