Re: Proliferation of tstate arguments.
Chris Angelico [mailto:rosuav@gmail.com]:
And by that logic, globals are even more capable. I don't understand your point. Isn't the purpose of the tstate parameters to avoid the problem of being unable to have multiple tstates within the same OS thread? I think I've missed something here.
The point is that because threads can't preempt themselves, this: /*1*/ { Py_something(other_tstate); } and this: /*2*/ { PyInterpreterState* old_tstate = tstate; tstate = other_state; Py_something(); tstate = old_tstate; } is effectively equivalent, provided tstate is thread-local. Both will work equally well from the hypothetical C callback that wants to use a different subinterpreter. That wouldn't be true if tstate was process-wide, because that would be a race condition, some other thread might change tstate. But if tstate is thread-local, there's no race condition. Obviously /*1*/ is cleaner code than /*2*/, but /*2*/ is perfectly sound all the same. regards, Anders
On Wed, 18 Mar 2020 13:35:16 +0000 Anders Munch <ajm@flonidan.dk> wrote:
Chris Angelico [mailto:rosuav@gmail.com]:
And by that logic, globals are even more capable. I don't understand your point. Isn't the purpose of the tstate parameters to avoid the problem of being unable to have multiple tstates within the same OS thread? I think I've missed something here.
The point is that because threads can't preempt themselves, this:
/*1*/ { Py_something(other_tstate); }
and this:
/*2*/ { PyInterpreterState* old_tstate = tstate; tstate = other_state; Py_something(); tstate = old_tstate; }
is effectively equivalent, provided tstate is thread-local. Both will work equally well from the hypothetical C callback that wants to use a different subinterpreter.
This example is mixing up the notion of interpreter state and thread state. Here is a more realistic example: struct UserData { PyInterpreterState* interp; }; // Callback given to a third-party C library void my_c_callback(void* opaque) { struct UserData* user_data = (struct UserData*) opaque; PyGILState_STATE gstate; gstate = PyInterpreter_GILStateEnsure(user_data->interp); // ... PyInterpreter_GILStateEnsure(user_data->interp, gstate); } Of course this implies the existence of a PyInterpreter_GILState API that currently doesn't exist. In this model, the "thread-local" thread state is a *per-interpreter* thread-local. A *process-wide* thread-local thread state cannot work if we want -- in some distant future -- to be able to run several subinterpreters concurrently. In any case, the amount of disagreement and/or misunderstanding in this discussion is a strong hint that it needs a PEP to hash things out and explain them clearly, IMHO. Regards Antoine.
Antoine Pitrou wrote:
In any case, the amount of disagreement and/or misunderstanding in this discussion is a strong hint that it needs a PEP to hash things out and explain them clearly, IMHO.
Agreed; a PEP (even if it's just informational) would go a long way in helping to clear up some misunderstandings. Perhaps something along the lines of "Supporting concurrent subinterpreters in the C-API" that provides both a high-level overview and low-level information on some of the different structures involved and any needed changes [1], such as to PyThreadState, PyInterpreterState, PyGILState_STATE, etc. Optimally, it would explain some changes made and needed changes to their public and private APIs, as well as to the struct members. I'd be happy to help with something like this (particularly in reviewing and providing feedback on any unclear parts), but my own current understanding is likely not strong enough to lead the efforts. In fact, I very likely am in the camp of having some significant misunderstandings about the low-level implementation details. My experience with subinterpreters [2] is mostly limited to attempting to help with some tricky bugs. I'd call it an "area of interest", but it's very far from an "area of expertise". ;-) In the meantime for anyone looking for a basic overview on thread state and interpreter state, there's some general information documented in https://docs.python.org/3.9/c-api/init.html#high-level-api. --- [1] - By "needed changes", I am specifically referring to any changes to the C-API or internals that were made or are still needed to support concurrent subinterpreters. [2] - I have much more experience in the existing areas of concurrency in the standard library, such as with asyncio and concurrent.futures. Although it's relevant, it's very different in terms to implementation details. Also, I'm substantially more interested in the Python parts of the stdlib rather than the C internals or extension modules, but I certainly have an interest in learning more. Regards, Kyle Stanley On Wed, Mar 18, 2020 at 11:53 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
Chris Angelico [mailto:rosuav@gmail.com]:
And by that logic, globals are even more capable. I don't understand your point. Isn't the purpose of the tstate parameters to avoid the problem of being unable to have multiple tstates within the same OS thread? I
On Wed, 18 Mar 2020 13:35:16 +0000 Anders Munch <ajm@flonidan.dk> wrote: think I've
missed something here.
The point is that because threads can't preempt themselves, this:
/*1*/ { Py_something(other_tstate); }
and this:
/*2*/ { PyInterpreterState* old_tstate = tstate; tstate = other_state; Py_something(); tstate = old_tstate; }
is effectively equivalent, provided tstate is thread-local. Both will work equally well from the hypothetical C callback that wants to use a different subinterpreter.
This example is mixing up the notion of interpreter state and thread state.
Here is a more realistic example:
struct UserData { PyInterpreterState* interp; };
// Callback given to a third-party C library void my_c_callback(void* opaque) { struct UserData* user_data = (struct UserData*) opaque; PyGILState_STATE gstate; gstate = PyInterpreter_GILStateEnsure(user_data->interp);
// ...
PyInterpreter_GILStateEnsure(user_data->interp, gstate); }
Of course this implies the existence of a PyInterpreter_GILState API that currently doesn't exist. In this model, the "thread-local" thread state is a *per-interpreter* thread-local. A *process-wide* thread-local thread state cannot work if we want -- in some distant future -- to be able to run several subinterpreters concurrently.
In any case, the amount of disagreement and/or misunderstanding in this discussion is a strong hint that it needs a PEP to hash things out and explain them clearly, IMHO.
Regards
Antoine.
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EKZJYO75... Code of Conduct: http://python.org/psf/codeofconduct/
Le jeu. 19 mars 2020 à 02:17, Kyle Stanley <aeros167@gmail.com> a écrit :
Agreed; a PEP (even if it's just informational) would go a long way in helping to clear up some misunderstandings.
I am still moving blindly in the darkness of CPython internals and so I don't feel comfortable to write a PEP which would mean that I know what I am doing :-) I fixed dozens of bugs in the Python finalization related to subinterpreters and daemon threads: https://bugs.python.org/issue33608#msg364670 I managed to make pending calls (Py_AddPendingCall) per-interpreter, but it was really painful. So far, I have no strong opinion about passing explicitly tstate vs "implicit" TLS (Thread Local Storage) variable. I mean, I don't feel that I have enough feedback to see which option is the best. All I can say is that passing explicitly tstate helps me to validate that the code is *correct*. It helps me to understand where tstate comes from and helps me to understand if tstate is valid or not. It wasn't an issue in Python 3.7 where most critical structures lives in _PyRuntime which is shared by all subinterpreters. It became a big deal in Python 3.9 where more and more critical structures are "per-interpreter" and becomes invalid during Python finalization. For example, in Python 3.8, PyThreadState structures of daemon threads are never freed. There are still too many functions which don't work properly with subinterpreters. There are still a few design issues that needs to be addressed. Most functions of the C API don't even specify in their documentation if they require the GIL to be held. One good example is Py_AddPendingCall(). The documentation says that it's safe to call it without holding the GIL. Except that right now, there is no reliable way to get the correct interpreter in this case (correct me if I'm wrong!). The function uses PyGILState_GetThisThreadState() which may return a Python thread state of the wrong interpreter :-( Again, the PyGILState API should be fixed to support subinterpreters. I'm not even sure that I discovered all borders of the problem. I only started my list of issues. I would prefer to continue to experiment passing tstate explicitly in internal C APIs until most blocker issues will be fixed. Once early work on running two subinterpreters in parallel will start working (one "GIL" per interpreter), I will be more open to reconsider using a TLS variable. We are not so far from being able to *experiment* one "GIL" per interpreter. I succeeded to move the first half of _PyRuntimeState.ceval into a new PyInterpreterState.ceval: pending_calls, eval_breaker and tracing_possible. Moving eval_breaker was tricky, since it's tidely coupled to the GIL which currently remains shared by all interpreters. By the way, I took the opportunity to optimized the bytecode evaluation loop (ceval), eval_breaker was not properly set in multithreaded applications: "Inefficient signal handling in multithreaded applications" https://bugs.python.org/issue40010 The main interpreter benefits of this optimization as well, not only subinterpreters ;-) Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Victor Stinner writes:
Le jeu. 19 mars 2020 à 02:17, Kyle Stanley <aeros167@gmail.com> a écrit :
Agreed; a PEP (even if it's just informational) would go a long way in helping to clear up some misunderstandings.
I am still moving blindly in the darkness of CPython internals and so I don't feel comfortable to write a PEP which would mean that I know what I am doing :-)
That's OK, you're not Dutch. :-) Or are you? :-) :-)
So far, I have no strong opinion about passing explicitly tstate vs "implicit" TLS (Thread Local Storage) variable. I mean, I don't feel that I have enough feedback to see which option is the best.
All I can say is that passing explicitly tstate helps me to validate that the code is *correct*. It helps me to understand where tstate comes from and helps me to understand if tstate is valid or not.
That makes a lot of sense, as a strategy for doing the work. It should be pretty straightforward to convert the tstate argument to a thread-local tstate. But ... this sounds to me like work that should be done on a branch. I'm sure you considered that, but I also expect others will feel the same way. Perhaps this is a good opportunity to document why it's not being done on a branch.
There are still too many functions which don't work properly with subinterpreters. There are still a few design issues that needs to be addressed.
Are subinterpreters in use in production? If so, and especially if these use cases are unpatched vs. master, that would be justification for pushing to master since you are clearly fixing problems with subinterpreter use. If not, I still tend to think this work should be on a branch, at least until the tstate argument vs. thread-local tstate question is resolved. I realize that these are all abstract arguments in favor of work on a branch. I think it would be useful if you could provide concrete reasons why work on master is a better way to proceed. I wouldn't be surprised if some of the examples of bugs and improvements you've already given would be involved, and I just don't understand the implications without explanation. Regards, Steve
On Sat, 21 Mar 2020 at 13:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
That makes a lot of sense, as a strategy for doing the work. It should be pretty straightforward to convert the tstate argument to a thread-local tstate.
Note that the thread locals are already there, as that's how the public API already works. The problem is what to do when that tstate *hasn't* been populated yet. PEP 311's EnsureGIL APIs try to help with that problem, but they assume there's only one interpreter, and implicitly create a new tstate if one doesn't already exist. The APIs that now accept an explicit tstate don't care, as you can acquire that tstate from anywhere, it doesn't need to already be active on the current thread. The main working PEP at the moment is PEP 554, but the overall project tying everything together is written up at https://github.com/ericsnowcurrently/multi-core-python
But ... this sounds to me like work that should be done on a branch.
The initial PEP 432 work to clean up interpreter initialisation did live on a branch for a couple of years. The problem was that we kept finding *bugs* and other behaviour problems that needed those architectural changes to properly resolve (like Victor's PEP 540 "UTF-8 mode"), as well an inability to properly *test* that we hadn't broken anything without exposing the revised architecture to the full complexity of CPython's real world use cases. Thus the original switch to the "in-tree private API" approach as described in https://www.python.org/dev/peps/pep-0432/#implementation-strategy which ultimately lead to the public initialisation API changes in PEP 587 (and the original 3.7.0 release *did* identify previously untested embedding use cases that broke in 3.7.0 and were fixed in 3.7.1). For PEP 554, the proposed public "interpreters" module is out of tree, but the low level changes to fix assorted bugs in interpreter finalization and the existing subinterpreter support is happening in-tree. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Le sam. 21 mars 2020 à 04:14, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> a écrit :
But ... this sounds to me like work that should be done on a branch.
I'm sure you considered that, but I also expect others will feel the same way. Perhaps this is a good opportunity to document why it's not being done on a branch.
For almost every small changes that I wanted to write, I had to push many small "refactoring" changes to "prepare" the code base for that. I mean that while I'm doing the changes that I want to do, I had to fix dozens of bugs which didn't impact anyone else before because subinterpreters were not properly isolated. For example, to be able to make small integer singletons per-interpreters, I had to fix the code to initialize builtins and sys modules to make them really isolated from the main interpreter. One of the practical issue that I had is that when a subinterpreter leaked references, previously it was ignored since we basically ignore references which are never released at exit. We only care of differences before/after running an unit test when using "python3 -m test -R 3:3 (...)". But when I *fixed* the implementation of subinterpreters, suddenly, very old bugs became visible. What I'm trying to say is that "fixing" the implementation of subinterpreters require a deep refactoing of Python internals which cannot be done in a few small changes. Maintaining a branch in a fork of the Git project is likely to become too hard to maintain (rebase the branch). Moreover, I like to push small commits which are easy to understand, easy to review and easy to revert if needed. If we build a giant branch, what would be the outcome? Merge 1 year of work as a single commit? Currently, all commits of a pull request are almost squashed into a single large commit.
There are still too many functions which don't work properly with subinterpreters. There are still a few design issues that needs to be addressed.
Are subinterpreters in use in production?
Yes. it's used for many years in production by Apache mod_wsgi project for example.
If so, and especially if these use cases are unpatched vs. master, that would be justification for pushing to master since you are clearly fixing problems with subinterpreter use.
Yes, I'm fixing real bugs ;-) Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Fri, Mar 20, 2020 at 11:27 AM Victor Stinner <vstinner@python.org> wrote:
I would prefer to continue to experiment passing tstate explicitly in internal C APIs until most blocker issues will be fixed. Once early work on running two subinterpreters in parallel will start working (one "GIL" per interpreter), I will be more open to reconsider using a TLS variable.
The PEP for parallel subinterpreters hasn't been accepted yet either, right?
"Inefficient signal handling in multithreaded applications" https://bugs.python.org/issue40010
CPython's current signal handling architecture basically assumes that signals are always delivered to the main thread. (Fortunately, on real systems, this is almost always true.) In particular, it assumes that if a syscall arrives while the main thread is blocked in a long-running syscall, then the syscall will be interrupted, which is only true when the signal is delivered to the main thread. AFAICT if we really care about off-main-thread signals, then the only way to handle them properly is for the signal handler to detect when they happen, and redeliver the signal to the main thread using pthread_kill, and then let the main thread set its own eval_breaker etc. -n -- Nathaniel J. Smith -- https://vorpus.org
On Fri, 20 Mar 2020 19:24:22 +0100 Victor Stinner <vstinner@python.org> wrote:
One good example is Py_AddPendingCall(). The documentation says that it's safe to call it without holding the GIL. Except that right now, there is no reliable way to get the correct interpreter in this case (correct me if I'm wrong!).
Define what "the correct interpreter" is? The only way to solve this conundrum IMHO is to add a `Py_AddPendingCallEx(PyInterpreterState*)`.
The function uses PyGILState_GetThisThreadState() which may return a Python thread state of the wrong interpreter :-( Again, the PyGILState API should be fixed to support subinterpreters.
Similarly, the only possible fix is to add a per-interpreter GILState API (with a per-interpreter TLS variable under the hood). The caller knows which interpreter context it wants to run Python code in, so just let it pass that information to the GILState API. The most annoying part is what to do with the legacy GILState API. Regards Antoine.
participants (7)
-
Anders Munch
-
Antoine Pitrou
-
Kyle Stanley
-
Nathaniel Smith
-
Nick Coghlan
-
Stephen J. Turnbull
-
Victor Stinner