Future-proof GIL release and regrab for error/warning support

Hi all,
I am a bit confused about future proof GIL release with respect to potential subinterpreter development. Or maybe just the best practices.
In short, is the PyGILState_Ensure()
function safe? I had the
impression that was definitely not, but now makes me wondered how even
PyErr_Occurred()
can be safe when
EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
. Of course PyErr_Occurred()
is called with the GIL, so that may make a big difference.
I scanned through this thread: https://bugs.python.org/issue15751 which
suggested the addition of API to make it (almost?) always safe? But
most of that thread is very old, much older than the newer
EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
work.
Reading https://bugs.python.org/issue4202 makes me think there is no
current solution/safety?
In my case (NumPy), I may be able to assume that the
PyGILState_Ensure()
calls I am talking about will always happen in
the same thread as the initial GIL release, which gives me some hope
that it might actually just work? [1]
My impression until now was that the only clean solution would be to pass the threadstate. But, there is some public API, where we would require new API to pass the threadstate (which means slow adoption at the very least). So if I can avoid it, that would be much easier.
The consumer (the person who needs to grab the GIL) can be different
from the user (the person releasing the GIL). So I am considering
passing PyThreadState **
with NULL indicating "unknown" state to the
consumer.
Much code may not even release the GIL, and in either case a move to a
new API has to take at least 1-2 years, so I can't really wait for that
to be done.
Cheers,
Sebastian
[1] To be honest, I am not quite sure on this. It is true for NumPy itself, but I am not sure it is true for potential consumers of NumPy API. Or maybe, it would just be something to document very clearly somewhere.

Hi all,
I am still confused about this. Maybe shorter, we have:
User who release GIL → NumPy API → third-party C function
We can probably assume that the user will use NumPy API
from random
threads, but only from threads already running Python (i.e. created by
Python), although there may be arguments that exceptions would be nice,
I do not think those are a priority.
In that case, is PyGILState_Ensure()
considered safe and appropriate
API to be used within "third-party C function" (and/or NumPy API)? I
used to generally think its not, but now it seems to me it should be,
if the thread we are running on was previously spawned by Python?
Or should it be a priority that e.g. the PyThreadState *tstate
is in
some form passed all the way down to the "third-part C function" so
that warnings or errors could be set more easily?
We could always try to design an API where the error setting is deferred until a time when the GIL is guaranteed to be held, but that seems inconvenient to me (it requires to pass out all error-information in some different way, only to convert it to an error later).
Cheers,
Sebastian
On Fri, 2020-08-07 at 12:04 -0500, Sebastian Berg wrote:

On Mon, 2020-08-10 at 18:57 +0200, Antoine Pitrou wrote:
Yes, sorry, to be clear: I am worried about subinterpreters. Not because I feel its pressing, or because I particularly like the topic... But if I create new public API now, I don't want to have to extend it in 2 years for something I can anticipate now. At least in cases where adding the API does not hurt.
In one part of NumPy it definitely will hurt a bit, but it would still help to anticipate that this may be necessary in a few years.
And quite frankly, it would just be nice to have a clear idea one way or another. Even if that is: Prefer avoiding it, but expect that it may never be necessary. Or maybe it is: It doesn't work right now for subinterpreters, but that can be fixed on the Python side for the relevant cases.
Cheers,
Sebastian

Le 10/08/2020 à 19:04, Sebastian Berg a écrit :
In this case, for future-proofing, you may want to take a
PyInterpreterState*
argument, because that is probably going to be how
CPython approaches things (i.e. create a new API like
PyGILState_Ensure(), but that knows about different interpreters).
I would also recommend you chime in on https://bugs.python.org/issue15751 and describe your dilemma there.
(cc Eric Snow, who works on improving subinterpreters, and whom I'm not sure is on this list)
Regards
Antoine.

On Mon, 2020-08-10 at 19:41 +0200, Antoine Pitrou wrote:
Thanks, the main thing right now is be about passing said argument, although taking may also happen (I need to take it at one place, but that requires a complete new function in any case).
So that sounds like I should simply pass an always NULL, reserved argument for now (for the sake of passing, this will probably part of a struct in this case). And see where things go in the future.
For taking, I guess I will just defer any decision about it, if its very clear where things are going. That may require downstream to jump through small hoops later on, but maybe that is just life...
- Sebastian

Hi,
I'm not sure if it's going to help you, but here are my notes about the "Python Thread State": https://pythondev.readthedocs.io/pystate.html
Victor
Le ven. 7 août 2020 à 19:05, Sebastian Berg <sebastian@sipsolutions.net> a écrit :
-- Night gathers, and now my watch begins. It shall not end until my death.

On Tue, 2020-08-11 at 16:05 +0200, Victor Stinner wrote:
Thanks, can't say I can make much of it after looking at it for a while. It currently seems to me that using PyGILState_Ensure is likely fine on the same thread which released the GIL (or can be made to work in Python). But it also seems to that passing in the threadstate is probably just cleaner/nicer when possible.
Maybe I need to give some pseudo code. The 3rd party function is a ufunc loop, slightly simplified (anyone can register it):
cdef int ufunc_loop(char **data, ssize_t n, ...): for i in range(n): # Do things with data if something_bad: with gil: PyErr_SetString(...) return -1 return 0
Now, I need to decide the signature for that function (it has to be updated), there is a bunch of potentially useful metadata, I was considering the pointer to a struct, at least for most of it:
typedef struct metadata { PyArray_Descr *dtypes, PyThreadState *tstate, void *user_static_data, void *user_data, /* ... */ }
which can include the tstate
(whether or not its a struct hardly
matters here).
Unfortunately, there is at least one place where I cannot guarantee I know the threadstate. Thus we may need to pass NULL
to indicate
an unknown state.
The GIL may or may not be released, but I guess I could pass the
threadstate even when the GIL is held. I was toying with the thought
of passing PyThreadState **
(mainly because it would make updating it
easier).
Antoine mentioned passing PyInterpreterState *
, so I am unsure if
there would be a reason to prefer that?
If there is no clear right thing, but passing something seems
preferred, I can also just pass: void *reserved = NULL
for now to
simplify eventual addition.
Cheers,
Sebastian

Le 12/08/2020 à 10:40, Sebastian Berg a écrit :
A threadstate is supposed to be associated to a particuler OS thread. For example, the threadstate is destroyed (and its pointer deallocated) when the OS thread exits.
So, no, passing the threadstate is not how you should do it.
That sounds reasonable.
Regards
Antoine.

Hi all,
I am still confused about this. Maybe shorter, we have:
User who release GIL → NumPy API → third-party C function
We can probably assume that the user will use NumPy API
from random
threads, but only from threads already running Python (i.e. created by
Python), although there may be arguments that exceptions would be nice,
I do not think those are a priority.
In that case, is PyGILState_Ensure()
considered safe and appropriate
API to be used within "third-party C function" (and/or NumPy API)? I
used to generally think its not, but now it seems to me it should be,
if the thread we are running on was previously spawned by Python?
Or should it be a priority that e.g. the PyThreadState *tstate
is in
some form passed all the way down to the "third-part C function" so
that warnings or errors could be set more easily?
We could always try to design an API where the error setting is deferred until a time when the GIL is guaranteed to be held, but that seems inconvenient to me (it requires to pass out all error-information in some different way, only to convert it to an error later).
Cheers,
Sebastian
On Fri, 2020-08-07 at 12:04 -0500, Sebastian Berg wrote:

On Mon, 2020-08-10 at 18:57 +0200, Antoine Pitrou wrote:
Yes, sorry, to be clear: I am worried about subinterpreters. Not because I feel its pressing, or because I particularly like the topic... But if I create new public API now, I don't want to have to extend it in 2 years for something I can anticipate now. At least in cases where adding the API does not hurt.
In one part of NumPy it definitely will hurt a bit, but it would still help to anticipate that this may be necessary in a few years.
And quite frankly, it would just be nice to have a clear idea one way or another. Even if that is: Prefer avoiding it, but expect that it may never be necessary. Or maybe it is: It doesn't work right now for subinterpreters, but that can be fixed on the Python side for the relevant cases.
Cheers,
Sebastian

Le 10/08/2020 à 19:04, Sebastian Berg a écrit :
In this case, for future-proofing, you may want to take a
PyInterpreterState*
argument, because that is probably going to be how
CPython approaches things (i.e. create a new API like
PyGILState_Ensure(), but that knows about different interpreters).
I would also recommend you chime in on https://bugs.python.org/issue15751 and describe your dilemma there.
(cc Eric Snow, who works on improving subinterpreters, and whom I'm not sure is on this list)
Regards
Antoine.

On Mon, 2020-08-10 at 19:41 +0200, Antoine Pitrou wrote:
Thanks, the main thing right now is be about passing said argument, although taking may also happen (I need to take it at one place, but that requires a complete new function in any case).
So that sounds like I should simply pass an always NULL, reserved argument for now (for the sake of passing, this will probably part of a struct in this case). And see where things go in the future.
For taking, I guess I will just defer any decision about it, if its very clear where things are going. That may require downstream to jump through small hoops later on, but maybe that is just life...
- Sebastian

Hi,
I'm not sure if it's going to help you, but here are my notes about the "Python Thread State": https://pythondev.readthedocs.io/pystate.html
Victor
Le ven. 7 août 2020 à 19:05, Sebastian Berg <sebastian@sipsolutions.net> a écrit :
-- Night gathers, and now my watch begins. It shall not end until my death.

On Tue, 2020-08-11 at 16:05 +0200, Victor Stinner wrote:
Thanks, can't say I can make much of it after looking at it for a while. It currently seems to me that using PyGILState_Ensure is likely fine on the same thread which released the GIL (or can be made to work in Python). But it also seems to that passing in the threadstate is probably just cleaner/nicer when possible.
Maybe I need to give some pseudo code. The 3rd party function is a ufunc loop, slightly simplified (anyone can register it):
cdef int ufunc_loop(char **data, ssize_t n, ...): for i in range(n): # Do things with data if something_bad: with gil: PyErr_SetString(...) return -1 return 0
Now, I need to decide the signature for that function (it has to be updated), there is a bunch of potentially useful metadata, I was considering the pointer to a struct, at least for most of it:
typedef struct metadata { PyArray_Descr *dtypes, PyThreadState *tstate, void *user_static_data, void *user_data, /* ... */ }
which can include the tstate
(whether or not its a struct hardly
matters here).
Unfortunately, there is at least one place where I cannot guarantee I know the threadstate. Thus we may need to pass NULL
to indicate
an unknown state.
The GIL may or may not be released, but I guess I could pass the
threadstate even when the GIL is held. I was toying with the thought
of passing PyThreadState **
(mainly because it would make updating it
easier).
Antoine mentioned passing PyInterpreterState *
, so I am unsure if
there would be a reason to prefer that?
If there is no clear right thing, but passing something seems
preferred, I can also just pass: void *reserved = NULL
for now to
simplify eventual addition.
Cheers,
Sebastian

Le 12/08/2020 à 10:40, Sebastian Berg a écrit :
A threadstate is supposed to be associated to a particuler OS thread. For example, the threadstate is destroyed (and its pointer deallocated) when the OS thread exits.
So, no, passing the threadstate is not how you should do it.
That sounds reasonable.
Regards
Antoine.
participants (3)
-
Antoine Pitrou
-
Sebastian Berg
-
Victor Stinner