Support for Multiple Interpreters (Subinterpreters) in numpy

Hi all,
CPython has supported multiple interpreters (in the same process) for a long time, but only through the C-API. I'm working on exposing that functionality to Python code (see PEP 554), aiming for 3.12. I expect that users will find the feature useful (particularly with a per-interpreter GIL--see PEP 684) and that it will be used a lot more over the coming years. This has the potential to impact extension module projects, especially large ones like numpy, which is why I'm reaching out to you.
Use of multiple interpreters depends on isolation between them. When an extension module is imported in multiple interpreters, it is loaded separately into a new module object in each. Extensions often store module data/state in C globals, which means the multiple instances end up sharing data. This causes problems, more so once we have one GIL per interpreter.
Over the years we have added machinery to help extensions get the necessary isolation, moving away from global variables. This includes PEPs 384, 3121, and 489. This has culminated in the guide you can find in PEP 630.
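To make the sharing problem concrete, here is a minimal sketch in plain C. It deliberately uses no CPython API: `module_instance`, `module_state`, `module_new`, and the two `bump_*` functions are hypothetical names standing in for two copies of an extension module, one per interpreter. It only models the idea that a C global is shared by every instance while a per-module state struct is not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model, not CPython API: one C global shared by every
 * module instance, as in a non-isolated extension. */
static int global_counter = 0;

/* Per-module state, in the spirit of the isolation guide in PEP 630. */
typedef struct {
    int counter;
} module_state;

/* Stand-in for a module object created by one import in one interpreter. */
typedef struct {
    module_state *state;
} module_instance;

/* Create a module instance with its own zero-initialized state. */
static module_instance *module_new(void) {
    module_instance *m = malloc(sizeof *m);
    if (m == NULL) {
        return NULL;
    }
    m->state = calloc(1, sizeof *m->state);
    if (m->state == NULL) {
        free(m);
        return NULL;
    }
    return m;
}

static void module_free(module_instance *m) {
    if (m != NULL) {
        free(m->state);
        free(m);
    }
}

/* Non-isolated: every instance mutates the same global. */
static int bump_global(module_instance *m) {
    (void)m;
    return ++global_counter;
}

/* Isolated: each instance mutates only its own state. */
static int bump_isolated(module_instance *m) {
    return ++m->state->counter;
}
```

With two instances, `bump_global` observes the other instance's increments, while `bump_isolated` does not.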
Note that nothing should change when only a single interpreter is in use (basically the status quo). With PEP 684, importing an incompatible extension outside the main (initial) interpreter will now be an ImportError. (Currently the behavior is undefined and too often results in hard-to-debug failures and crashes.)
Thus extension module maintainers do have the option to *not* support multiple interpreters. Unfortunately, that doesn't mean their users won't pester them about adding support. We all recognize how that dynamic can be draining on a project. The potential burden on maintainers is a serious factor for these upcoming changes. numpy is likely to be affected more than any other project. That's why I'm starting this thread.
PEP 684 discusses all of the above. What I'm after with this thread is:
* to make sure the numpy maintainers are clear on what interpreter isolation requires of the project
* a clear picture of what changes numpy would need (and how much work that would be)
* feedback on what the CPython team can do to minimize that work (incl. adding new C APIs)
I'm fine with having the discussion here, but I will probably create a new category on discuss.python.org for a variety of similar threads related to multiple interpreters and supporting them. Having our discussion there may lead to more participation from more CPython core devs than just me. Do you have any preference for or against any particular venue?
Thanks!
-eric

On 22/8/22 18:59, Eric Snow wrote:
<snip>
> -eric
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-leave@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: matti.picus@gmail.com
Thanks for starting the conversation. I would personally prefer the discussion about NumPy be here, general discussions could be elsewhere.
Please correct me if I am wrong: I understand that multiple interpreters would require us to (at least):
- refactor all the static module global state in NumPy and make it re-entrant or immortal, including converting statically allocated PyTypeObjects to heap types.
- find a mechanism to access the per-interpreter module state
- carefully consider places in the code that we steal references either intentionally or because that is the CPython C-API we are using
- measure the performance implications of the necessary changes
- plan forward/backward compatibility
This seems like a significant undertaking, and is why we have rejected casual calls for supporting multiple interpreters in the past [2], [3], [4]. Supporting multiple interpreters is currently not on the NumPy roadmap [0]. Priorities can be changed through dialog with the NumPy community, and others can propose changes to NumPy via NEPs, PRs, and issues, but we are unlikely to engage directly in the work if it is not an agreed-upon goal. There are other initiatives around NumPy that may dovetail with multiple interpreters. For instance, the HPy group hit many of the issues above when creating a port of NumPy [5]. It would be good to get like-minded people talking about this and to pool resources; maybe someone on this list has a strong opinion and would be willing to put in some work on the subject.
One thing CPython could do is to provide clear documentation on how to port a small C extension module [1].
Matti
[0] https://numpy.org/neps/roadmap.html
[1] https://github.com/python/cpython/issues/79601
[2] https://github.com/numpy/numpy/issues/665
[3] https://github.com/numpy/numpy/issues/14384
[4] https://github.com/numpy/numpy/issues/16963
[5] https://github.com/hpyproject/numpy-hpy/tree/graal-team/hpy#readme

On 23/8/22 03:16, Matti Picus wrote:
> ...
> One thing CPython could do is to provide clear documentation on how to port a small C extension module [1].
I should have searched the documentation: there is now quite an extensive guide [2], including all the different interfaces provided for getting per-interpreter module state.
Matti
[2] https://docs.python.org/3.11/howto/isolating-extensions.html
[3] https://docs.python.org/3.11/c-api/type.html#c.PyType_GetModuleState

On 23. 08. 22 10:02, Matti Picus wrote:
> I should have searched the documentation: there is now quite an extensive guide [2], including all the different interfaces provided for getting per-interpreter module state.
Nothing to apologize about, it is only in the docs for the unreleased 3.11 :) I'd be happy to answer questions and clarify things. Please let me know if the written text lets you down.

On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote:
> Thanks for starting the conversation. I would personally prefer the discussion about NumPy be here, general discussions could be elsewhere.
> Please correct me if I am wrong: I understand that multiple interpreters would require us to (at least):
These days, I was somewhat hoping that the HPy effort might give us subinterpreters without having two separate efforts going on at the same time, since many of the refactors are probably identical between the two and it seemed some significant effort might go into that.
But of course starting with subinterpreter support without HPy probably also helps the HPy effort.
> - refactor all the static module global state in NumPy and make it re-entrant or immortal, including converting statically allocated PyTypeObjects to heap types.
What is the status of immortality? None of these seem forbidding at first sight, as long as we can get the state everywhere. Having immortal objects seems convenient, but probably not particularly necessary.
Most of our state is currently in static variables in functions (usually filled in dynamically at first call). That is very convenient since it doesn't require a global list anywhere.
I suppose moving it to module-state may well require a global list (or is there a nice other pattern?). But while tedious, it doesn't seem problematic.
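The pattern described here, a `static` variable inside a function that is filled at first call, and one way it could move into a module state struct, can be sketched in plain C. This is only a model under stated assumptions: `expensive_lookup`, `np_state`, and both `get_cached_*` functions are hypothetical names, and no CPython or NumPy API is used.

```c
#include <assert.h>

/* Hypothetical stand-in for something costly, e.g. importing a helper. */
static int lookup_calls = 0;

static int expensive_lookup(void) {
    lookup_calls++;
    return 42;
}

/* Current style: a function-local static, filled at first call.
 * Convenient, but shared by every module instance in the process. */
static int get_cached_static(void) {
    static int cached = 0;
    static int filled = 0;
    if (!filled) {
        cached = expensive_lookup();
        filled = 1;
    }
    return cached;
}

/* Isolated style: the cache becomes a field of a per-module state struct.
 * The struct definition then plays the role of the "global list". */
typedef struct {
    int cached;
    int filled;
} np_state;

static int get_cached_state(np_state *st) {
    if (!st->filled) {
        st->cached = expensive_lookup();
        st->filled = 1;
    }
    return st->cached;
}
```

The cost is that every caller needs a pointer to the state, which is the access problem raised in the next point.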
Switching to heap types should not be a big deal I suspect.
> - find a mechanism to access the per-interpreter module state
One thing that I am not clear about is, e.g., creation functions. They are public C-API, so they have no way of getting a "self" or type/module passed in. How will such a function get the module state?
Now, we could likely replace those functions in the long run (or even just remove many). But it seems to me that we may need a `PyType_GetModuleByDef()` that is passed _only_ the `module_def`?
> - carefully consider places in the code that we steal references either intentionally or because that is the CPython C-API we are using
This is an issue for HPy that needs to be cleared up, although I am wondering how important it is for subinterpreters as such?
> - measure the performance implications of the necessary changes
> - plan forward/backward compatibility
One other thing I am not quite sure about right now is GIL grabbing. Will `PyGILState_Ensure()` continue to work reliably? This used to be one of my main worries. It is also something we can fix up (pass through additional information), but where a fallback seems needed.
Cheers,
Sebastian

On 23. 08. 22 11:46, Sebastian Berg wrote:
> These days, I was somewhat hoping that the HPy effort might give us subinterpreters without having two separate efforts going on at the same time, since many of the refactors are probably identical between the two and it seemed some significant effort might go into that.
> But of course starting with subinterpreter support without HPy probably also helps the HPy effort.
Both should help each other.
>> - refactor all the static module global state in NumPy and make it re-entrant or immortal, including converting statically allocated PyTypeObjects to heap types.
> What is the status of immortality? None of these seem forbidding at first sight, as long as we can get the state everywhere. Having immortal objects seems convenient, but probably not particularly necessary.
> Most of our state is currently in static variables in functions (usually filled in dynamically at first call). That is very convenient since it doesn't require a global list anywhere.
> I suppose moving it to module-state may well require a global list (or is there a nice other pattern?). But while tedious, it doesn't seem problematic.
A struct for the module state is the state of the art, yes.
> Switching to heap types should not be a big deal I suspect.
>> - find a mechanism to access the per-interpreter module state
> One thing that I am not clear about is, e.g., creation functions. They are public C-API, so they have no way of getting a "self" or type/module passed in. How will such a function get the module state?
> Now, we could likely replace those functions in the long run (or even just remove many). But it seems to me that we may need a `PyType_GetModuleByDef()` that is passed _only_ the `module_def`?
Then you're looking at per-interpreter state, or thread-locals. That's problematic, e.g. you now need to handle clean-up at interpreter shutdown, and that isn't well supported. (Or leak -- AFAIK that's what NumPy currently does when Python's single interpreter is finalized?) I do urge you to assume that there can be multiple isolated NumPy modules created from a single def, even in a single interpreter. It's an additional constraint, but since it's conceptually simple I do think it pays for itself in regularity/maintainability/reviewability.
And if the CPython API is lacking, it would be best to solve that in CPython.
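The thread-local fallback mentioned above can be modeled roughly in plain C. This is a sketch under stated assumptions, not NumPy or CPython code: `np_state`, `set_current_state`, `itemsize_with_state`, and `itemsize_legacy` are hypothetical names, and C11 `_Thread_local` storage stands in for "per-interpreter state". It leaves out clean-up at shutdown, which is exactly the part described as not well supported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-module state; not NumPy's actual layout. */
typedef struct {
    int default_itemsize;
} np_state;

/* Fallback lookup for legacy entry points that receive no module or type. */
static _Thread_local np_state *current_state = NULL;

static void set_current_state(np_state *st) {
    current_state = st;
}

/* Preferred shape: the caller passes the state explicitly. */
static int itemsize_with_state(const np_state *st) {
    return st->default_itemsize;
}

/* Legacy shape: no argument carries the state, so it must be looked up.
 * Returns -1 if no state has been registered for this thread. */
static int itemsize_legacy(void) {
    if (current_state == NULL) {
        return -1;
    }
    return itemsize_with_state(current_state);
}
```

The explicit-state variant works with any number of module instances; the legacy variant only ever sees whichever state was registered last on the calling thread.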
>> - carefully consider places in the code that we steal references either intentionally or because that is the CPython C-API we are using
> This is an issue for HPy that needs to be cleared up, although I am wondering how important it is for subinterpreters as such?
Not important. Borrowed references work mainly to enable optimized collections that don't store full PyObjects -- currently that's HPy territory. If you find the C API forcing you to steal references, I do want to eventually fix that in CPython to make switching to HPy easy (and eventually to enable the optimizations in CPython). A lot of “better” alternative APIs were actually added in recent versions, and I'd welcome requests for what to prioritize for Python 3.12+.
>> - measure the performance implications of the necessary changes
>> - plan forward/backward compatibility
> One other thing I am not quite sure about right now is GIL grabbing. Will `PyGILState_Ensure()` continue to work reliably? This used to be one of my main worries. It is also something we can fix up (pass through additional information), but where a fallback seems needed.
Per-interpreter GIL is an *additional* step. I believe it will need its own opt-in mechanism. But subinterpreter support is a prerequisite for it. So yes, PyGILState_Ensure will still acquire a global lock for you.

On Tue, 2022-08-23 at 14:00 +0200, Petr Viktorin wrote:
> Then you're looking at per-interpreter state, or thread-locals. That's problematic, e.g. you now need to handle clean-up at interpreter shutdown, and that isn't well supported. (Or leak -- AFAIK that's what NumPy currently does when Python's single interpreter is finalized?) I do urge you to assume that there can be multiple isolated NumPy modules created from a single def, even in a single interpreter. It's an additional constraint, but since it's conceptually simple I do think it pays for itself in regularity/maintainability/reviewability.
> And if the CPython API is lacking, it would be best to solve that in CPython.
The issue is that we have public C-API that will be lacking the necessary information. Maybe pretty deep API (I am not certain).
Now that I think about it, even things like the type are unclear to me. `&PyArray_Type` would not be per interpreter (unless we figure out immortality). But it exists as public API just like `Py_None`, etc.?
Our public C-API is currently exported as a single static struct into the library loading NumPy. If types depend on the interpreter, it would seem we need to redo the whole mechanism? Further, many of the functions would need to be adapted. We might be able to hack it so that the API looks the same [1]. However, it cannot be ABI compatible, so we would need a whole new API table/export mechanism and some sort of shim to allow compiling against older NumPy versions but using it with all versions (otherwise we need 2+ years of patience).
Of course there might be a point in saying that most C-API use is initially not subinterpreter ready, but it does seem like a pretty huge limitation...
Cheers,
Sebastian
[1] I.e. smuggle in module state without the library importing the NumPy C-API having to change its code.

On 23. 08. 22 16:19, Sebastian Berg wrote:
[snip]
> The issue is that we have public C-API that will be lacking the necessary information. Maybe pretty deep API (I am not certain).
Let's find out!
> Now that I think about it, even things like the type are unclear to me. `&PyArray_Type` would not be per interpreter (unless we figure out immortality). But it exists as public API just like `Py_None`, etc.?
Exposing PyArray_Type that way means that it must be a static type. Those are immortal. (That said, static types are not compatible with Stable ABI -- which is related but not strictly necessary for subinterpreter support -- so if there's a chance to make it `my_numpy_api->PyArray_Type`, it would be better.)
> Our public C-API is currently exported as a single static struct into the library loading NumPy. If types depend on the interpreter, it would seem we need to redo the whole mechanism?
Right, sounds like it needs to be a dynamically allocated struct. In the interim, one instance of the struct is static: that's the one used for anything that doesn't support multiple interpreters yet, and also as the module state in one “main” module object. (That would be the first module to be loaded, and until everything switches over, it'd get an unpaired incref to become “immortal” and leak at exit.)
> Further, many of the functions would need to be adapted. We might be able to hack it so that the API looks the same [1]. However, it cannot be ABI compatible, so we would need a whole new API table/export mechanism and some sort of shim to allow compiling against older NumPy versions but using it with all versions (otherwise we need 2+ years of patience).
Having one static “main” module state in the interim would also help here.
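The interim arrangement described above, one static instance of the API struct plus dynamically allocated ones, might look roughly like the following plain C model. Everything here is an assumption made for illustration: `api_table`, `main_table`, `api_get_legacy`, `api_new`, and `api_release` are hypothetical names, the table is reduced to a single `type_id` field, and an `is_static` flag is used in place of the unpaired incref mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for an exported C-API table. */
typedef struct {
    int refcount;
    int is_static;  /* 1 for the interim "main" instance */
    int type_id;    /* stands in for per-interpreter type objects */
} api_table;

/* The one static instance, used by code that is not yet isolated. */
static api_table main_table = {1, 1, 0};

/* Legacy consumers keep resolving to the static table. */
static api_table *api_get_legacy(void) {
    return &main_table;
}

/* Isolated consumers get a dynamically allocated table of their own. */
static api_table *api_new(int type_id) {
    api_table *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->refcount = 1;
    t->is_static = 0;
    t->type_id = type_id;
    return t;
}

/* Drop one reference; returns 1 if the table was freed.
 * The static table is never freed, modeling the "immortal" main state. */
static int api_release(api_table *t) {
    if (t->is_static) {
        return 0;
    }
    t->refcount--;
    if (t->refcount > 0) {
        return 0;
    }
    free(t);
    return 1;
}
```

Existing extensions compiled against the old export would continue to see `main_table`, while code that has been ported can hold its own table.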
> Of course there might be a point in saying that most C-API use is initially not subinterpreter ready, but it does seem like a pretty huge limitation...
A huge limitation, but it might be a good way to break up the work to make it more manageable :)
> Cheers,
> Sebastian
>
> [1] I.e. smuggle in module state without the library importing the NumPy C-API having to change its code.

On Wed, Aug 24, 2022 at 4:42 AM Petr Viktorin encukou@gmail.com wrote:
> On 23. 08. 22 16:19, Sebastian Berg wrote:
>> Our public C-API is currently exported as a single static struct into the library loading NumPy. If types depend on the interpreter, it would seem we need to redo the whole mechanism?
> Right, sounds like it needs to be a dynamically allocated struct. In the interim, one instance of the struct is static: that's the one used for anything that doesn't support multiple interpreters yet, and also as the module state in one “main” module object. (That would be the first module to be loaded, and until everything switches over, it'd get an unpaired incref to become “immortal” and leak at exit.)
>> Further, many of the functions would need to be adapted. We might be able to hack it so that the API looks the same [1]. However, it cannot be ABI compatible, so we would need a whole new API table/export mechanism and some sort of shim to allow compiling against older NumPy versions but using it with all versions (otherwise we need 2+ years of patience).
> Having one static “main” module state in the interim would also help here.
>> Of course there might be a point in saying that most C-API use is initially not subinterpreter ready, but it does seem like a pretty huge limitation...
> A huge limitation, but it might be a good way to break up the work to make it more manageable :)
FWIW, in CPython there's a similar issue. We currently expose static pointers to all the builtin exceptions in the C-API. Even worse, we expose the object *values* for all the static types and the several singletons. On top of that, these are all exposed in the limited API (stable ABI).
As a result, moving to one each per interpreter is messy. PEP 684 talks about the possible solutions. The simplest for us is to make all those objects immortal. However, in some cases we also have to do an interpreter-specific lookup internally. I expect you would have to do similar where/when compatibility remains essential.
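As a rough illustration of why immortality helps with objects exposed as static pointers, the plain C model below treats an immortal object as one whose reference count is never modified, so sharing it does not require per-interpreter bookkeeping. This is a sketch only: `obj`, `IMMORTAL_REFCNT`, `obj_incref`, and `obj_decref` are hypothetical names, and CPython's actual sentinel value and object layout differ.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sentinel marking an object as immortal. */
#define IMMORTAL_REFCNT INT_MAX

typedef struct {
    int refcnt;
} obj;

static int is_immortal(const obj *o) {
    return o->refcnt == IMMORTAL_REFCNT;
}

/* Incref is a no-op for immortal objects. */
static void obj_incref(obj *o) {
    if (!is_immortal(o)) {
        o->refcnt++;
    }
}

/* Returns 1 when the count reached zero and the object would be freed.
 * Immortal objects are never modified and never freed. */
static int obj_decref(obj *o) {
    if (is_immortal(o)) {
        return 0;
    }
    o->refcnt--;
    return o->refcnt == 0;
}
```

An ordinary object is freed when its last reference is dropped; the immortal one survives any number of unbalanced decrefs.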
-eric

On Tue, Aug 23, 2022 at 6:01 AM Petr Viktorin encukou@gmail.com wrote:
> And if the CPython API is lacking, it would be best to solve that in CPython.
+1
In some ways, new CPython APIs would be the most important artifacts of this discussion. We want to minimize the effort it takes to support multiple interpreters. So we definitely want to know what we could provide that would help.
> Per-interpreter GIL is an *additional* step. I believe it will need its own opt-in mechanism. But subinterpreter support is a prerequisite for it.
Yeah, that is an evolving point of discussion in PEP 684.
-eric

On Tue, Aug 23, 2022 at 3:47 AM Sebastian Berg sebastian@sipsolutions.net wrote:
> What is the status of immortality? None of these seem forbidding at first sight, as long as we can get the state everywhere. Having immortal objects seems convenient, but probably not particularly necessary.
The current proposal for immortal objects (PEP 683) will be going to the steering council soon. However, it only applies to the CPython runtime (internally). We don't have plans right now for a public API to make an object immortal. (That would be a separate proposal.) If isolating the extension, a la PEP 630, isn't feasible in the short term, we would certainly be open to discussing alternatives (incl. immortal objects).
> One other thing I am not quite sure about right now is GIL grabbing. Will `PyGILState_Ensure()` continue to work reliably? This used to be one of my main worries. It is also something we can fix up (pass through additional information), but where a fallback seems needed.
Compatibility of the GIL state API with subinterpreters has been a long-standing bug. [1] That will be fixed. Otherwise, PyGILState_Ensure() should work correctly.
-eric
participants (4)
- Eric Snow
- Matti Picus
- Petr Viktorin
- Sebastian Berg