Hi,

coming back to PEP 489 [1], the multi-phase extension module initialization. We originally designed it as an "all or nothing" feature, but as it turns out, the "all" part is so difficult to achieve that most potential users end up with "nothing". So, my question is: could we split it up so that projects can get at least the main advantages: module spec and unicode module naming.

PEP 489 is a great protocol in the sense that it allows extension modules to set themselves up in the same way that Python modules do: load, create module, execute module code. Without it, creating the module and executing its code are a single step that is outside of the control of CPython, which prevents the module from knowing its metadata and CPython from knowing up-front what the module will actually be.

Now, the problem with PEP 489 is that it requires support for reloading and subinterpreters at the same time [2]. For this, extension modules must essentially be free of static global state, which comprises both the module code itself and any external native libraries that it uses. That is somewhere between difficult and impossible to achieve. PEP 573 [3] explains some of the reasons, and lists solutions for some of the issues, but cannot solve the general problem that some extension modules simply cannot get rid of their global state, and are therefore inherently incompatible with reloading and subinterpreters.

I would like the requirement in [2] to be lifted in PEP 489, to make the main features of the PEP generally available to all extension modules.

The question is then how to opt out of the subinterpreter support. The PEP explicitly does not allow backporting new init slot functions/features:

"Unknown slot IDs will cause the import to fail with SystemError."

But at least changing this in Py3.8 should be doable and would be really nice.

What do you think?
Stefan

[1] https://www.python.org/dev/peps/pep-0489/
[2] https://www.python.org/dev/peps/pep-0489/#subinterpreters-and-interpreter-re...
[3] https://www.python.org/dev/peps/pep-0573/
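For readers following the thread, the "load, create module, execute module code" sequence that Python modules go through can be observed from pure Python with importlib. The sketch below is only an analogy for the split that PEP 489 gives extension modules: `DemoLoader` and the module name `demo_module` are hypothetical names made up for illustration, and nothing here touches the C API.

```python
import importlib.abc
import importlib.util


class DemoLoader(importlib.abc.Loader):
    """Hypothetical loader that records the order of the two phases."""

    def __init__(self):
        self.calls = []

    def create_module(self, spec):
        # Phase 1: the spec (and so the module name) is known before any
        # module code has run.
        self.calls.append(("create", spec.name))
        return None  # None asks the import system for a default module object

    def exec_module(self, module):
        # Phase 2: the module body runs and can already read its own metadata.
        self.calls.append(("exec", module.__spec__.name))
        module.greeting = "hello from " + module.__name__


loader = DemoLoader()
spec = importlib.util.spec_from_loader("demo_module", loader)
module = importlib.util.module_from_spec(spec)  # triggers create_module
spec.loader.exec_module(module)                 # triggers exec_module
print(loader.calls)
print(module.greeting)
```

The point of the split is visible in `exec_module`: by the time the module's code runs, `__spec__` and `__name__` are already set, which is what single-phase initialisation cannot offer.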
On 08/10/18 11:21, Stefan Behnel wrote:
Hi,
coming back to PEP 489 [1], the multi-phase extension module initialization. We originally designed it as an "all or nothing" feature, but as it turns out, the "all" part is so difficult to achieve that most potential users end up with "nothing". So, my question is: could we split it up so that projects can get at least the main advantages: module spec and unicode module naming.
PEP 489 is a great protocol in the sense that it allows extension modules to set themselves up in the same way that Python modules do: load, create module, execute module code. Without it, creating the module and executing its code are a single step that is outside of the control of CPython, which prevents the module from knowing its metadata and CPython from knowing up-front what the module will actually be.
Now, the problem with PEP 489 is that it requires support for reloading and subinterpreters at the same time [2]. For this, extension modules must essentially be free of static global state, which comprises both the module code itself and any external native libraries that it uses. That is somewhere between difficult and impossible to achieve. PEP 573 [3] explains some of the reasons, and lists solutions for some of the issues, but cannot solve the general problem that some extension modules simply cannot get rid of their global state, and are therefore inherently incompatible with reloading and subinterpreters.
Are there any issues that aren't explained in PEP 573? I don't think Python modules should be *inherently* incompatible with subinterpreters. Static global state is perhaps unavoidable in some cases, but IMO it should be managed when it's exposed to Python. If there are issues not in the PEPs, I'd like to collect the concrete cases in some document.
I would like the requirement in [2] to be lifted in PEP 489, to make the main features of the PEP generally available to all extension modules.
The question is then how to opt out of the subinterpreter support. The PEP explicitly does not allow backporting new init slot functions/features:
"Unknown slot IDs will cause the import to fail with SystemError."
But at least changing this in Py3.8 should be doable and would be really nice.
I don't think we can just silently skip unknown slots -- that would mean modules wouldn't be getting features they asked for. Do you have some more sophisticated model for slots in mind, or is this something to be designed?
What do you think?
Stefan
[1] https://www.python.org/dev/peps/pep-0489/
[2] https://www.python.org/dev/peps/pep-0489/#subinterpreters-and-interpreter-re...
[3] https://www.python.org/dev/peps/pep-0573/
Petr Viktorin wrote on 10.08.2018 at 11:51:
On 08/10/18 11:21, Stefan Behnel wrote:
coming back to PEP 489 [1], the multi-phase extension module initialization. We originally designed it as an "all or nothing" feature, but as it turns out, the "all" part is so difficult to achieve that most potential users end up with "nothing". So, my question is: could we split it up so that projects can get at least the main advantages: module spec and unicode module naming.
PEP 489 is a great protocol in the sense that it allows extension modules to set themselves up in the same way that Python modules do: load, create module, execute module code. Without it, creating the module and executing its code are a single step that is outside of the control of CPython, which prevents the module from knowing its metadata and CPython from knowing up-front what the module will actually be.
Now, the problem with PEP 489 is that it requires support for reloading and subinterpreters at the same time [2]. For this, extension modules must essentially be free of static global state, which comprises both the module code itself and any external native libraries that it uses. That is somewhere between difficult and impossible to achieve. PEP 573 [3] explains some of the reasons, and lists solutions for some of the issues, but cannot solve the general problem that some extension modules simply cannot get rid of their global state, and are therefore inherently incompatible with reloading and subinterpreters.
Are there any issues that aren't explained in PEP 573? I don't think Python modules should be *inherently* incompatible with subinterpreters. Static global state is perhaps unavoidable in some cases, but IMO it should be managed when it's exposed to Python. If there are issues not in the PEPs, I'd like to collect the concrete cases in some document.
There's always the case where an external native library simply isn't re-entrant and/or requires configuration to be global. I know, there's static linking and there are even ways to load an external shared library multiple times, but that's just adding to the difficulties. Let's just accept that some things are not easy enough to make for a good requirement.
I would like the requirement in [2] to be lifted in PEP 489, to make the main features of the PEP generally available to all extension modules.
The question is then how to opt out of the subinterpreter support. The PEP explicitly does not allow backporting new init slot functions/features:
"Unknown slot IDs will cause the import to fail with SystemError."
But at least changing this in Py3.8 should be doable and would be really nice.
I don't think we can just silently skip unknown slots -- that would mean modules wouldn't be getting features they asked for. Do you have some more sophisticated model for slots in mind, or is this something to be designed?
Sorry for not being clear here. I was asking for changing the assumptions that PEP 489 makes about modules that claim to support the multi-step initialisation part of the PEP. Adding a new (flag?) slot was just one idea for opting out of multi-initialisation support.

Stefan
On 08/10/18 12:21, Stefan Behnel wrote:
Petr Viktorin wrote on 10.08.2018 at 11:51:
On 08/10/18 11:21, Stefan Behnel wrote:
coming back to PEP 489 [1], the multi-phase extension module initialization. We originally designed it as an "all or nothing" feature, but as it turns out, the "all" part is so difficult to achieve that most potential users end up with "nothing". So, my question is: could we split it up so that projects can get at least the main advantages: module spec and unicode module naming.
PEP 489 is a great protocol in the sense that it allows extension modules to set themselves up in the same way that Python modules do: load, create module, execute module code. Without it, creating the module and executing its code are a single step that is outside of the control of CPython, which prevents the module from knowing its metadata and CPython from knowing up-front what the module will actually be.
Now, the problem with PEP 489 is that it requires support for reloading and subinterpreters at the same time [2]. For this, extension modules must essentially be free of static global state, which comprises both the module code itself and any external native libraries that it uses. That is somewhere between difficult and impossible to achieve. PEP 573 [3] explains some of the reasons, and lists solutions for some of the issues, but cannot solve the general problem that some extension modules simply cannot get rid of their global state, and are therefore inherently incompatible with reloading and subinterpreters.
Are there any issues that aren't explained in PEP 573? I don't think Python modules should be *inherently* incompatible with subinterpreters. Static global state is perhaps unavoidable in some cases, but IMO it should be managed when it's exposed to Python. If there are issues not in the PEPs, I'd like to collect the concrete cases in some document.
There's always the case where an external native library simply isn't re-entrant and/or requires configuration to be global. I know, there's static linking and there are even ways to load an external shared library multiple times, but that's just adding to the difficulties. Let's just accept that some things are not easy enough to make for a good requirement.
For that case, I think the right thing to do is for the module to raise an exception when it's being initialized for the second time, or when the underlying library would be initialized for the second time. "Avoid static global state" is a good rule of thumb for supporting subinterpreters nicely, but other strategies are possible. If an underlying library just expects to be initialized once, and then work from several modules, the Python wrapper should ensure that (using global state, most likely). Other ways of handling things should be possible, depending on the underlying library.
I would like the requirement in [2] to be lifted in PEP 489, to make the main features of the PEP generally available to all extension modules.
The question is then how to opt out of the subinterpreter support. The PEP explicitly does not allow backporting new init slot functions/features:
"Unknown slot IDs will cause the import to fail with SystemError."
But at least changing this in Py3.8 should be doable and would be really nice.
I don't think we can just silently skip unknown slots -- that would mean modules wouldn't be getting features they asked for. Do you have some more sophisticated model for slots in mind, or is this something to be designed?
Sorry for not being clear here. I was asking for changing the assumptions that PEP 489 makes about modules that claim to support the multi-step initialisation part of the PEP. Adding a new (flag?) slot was just one idea for opting out of multi-initialisation support.
Would this be better than a flag + raising an error on init? One big disadvantage of a big opt-out-of-everything button is that it doesn't encourage people to think about what the actual non-reentrant piece of code is.
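To make the "flag plus error on init" idea concrete, here is a minimal pure-Python sketch of a module that refuses to be initialised twice. It is an analogy only: a real extension module would keep the flag in C static storage and do the check in its exec slot. `OnceOnlyLoader`, `load_once_only` and the `_initialized` flag are hypothetical names introduced for this illustration.

```python
import importlib.abc
import importlib.util

# Process-wide flag standing in for the static global state of a C extension
# or of the native library it wraps (hypothetical, for illustration only).
_initialized = False


class OnceOnlyLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        global _initialized
        if _initialized:
            # Second initialisation: fail loudly instead of silently
            # doing nothing.
            raise ImportError(
                f"{module.__name__} wraps non-reentrant global state and "
                "cannot be initialised more than once per process",
                name=module.__name__,
            )
        _initialized = True
        module.ready = True


def load_once_only(name="once_only_demo"):
    """Create and execute a fresh module object, as a reload or a second
    interpreter would."""
    loader = OnceOnlyLoader()
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Calling `load_once_only()` a second time raises `ImportError`, so the limitation is reported at the point where the non-reentrant state would be touched, rather than hidden behind a blanket opt-out.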
Petr Viktorin wrote on 10.08.2018 at 13:48:
Would this be better than a flag + raising an error on init?
Ok, I've implemented this in Cython for now, to finally move the PEP-489 support forward. The somewhat annoying drawback is that module reloading previously *seemed* to work, simply because it didn't actually do anything. Now, people will get an exception in cases that previously worked silently. An exception would probably have been better from the beginning, because it clearly tells people that what they are trying is not supported. Now it's a bit of a breaking change. I'll see what it gives.

Thanks for your feedback on this.

Stefan
On Sun, 12 Aug 2018 at 00:50, Stefan Behnel <stefan_ml@behnel.de> wrote:
Petr Viktorin wrote on 10.08.2018 at 13:48:
Would this be better than a flag + raising an error on init?
Ok, I've implemented this in Cython for now, to finally move the PEP-489 support forward. The somewhat annoying drawback is that module reloading previously *seemed* to work, simply because it didn't actually do anything. Now, people will get an exception in cases that previously worked silently. An exception would probably have been better from the beginning, because it clearly tells people that what they are trying is not supported. Now it's a bit of a breaking change. I'll see what it gives.
If the breakage turns out to cause problems, some potential ways to mitigate it would be:

1. Emulate the old "extension modules are singletons" behaviour in the module creation slot, and only emit a deprecation warning when it gets called multiple times rather than failing outright.
2. As for 1, but error out when the active interpreter changes (that will let a module support reloading, while explicitly erroring out if you attempt to load it from multiple subinterpreters).

Both of those would need some level of PEP 3121 support to clear out the singleton state when the module gets outright destroyed, but they'd still provide a middle ground between the ill-defined behaviour of single-phase initialisation and full-fledged support for multiple concurrently loaded independent copies of the same extension module.

Under that interpretation, the required clarification in PEP 489 would be to replace the term "supports" with "has clearly defined and documented behaviour", and then make it clear that that defined behaviour is allowed to be "Fails with an exception explaining the limitation", or "manages some internal state as a process-level singleton".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
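The two mitigation options can be modelled in a few lines of Python. This is a rough sketch under stated assumptions, not how Cython or CPython implement anything: `create_module` stands in for a module creation slot, `interpreter_id` is a hypothetical value identifying the active interpreter, and `strict_interpreter=True` selects option 2. The PEP 3121 cleanup of the singleton state mentioned above is deliberately left out.

```python
import types
import warnings

_singleton = None          # cached module object (process-level state)
_owner_interpreter = None  # id of the interpreter that first created it


def create_module(name, interpreter_id, strict_interpreter=False):
    """Model of a creation slot that emulates singleton behaviour."""
    global _singleton, _owner_interpreter
    if _singleton is None:
        # First creation: remember the module and who created it.
        _singleton = types.ModuleType(name)
        _owner_interpreter = interpreter_id
        return _singleton
    if strict_interpreter and interpreter_id != _owner_interpreter:
        # Option 2: reloading is fine, a second interpreter is not.
        raise ImportError(
            f"{name} cannot be loaded from more than one interpreter",
            name=name,
        )
    # Option 1 (and same-interpreter reloads under option 2): warn and
    # hand back the existing module instead of failing outright.
    warnings.warn(
        f"{name} was already created; returning the existing module",
        DeprecationWarning,
        stacklevel=2,
    )
    return _singleton
```

With this model, a repeated call from the same interpreter returns the cached module with a `DeprecationWarning`, while a call with a different `interpreter_id` and `strict_interpreter=True` raises `ImportError`.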