Exposing CPython's subinterpreter C-API in the stdlib.
Although I haven't been able to achieve the pace that I originally wanted, I have been able to work on my multi-core Python idea little-by-little. Most notably, some of the blockers have been resolved at the recent PyCon sprints and I'm ready to move on to the next step: exposing multiple interpreters via a stdlib module.

Initially I just want to expose basic support via 3 successive changes. Below I've listed the corresponding (chained) PRs, along with what they add. Note that the 2 proposed modules take some cues from the threading module, but don't try to be any sort of replacement. Threading and subinterpreters are two different features that are used together rather than as alternatives to one another.

At the very least I'd like to move forward with the _interpreters module sooner rather than later. Doing so will facilitate more extensive testing of subinterpreters, in preparation for further use of them in the multi-core Python project. We can iterate from there, but I'd at least like to get the basic functionality landed early. Any objections to (or feedback about) the low-level _interpreters module as described? Likewise for the high-level interpreters module?

Discussions on any expanded functionality for the modules and on the broader topic of the multi-core project are both welcome, but please start other threads for those topics.

-eric

basic low-level API: https://github.com/python/cpython/pull/1748

    _interpreters.create() -> id
    _interpreters.destroy(id)
    _interpreters.run_string(id, code)
    _interpreters.run_string_unrestricted(id, code, ns=None) -> ns

extra low-level API: https://github.com/python/cpython/pull/1802

    _interpreters.enumerate() -> [id, ...]
    _interpreters.get_current() -> id
    _interpreters.get_main() -> id
    _interpreters.is_running(id) -> bool

basic high-level API: https://github.com/python/cpython/pull/1803

    interpreters.enumerate() -> [Interpreter, ...]
    interpreters.get_current() -> Interpreter
    interpreters.get_main() -> Interpreter
    interpreters.create() -> Interpreter
    interpreters.Interpreter(id)
    interpreters.Interpreter.is_running()
    interpreters.Interpreter.destroy()
    interpreters.Interpreter.run(code)
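For a sense of how the high-level API might be used in practice, here is a minimal sketch built only from the names listed above (the exact semantics - e.g. whether Interpreter objects compare equal across calls - are assumptions, not something the PRs have settled):

    # Hypothetical usage of the proposed high-level module (PR 1803).
    import interpreters

    interp = interpreters.create()
    try:
        interp.run("print('hello from a subinterpreter')")
        print(interp in interpreters.enumerate())   # presumably True
        print(interp is interpreters.get_main())    # presumably False
    finally:
        interp.destroy()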
CC'ing PyPy-dev...
--
Nathaniel J. Smith -- https://vorpus.org
Hm... Curiously, I've heard a few people at PyCon mention they thought subinterpreters were broken and not useful (and they share the GIL anyways) and should be taken out. So we should at least have clarity on which direction we want to take...
--
--Guido van Rossum (python.org/~guido)
On 25 May 2017 at 13:30, Guido van Rossum <guido@python.org> wrote:
Hm... Curiously, I've heard a few people at PyCon mention they thought subinterpreters were broken and not useful (and they share the GIL anyways) and should be taken out.
Taking them out entirely would break mod_wsgi (and hence the use of Apache httpd as a Python application server), so I hope we don't consider going down that path :)

As far as the GIL goes, Eric has a few ideas around potentially getting to a tiered locking approach, where the GIL becomes a Read/Write lock shared across the interpreters, and there are separate subinterpreter locks to guard actual code execution. That becomes a lot more feasible in a subinterpreter model, since the eval loop and various other structures are already separate - the tiered locking would mainly need to account for management of "object ownership" that prevented multiple interpreters from accessing the same object at the same time.

However, I do think subinterpreters can be accurately characterised as fragile, especially in the presence of extension modules. I also think a large part of that fragility can be laid at the feet of them currently being incredibly difficult to test - while _testembed includes a rudimentary check [1] to make sure the subinterpreter machinery itself basically works, it doesn't do anything in the way of checking that the rest of the standard library actually does the right thing when run in a subinterpreter.

So I'm +1 for the idea of exposing a low-level CPython-specific _interpreters API in order to start building out a proper test suite for the capability, and to let folks interested in working with them do so without having to write a custom embedding application ala mod_wsgi.

However, I think it's still far too soon to be talking about defining a public supported API for them - while their use in mod_wsgi gives us assurance that they do mostly work in CPython, other implementations don't necessarily have anything comparable (even as a private implementation detail), and the kinds of software that folks run directly under mod_wsgi isn't necessarily reflective of the full extent of variation in the kinds of code that Python developers write in general.

Cheers,
Nick.

[1] https://github.com/python/cpython/blob/master/Programs/_testembed.c#L41

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
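For concreteness, the kind of stdlib test this would enable might look something like the following sketch (the module and function names come from the PR listings at the top of the thread; the assumption that run_string_unrestricted() returns the resulting namespace as a dict-like object is mine):

    import unittest
    import _interpreters  # proposed low-level module (PR 1748)

    class SubinterpreterTest(unittest.TestCase):
        def test_namespaces_are_isolated(self):
            interp = _interpreters.create()
            try:
                # Run code in the subinterpreter and inspect its namespace.
                ns = _interpreters.run_string_unrestricted(interp, "x = 40 + 2")
                self.assertEqual(ns["x"], 42)
                # The name must not leak into this interpreter.
                self.assertNotIn("x", globals())
            finally:
                _interpreters.destroy(interp)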
On Wed, May 24, 2017 at 8:30 PM, Guido van Rossum <guido@python.org> wrote:
Hm... Curiously, I've heard a few people at PyCon
I'd love to get in touch with them and discuss the situation. I've spoken with Graham Dumpleton on several occasions about subinterpreters and what needs to be fixed.
mention they thought subinterpreters were broken
There are a number of related long-standing bugs plus a few that I created in the last year or two. I'm motivated to get these resolved so that the multi-core Python project can take full advantage of subinterpreters without worry.

As well, there are known limitations to using extension modules in subinterpreters. However, only extension modules that rely on process globals (rather than leveraging PEP 384, etc.) are affected, and we can control for that more carefully using the protocol introduced by PEP 489.

There isn't anything broken about the concept or design of subinterpreters in CPython that I'm aware of.
and not useful (and they share the GIL anyways)
I'll argue that their usefulness has been limited by lack of exposure in the stdlib. :) Furthermore, I'm finding them extremely useful as the vehicle for the multi-core Python project.
and should be taken out. So we should at least have clarity on which direction we want to take...
I'd definitely appreciate a firm commitment that they are not getting removed, as I don't want to spend a ton of time on the project just to have the effort made irrelevant. :)

Also, I'd be surprised if there were sufficient merit to removing support for subinterpreters, since there is very little machinery just for that feature. Instead, it leverages components of CPython that are there for other valid reasons. So I do not consider subinterpreters to currently add any significant burden to maintenance or development of the code base.

Regardless, exposing the low-level _interpreters module should help us iron out bugs and the API, as Nick pointed out.

-eric
Eric,

Something like these subinterpreters in CPython is used from Jython's Java API. Like nearly all of Jython,* this can be used directly from Python code, as seen in tests using this feature: https://github.com/jythontools/jython/blob/master/Lib/test/test_pythoninterpreter_jy.py

More on the API here: https://github.com/jythontools/jython/blob/master/src/org/python/util/PythonInterpreter.java - note that this is not even a core API for Jython; it just happens to be widely used, including by the launcher that wraps this API and calls itself the jython executable. So we can readily refactor if we have something better, because right now it is also problematic with respect to its lifecycle, its mapping to threads, and how it interacts with class loaders and other resources, especially during cleanup.

It would be helpful to coordinate this subinterpreter work, or at least to cc jython-dev on such ideas as you might have. Recently there have been some rumblings of consensus that it's about time for Jython to really start work on the 3.x implementation, targeting 3.6. But do be aware we are usually at most 2 to 5 developers, working in our spare time, so everything takes much longer than one would hope. I just hope we can finish 3.6 (or whatever) before Python 4.0 arrives :)

*Excluding certain cases on core types where our bytecode rewriting makes it a true challenge!

- Jim
On May 24, 2017 20:31, "Guido van Rossum" <guido@python.org> wrote:

Hm... Curiously, I've heard a few people at PyCon mention they thought subinterpreters were broken and not useful (and they share the GIL anyways) and should be taken out. So we should at least have clarity on which direction we want to take...

My impression is that the code to support them inside CPython is fine, but they're broken and not very useful in the sense that lots of C extensions don't really support them, so in practice you can't reliably use them to run arbitrary code. Numpy for example definitely has lots of subinterpreter-related bugs, and when they get reported we close them as WONTFIX.

Based on conversations at last year's PyCon, my impression is that numpy probably *could* support subinterpreters (i.e. the required APIs exist), but none of us really understand the details; it's the kind of problem that requires a careful whole-codebase audit, and a naive approach might make numpy's code slower and more complicated for everyone. (For example, there are lots of places where numpy keeps a little global cache that I guess should instead be per-subinterpreter caches, which would mean adding an extra lookup operation to fast paths.) Or maybe it'd be fine, but no one is motivated to figure it out, because the other side of the cost/benefit analysis is that almost nobody actually uses subinterpreters. I think the only two projects that do are mod_wsgi and jep [1].

So yeah, the status quo is broken. But there are two possible ways to fix it: IMHO either subinterpreters should be removed *or* they should have some compelling features added to make them actually worth the effort of fixing C extensions to support them. If Eric can pull off this multi-core idea then that would be pretty compelling :-). (And my impression is that the things that break under subinterpreters are essentially the same as would break under any GIL-removal plan.)

The problem is that we don't actually know yet whether the multi-core idea will work, so it seems like a bad time to double down on committing to subinterpreter support and pressuring C extensions to keep up. Eric - do you have a plan written down somewhere? I'm wondering what the critical path from here to a multi-core proof of concept looks like.

-n

[1] https://github.com/mrj0/jep/wiki/How-Jep-Works
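As a Python-level rendering of that "extra lookup" problem (purely illustrative: in the real case these caches are C statics, and _interpreters.get_current() is only the API proposed earlier in this thread):

    import _interpreters  # proposed low-level module (PR 1748)

    _caches = {}  # interpreter id -> per-interpreter cache dict

    def cached(key, compute):
        # A single global cache would be one dict lookup; keyed per
        # interpreter, the fast path pays for an extra lookup first.
        cache = _caches.setdefault(_interpreters.get_current(), {})
        try:
            return cache[key]
        except KeyError:
            cache[key] = value = compute()
            return value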
I'm +1 on Nick's idea of the low-level, private API existing first to facilitate testing, but putting off any public API until we're sure we can make it function in a way we're happy to expose more generally.
On Thu, May 25, 2017 at 11:19 AM, Nathaniel Smith <njs@pobox.com> wrote:
My impression is that the code to support them inside CPython is fine, but they're broken and not very useful in the sense that lots of C extensions don't really support them, so in practice you can't reliably use them to run arbitrary code. Numpy for example definitely has lots of subinterpreter-related bugs, and when they get reported we close them as WONTFIX.
Based on conversations at last year's pycon, my impression is that numpy probably *could* support subinterpreters (i.e. the required apis exist), but none of us really understand the details, it's the kind of problem that requires a careful whole-codebase audit, and a naive approach might make numpy's code slower and more complicated for everyone. (For example, there are lots of places where numpy keeps a little global cache that I guess should instead be per-subinterpreter caches, which would mean adding an extra lookup operation to fast paths.)
Thanks for pointing this out. You've clearly described probably the biggest challenge for folks that try to use subinterpreters. PEP 384 was meant to help with this, but seems to have fallen short. PEP 489 can help identify modules that profess subinterpreter support, as well as facilitating future extension module helpers to deal with global state. However, I agree that *right now* getting extension modules to reliably work with subinterpreters is not easy enough. Furthermore, that won't change unless there is sufficient benefit tied to subinterpreters, as you point out below.
Or maybe it'd be fine, but no one is motivated to figure it out, because the other side of the cost/benefit analysis is that almost nobody actually uses subinterpreters. I think the only two projects that do are mod_wsgi and jep [1].
So yeah, the status quo is broken. But there are two possible ways to fix it: IMHO either subinterpreters should be removed *or* they should have some compelling features added to make them actually worth the effort of fixing c extensions to support them. If Eric can pull off this multi-core idea then that would be pretty compelling :-).
Agreed. :)
(And my impression is that the things that break under subinterpreters are essentially the same as would break under any GIL-removal plan.)
More or less. There's a lot of process-global state in CPython that needs to get pulled into the interpreter state. So in that regard the effort and tooling will likely correspond fairly closely with what extension modules have to do.
The problem is that we don't actually know yet whether the multi-core idea will work, so it seems like a bad time to double down on committing to subinterpreter support and pressuring C extensions to keep up. Eric- do you have a plan written down somewhere? I'm wondering what the critical path from here to a multi-core proof of concept looks like.
Probably the best summary is here: http://ericsnowcurrently.blogspot.com/2016/09/solving-mutli-core-python.html

The caveat is that doing this myself is slow-going due to persistent lack of time. :/ So any timely solution would require effort from more people. I've had enough positive responses from folks at PyCon that I think enough people would pitch in to get it done in a timely manner.

More significantly, I genuinely believe that isolated interpreters in the same process are a tool that many people will find extremely useful and that will help the Python community. Consequently, exposing subinterpreters in the stdlib would result in a stronger incentive for folks to fix the known bugs and find a solution to the challenges for extension modules.

-eric
On Thu, May 25, 2017 at 11:55 AM, Brett Cannon <brett@python.org> wrote:
I'm +1 on Nick's idea of the low-level, private API existing first to facilitate testing, but putting off any public API until we're sure we can make it function in a way we're happy to expose more generally.
Same here. I hadn't expected the high-level API to be an immediate (or contingent) addition. My interest lies particularly with the low-level module. -eric
On 25 May 2017 at 20:01, Eric Snow <ericsnowcurrently@gmail.com> wrote:
More significantly, I genuinely believe that isolated interpreters in the same process are a tool that many people will find extremely useful and that will help the Python community. Consequently, exposing subinterpreters in the stdlib would result in a stronger incentive for folks to fix the known bugs and find a solution to the challenges for extension modules.
I'm definitely interested in subinterpreter support. I don't have a specific use case for it, but I see it as an enabling technology that could be used in creative ways (even given the current limitations involved in extension support).

Perl has had subinterpreter support for many years - it's the implementation technique behind their fork primitive on Windows (on Unix, real fork is used) and allows many common patterns of use of fork to be ported to Windows. Python doesn't really have a need for this, as fork is not commonly used here (we use threads or multiprocessing where Perl would historically have used fork), but nevertheless it does provide prior art in this area.

Paul
On Thu, May 25, 2017 at 12:01 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
More significantly, I genuinely believe that isolated interpreters in the same process are a tool that many people will find extremely useful and that will help the Python community. Consequently, exposing subinterpreters in the stdlib would result in a stronger incentive for folks to fix the known bugs and find a solution to the challenges for extension modules.
I feel like the most effective incentive would be to demonstrate how useful they are first? If we do it in the other order, then there's a risk that CPython does provide an incentive, but it's of the form "this thing doesn't actually accomplish anything useful yet, but it got mentioned in whats-new-in-3.7 and now angry people are yelling at me in my bug tracker for not 'fixing' my package, so I have to do a bunch of pointless work to shut them up". This tends to leave bad feelings all around.

I do get that this is a tricky chicken-and-egg situation: currently subinterpreters don't work very well, so no-one writes cool applications using them, so no-one bothers to make them work better. And I share the general intuition that this is a powerful tool that probably has some kind of useful applications. But I can't immediately name any such applications, which makes me nervous :-).

The obvious application is your multi-core Python idea, and I think that would convince a lot of people; in general I'm enthusiastic about the idea of extending Python's semantics to enable better parallelism. But I just re-read your blog post and some of the linked thread, and it's not at all clear to me how you plan to solve the refcounting and garbage collection problems that will arise once you have objects that are shared between multiple subinterpreters and no GIL. Which makes it hard for me to make a case to the other numpy devs that it's worth spending energy on this now, to support a feature that might or might not happen in the future, especially if angry shouty people start joining the conversation.

Does that make sense? I want the project to succeed, and if one of the conditions for that is getting buy-in from the community of C extension developers then it seems important to have a good plan for navigating the incentives tightrope.

-n

--
Nathaniel J. Smith -- https://vorpus.org
On 25 May 2017, at 19:03, Eric Snow <ericsnowcurrently@gmail.com> wrote:
As well, there are known limitations to using extension modules in subinterpreters. However, only extension modules that rely on process globals (rather than leveraging PEP 384, etc.) are affected, and we can control for that more carefully using the protocol introduced by PEP 489.
There are also the PyGILState APIs (PEP 311); those assume there's only one interpreter.

Ronald
On 05/25/2017 09:01 PM, Eric Snow wrote:
Thanks for pointing this out. You've clearly described probably the biggest challenge for folks that try to use subinterpreters. PEP 384 was meant to help with this, but seems to have fallen short. PEP 489 can help identify modules that profess subinterpreter support, as well as facilitating future extension module helpers to deal with global state. However, I agree that *right now* getting extension modules to reliably work with subinterpreters is not easy enough. Furthermore, that won't change unless there is sufficient benefit tied to subinterpreters, as you point out below.
PEP 489 was a first step; the work is not finished. The next step is solving a major reason people are using global state in extension modules: per-module state isn't accessible from all the places it should be, namely in methods of classes. In other words, I don't think Python is ready for big projects like Numpy to start properly supporting subinterpreters.

The work on fixing this has stalled, but it looks like I'll be getting back on track. Discussions about this are on the import-sig list; reach out there if you'd like to help.
Hi all,

Personally I feel that the current subinterpreter support falls short in the sense that it still requires a single GIL across interpreters. If interpreters had their own individual GIL, we could have true shared-nothing multi-threaded support similar to Javascript's "Web Workers".

Here is a point-wise overview of what I am imagining. I realize the following is very ambitious, but I would like to bring it to your consideration.

1. Multiple interpreters can be instantiated, each of which is completely independent. To this end, all global interpreter state needs to go into an interpreter structure, including the GIL (which becomes per-interpreter). Interpreters share no state whatsoever.

2. PyObject's are tied to a particular interpreter and cannot be shared between interpreters. (This is because each interpreter now has its own GIL.) I imagine a special debug build would actually store the interpreter pointer in the PyObject and would assert everywhere that the PyObject is only manipulated by its owning interpreter.

3. Practically all existing APIs, including Py_INCREF and Py_DECREF, need to get an additional explicit interpreter argument. I imagine that we would have a new prefix, say MPy_, because the existing APIs must be left for backward compatibility.

4. At most one interpreter can be designated the "main" interpreter. This is for backward compatibility of existing extension modules ONLY. All the existing Py_* APIs operate implicitly on this main interpreter.

5. Extension modules need to explicitly advertise multiple interpreter support. If they don't, they can only be imported in the main interpreter. However, in that case they can safely use the existing Py_ APIs.

6. Since PyObject's cannot be shared across interpreters, there needs to be an explicit function which takes a PyObject in interpreter A and constructs a similar object in interpreter B. Conceptually this would be equivalent to pickling in A and unpickling in B, but presumably more efficient. It would use the copyreg registry in a similar way to pickle. (A sketch of this point follows below.)

7. Extension modules would also be able to register their own functions for copying custom types across interpreters. That would allow extension modules to provide custom types where the underlying C object is in fact not copied but shared between interpreters. I would imagine we would have a "shared memory" memoryview object and also Mutex and other locking constructs which would work across interpreters.

8. Finally, the main application: functionality similar to the current `multiprocessing' module, but with multiple interpreters on multiple threads in a single process. This would presumably be more efficient than `multiprocessing' and also allow extra functionality, since the underlying C objects can in fact be shared. (Imagine two interpreters operating in parallel on a single OpenCL context.)

Stephan
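A minimal sketch of point 6, expressed in terms of the low-level module proposed at the top of the thread (the module and function names come from PR 1748; the copy helper itself and its behaviour are assumptions):

    import pickle
    import _interpreters  # proposed low-level module (PR 1748)

    def copy_into(interp_id, name, obj):
        # Stand-in for the proposed explicit cross-interpreter copy:
        # pickle in interpreter A, unpickle in interpreter B.
        payload = pickle.dumps(obj)
        _interpreters.run_string(
            interp_id,
            "import pickle\n%s = pickle.loads(%r)" % (name, payload),
        )

    interp = _interpreters.create()
    copy_into(interp, "data", {"answer": 42})
    _interpreters.run_string(interp, "print(data['answer'])")
    _interpreters.destroy(interp)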
On 26 May 2017 at 22:08, Stephan Houben <stephanh42@gmail.com> wrote:
Hi all,
Personally I feel that the current subinterpreter support falls short in the sense that it still requires a single GIL across interpreters.
If interpreters had their own individual GIL, we could have true shared-nothing multi-threaded support similar to Javascript's "Web Workers".
Here is a point-wise overview of what I am imagining. I realize the following is very ambitious, but I would like to bring it to your consideration.
1. Multiple interpreters can be instantiated, each of which is completely independent. To this end, all global interpreter state needs to go into an interpreter structure, including the GIL (which becomes per-interpreter). Interpreters share no state whatsoever.
There'd still be true process global state (i.e. anything managed by the C runtime), so this would be a tiered setup with a read/write GIL and multiple SILs (subinterpreter locks). For the time being though, a single GIL remains much easier to manage.
2. PyObject's are tied to a particular interpreter and cannot be shared between interpreters. (This is because each interpreter now has its own GIL.) I imagine a special debug build would actually store the interpreter pointer in the PyObject and would assert everywhere that the PyObject is only manipulated by its owning interpreter.
Yes, something like Rust's ownership model is the gist of what we had in mind (i.e. allowing zero-copy transfer of ownership between subinterpreters, but only the owning interpreter is allowed to do anything else with the object).
3. Practically all existing APIs, including Py_INCREF and Py_DECREF, need to get an additional explicit interpreter argument. I imagine that we would have a new prefix, say MPy_, because the existing APIs must be left for backward compatibility.
This isn't necessary, as the active interpreter is already tracked as part of the thread local state (otherwise mod_wsgi et al wouldn't work at all).
4. At most one interpreter can be designated the "main" interpreter. This is for backward compatibility of existing extension modules ONLY. All the existing Py_* APIs operate implicitly on this main interpreter.
Yep, this is part of the concept. The PEP 432 draft has more details on that: https://www.python.org/dev/peps/pep-0432/#interpreter-initialization-phases
5. Extension modules need to explicitly advertise multiple interpreter support. If they don't, they can only be imported in the main interpreter. However, in that case they can safely use the existing Py_ APIs.
This is the direction we started moving in with the multi-phase initialisation PEP for extension modules: https://www.python.org/dev/peps/pep-0489/

As Petr noted, the main missing piece there now is the fact that object methods (as opposed to module level functions) implemented in C currently don't have ready access to the module level state for the modules where they're defined.
6. Since PyObject's cannot be shared across interpreters, there needs to be an explicit function which takes a PyObject in interpreter A and constructs a similar object in interpreter B.
Conceptually this would be equivalent to pickling in A and unpickling in B, but presumably more efficient. It would use the copyreg registry in a similar way to pickle.
This would be an ownership transfer rather than a copy (which carries the implication that all the subinterpreters would still need to share a common memory allocator).
7. Extension modules would also be able to register their own functions for copying custom types across interpreters. That would allow extension modules to provide custom types where the underlying C object is in fact not copied but shared between interpreters. I would imagine we would have a "shared memory" memoryview object and also Mutex and other locking constructs which would work across interpreters.
We generally don't expect this to be needed given an ownership focused approach. Instead, the focus would be on enabling efficient channel based communication models that are cost-prohibitive when object serialisation is involved.
8. Finally, the main application: functionality similar to the current `multiprocessing' module, but with multiple interpreters on multiple threads in a single process. This would presumably be more efficient than `multiprocessing' and also allow extra functionality, since the underlying C objects can in fact be shared. (Imagine two interpreters operating in parallel on a single OpenCL context.)
We're not sure how feasible it will be to enable this in general, but even without it, zero-copy ownership transfers enable a *lot* of interesting concurrency models that Python doesn't currently offer great primitives to support (they're mainly a matter of using threads in certain ways, which means they not only run afoul of the GIL, but you also don't get any assistance from the interpreter in strictly enforcing object ownership rules).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
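To make the channel idea concrete, here is a purely hypothetical sketch (none of these names appear in the PRs above): a thread-safe channel whose send() pickles today, marking exactly the overhead that zero-copy ownership transfer would eliminate:

    import pickle
    import queue

    class Channel:
        # CSP-style channel (hypothetical); today's emulation with threads.
        def __init__(self):
            self._q = queue.Queue()

        def send(self, obj):
            # With ownership transfer, the object itself would move to the
            # receiver; this pickle round-trip is the cost being avoided.
            self._q.put(pickle.dumps(obj))

        def recv(self):
            return pickle.loads(self._q.get())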
Hi Nick,

As far as I understand, the (to me) essential difference between your approach and my proposal is that:

Approach 1 (PEP 489):
* Single (global) GIL.
* PyObject's may be shared across interpreters (zero-copy transfer).

Approach 2 (mine):
* Per-interpreter GIL.
* PyObject's must be copied across interpreters.

To me, the per-interpreter GIL is the essential "target" I am aiming for, and I am willing to sacrifice the zero-copy for that. If the GIL is still shared then I don't see much advantage of this approach over just using the "threading" module with a single interpreter. (I realize it still gives you some isolation between interpreters. To me personally this is not very interesting, but this may be myopic.)
For the time being though, a single GIL remains much easier to manage.
"For the time being" suggests that you are intending approach 1 to be ultimately a stepping stone to something similar to approach 2?
Yes, something like Rust's ownership model is the gist of what we had in mind (i.e. allowing zero-copy transfer of ownership between subinterpreters, but only the owning interpreter is allowed to do anything else with the object).
This can be emulated in approach 2 by creating a wrapper C-level type which contains a PyObject and its corresponding interpreter, so that interpreter A can reference an object in interpreter B.
3. Practically all existing APIs, including Py_INCREF and Py_DECREF, need to get an additional explicit interpreter argument. I imagine that we would have a new prefix, say MPy_, because the existing APIs must be left for backward compatibility.
This isn't necessary, as the active interpreter is already tracked as part of the thread local state (otherwise mod_wsgi et al wouldn't work at all).
I realize that it is possible to do it that way. However, this has some disadvantages:

* The interpreter becomes tied to a thread, or you need to have some way to switch interpreters on a thread. (Which makes your code look like OpenGL code ;-) )
* Once you are going to write code which manipulates objects in multiple interpreters (e.g. my proposed copy function, or the "foreign interpreter wrapper" I discussed above, sketched below), making the interpreter explicit probably avoids headaches.
* Explicit is better than implicit, as somebody once said. ;-)

Stephan
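A minimal sketch of that "foreign interpreter wrapper", assuming only the names from PR 1748 (the class and its behaviour are hypothetical):

    import _interpreters  # proposed low-level module (PR 1748)

    class ForeignObject:
        # Handle to an object living in another interpreter (hypothetical).
        def __init__(self, interp_id, name):
            self.interp_id = interp_id   # the owning interpreter
            self.name = name             # binding inside that interpreter

        def run_method(self, method, *args):
            # All manipulation happens inside the owning interpreter.
            code = "%s.%s(*%r)" % (self.name, method, args)
            _interpreters.run_string(self.interp_id, code)

    interp = _interpreters.create()
    _interpreters.run_string(interp, "items = []")
    items = ForeignObject(interp, "items")
    items.run_method("append", 1)  # list.append runs in the subinterpreter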
On 26 May 2017 at 23:49, Stephan Houben <stephanh42@gmail.com> wrote:
Hi Nick,
As far as I understand, the (to me) essential difference between your approach and my proposal is that:
Approach 1 (PEP 489):
* Single (global) GIL.
* PyObject's may be shared across interpreters (zero-copy transfer).

Approach 2 (mine):
* Per-interpreter GIL.
* PyObject's must be copied across interpreters.
To me, the per-interpreter GIL is the essential "target" I am aiming for, and I am willing to sacrifice the zero-copy for that.
Err, no - I explicitly said that assuming the rest of the idea works out well, we'd eventually like to move to a tiered model where the GIL becomes a read/write lock. Most code execution in subinterpreters would then only need a read lock on the GIL, and hence could happily execute code in parallel with other subinterpreters running on other cores.

However, that aspect of the idea is currently just hypothetical handwaving that would need to deal with (and would be informed by) the current work happening with respect to the GILectomy, as it's not particularly interesting as far as concurrency modeling is concerned.

By contrast, being able to reliably model Communicating Sequential Processes in Python without incurring any communications overhead (ala goroutines)? Or doing the same with the Actor model (ala Erlang/BEAM processes)? Those are *very* interesting language design concepts, and something where offering a compelling alternative to the current practices of emulating them with threads or coroutines pretty much requires the property of zero-copy ownership transfer.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, May 26, 2017 at 8:28 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
[...] assuming the rest of the idea works out well, we'd eventually like to move to a tiered model where the GIL becomes a read/write lock. Most code execution in subinterpreters would then only need a read lock on the GIL, and hence could happily execute code in parallel with other subinterpreters running on other cores.
Since the GIL protects refcounts and refcounts are probably the most frequently written item, I'm skeptical of this.
However, that aspect of the idea is currently just hypothetical handwaving that would need to deal with (and would be informed by) the current work happening with respect to the GILectomy, as it's not particularly interesting as far as concurrency modeling is concerned.
By contrast, being able to reliably model Communicating Sequential Processes in Python without incurring any communications overhead (ala goroutines)? Or doing the same with the Actor model (ala Erlang/BEAM processes)?
Those are *very* interesting language design concepts, and something where offering a compelling alternative to the current practices of emulating them with threads or coroutines pretty much requires the property of zero-copy ownership transfer.
But subinterpreters (which have independent sys.modules dicts) seem a poor match for that. It feels as if you're speculating about an entirely different language here, not named Python.

--
--Guido van Rossum (python.org/~guido)
On 27 May 2017 at 03:30, Guido van Rossum <guido@python.org> wrote:
On Fri, May 26, 2017 at 8:28 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
[...] assuming the rest of the idea works out well, we'd eventually like to move to a tiered model where the GIL becomes a read/write lock. Most code execution in subinterpreters would then only need a read lock on the GIL, and hence could happily execute code in parallel with other subinterpreters running on other cores.
Since the GIL protects refcounts and refcounts are probably the most frequently written item, I'm skeptical of this.
Likewise - hence my somewhat garbled attempt to explain that actually doing that would be contingent on the GILectomy folks figuring out some clever way to cope with the refcounts :)
By contrast, being able to reliably model Communicating Sequential Processes in Python without incurring any communications overhead (ala goroutines)? Or doing the same with the Actor model (ala Erlang/BEAM processes)?
Those are *very* interesting language design concepts, and something where offering a compelling alternative to the current practices of emulating them with threads or coroutines pretty much requires the property of zero-copy ownership transfer.
But subinterpreters (which have independent sys.modules dicts) seem a poor match for that. It feels as if you're speculating about an entirely different language here, not named Python.
Ah, you're right - the types are all going to be separate as well, which means "cost of a deep copy" is the cheapest we're going to be able to get with this model. Anything better than that would require a more esoteric memory management architecture like the one in PyParallel.

I guess I'll have to scale back my hopes on that front to be closer to what Stephan described - even a deep copy equivalent is often going to be cheaper than a full serialise/transmit/deserialise cycle or some other form of inter-process communication.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi Nick,
I guess I'll have to scale back my hopes on that front to be closer to what Stephan described - even a deep copy equivalent is often going to be cheaper than a full serialise/transmit/deserialise cycle or some other form of inter-process communication.
I would like to add that in many cases the underlying C objects *could* be shared. I identified some possible use cases of this:

1. numpy/scipy: share the underlying memory of an ndarray. Effectively, threads can then operate on the same array without GIL interference.

2. SQLite in-memory database: multiple threads can operate on it in parallel. If you have an ORM it might feel very similar to just sharing Python objects across threads.

3. Tree of XML elements (like xml.etree): assuming the tree data structure itself is in C, the tree could be shared across interpreters. This would be an example of a "deep" data structure which can still be efficiently shared.

So I feel this could still be very useful even if pure-Python objects need to be copied. (A threads-based analogue of use case 1 is sketched below.)

Thanks,
Stephan
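A rough threads-based analogue of use case 1 (illustrative only - real cross-interpreter sharing would need the machinery discussed earlier in the thread): several workers mutating disjoint slices of one shared buffer, with no copying anywhere:

    import threading

    buf = bytearray(4 * 1024)   # one shared underlying block of memory
    view = memoryview(buf)

    def fill(chunk, value):
        # Each worker writes only to its own slice of the shared buffer.
        chunk[:] = bytes([value]) * len(chunk)

    threads = [
        threading.Thread(target=fill, args=(view[i * 1024:(i + 1) * 1024], i))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert buf[0] == 0 and buf[1024] == 1 and buf[3072] == 3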
participants (10)
- Brett Cannon
- Eric Snow
- Guido van Rossum
- Jim Baker
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- Ronald Oussoren
- Stephan Houben