Experimental isolated subinterpreters
Hi,
tl;dr: I'm asking for your permission to merge the following PR :-)
https://github.com/python/cpython/pull/19958
In bpo-40514, I added a new --with-experimental-isolated-subinterpreters configuration option. I chose a very long option name and deliberately left it undocumented, to prevent users from enabling it "by mistake" without understanding its purpose. The option is related to "per-interpreter GIL":
https://bugs.python.org/issue40512
It's a practical way to experiment quickly with a per-interpreter GIL without having to fix all issues at once. For example, it disables Unicode interned strings, which are unsafe with multiple interpreters running in parallel.
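As a small illustration of why the intern table counts as shared state, here is a sketch using the public sys.intern() function from Python code. The helper name build() and the sample string are made up for this example; it only shows that interning maps equal strings to one shared object, not how the experimental build disables interning.

```python
import sys


def build(parts):
    # Join at run time so the compiler cannot fold the result into a constant.
    return "".join(parts)


a = build(["per_", "interpreter_", "gil"])
b = build(["per_", "interpreter_", "gil"])

# Equal contents, but built by two separate calls.
assert a == b

# sys.intern() returns the canonical copy kept in the intern table.
ia = sys.intern(a)
ib = sys.intern(b)

# Both lookups hand back the very same object.
print(ia is ib)
```

If that table is a single process-wide structure, interpreters that no longer share a GIL would be reading and updating it at the same time, which is the hazard described above.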
I added this #ifdef to encourage other core developers to work on this project, and to let early adopters test this experimental feature and give us their feedback.
I hope that we will soon discover all the places that need to be fixed, which will help to better estimate how much work is needed to finish the implementation of per-interpreter GIL.
Currently, the special build changes:
- Per-interpreter GIL
- Store the current Python thread state in a TLS
- Disable dict, frame, tuple and list free lists
- Disable type method cache
- Disable pymalloc: force usage of libc malloc
- Disable the GC in subinterpreters
- _xxsubinterpreters.run_string() releases the GIL
I consider that a reasonably small number of changes to get a cool feature (per-interpreter GIL), compared to the size of the CPython code base (637K lines).
--
All these "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" are temporary workarounds until a proper fix is designed. For example, some caches should be made "per-interpreter".
Most changes are easy to write, but some are non-trivial. For example, I modified _PyThreadState_GET() and _PyThreadState_Swap() to use Thread Local Storage (TLS) to get and set the current Python thread state.
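The idea of keeping the current thread state in TLS can be sketched at the Python level with threading.local. This is only an analogy, not the C code from the special build: get_state() and swap_state() are hypothetical stand-ins for _PyThreadState_GET() and _PyThreadState_Swap(), and the "states" are plain strings.

```python
import threading

# Hypothetical stand-in for the C-level TLS slot: one value per thread.
_tls = threading.local()


def get_state():
    # Read the calling thread's own slot; None if nothing was installed yet.
    return getattr(_tls, "state", None)


def swap_state(new_state):
    # Install new_state for the calling thread and return the previous one.
    old_state = get_state()
    _tls.state = new_state
    return old_state


def worker(name, seen):
    swap_state(name)
    # Each thread reads back only what it stored itself.
    seen[name] = get_state()


seen = {}
threads = [
    threading.Thread(target=worker, args=(f"tstate-{i}", seen))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
print(get_state())  # the main thread never installed a state
```

The point of the design is that no thread needs a lock or a shared global to find its own state, so the lookup does not depend on which interpreter currently holds which GIL.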
Currently, the "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" blocks are not "visible" to users: they are only used in .c files and a few internal header files. But the following PR modifies the public Include/object.h header file to make PyObject.ob_refcnt atomic:
https://github.com/python/cpython/pull/19958
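Why a reference count touched by several threads has to be updated atomically can be modelled in Python. The sketch below is an analogy under stated assumptions: ToyObject, incref(), decref() and churn() are invented names, and a threading.Lock stands in for whatever atomic read-modify-write primitive the PR relies on in C.

```python
import threading


class ToyObject:
    """Hypothetical stand-in for an object with a shared reference count."""

    def __init__(self):
        self.refcnt = 1
        # The lock models atomicity: each update is one indivisible step.
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.refcnt += 1

    def decref(self):
        with self._lock:
            self.refcnt -= 1
            return self.refcnt


def churn(obj, rounds):
    # Each worker takes and drops a reference many times.
    for _ in range(rounds):
        obj.incref()
        obj.decref()


shared = ToyObject()
workers = [
    threading.Thread(target=churn, args=(shared, 10_000))
    for _ in range(4)
]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Every incref was paired with a decref, so only the initial reference remains.
print(shared.refcnt)
```

Without that guarantee, two threads updating the same count at the same time could lose an update, and the object could be freed too early or never freed.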
I would like to make sure that you are OK with putting a few more temporary "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" in CPython to speed up the development of subinterpreters running in parallel (per-interpreter GIL), or to hear whether you consider that it has gone too far and a Git fork would be better.
Victor
Night gathers, and now my watch begins. It shall not end until my death.
Can we wait until after 3.10 development opens up? And could it be a -X flag?
Yes, it can wait until the 3.9 branch is created and master becomes the future 3.10.
Victor
On Wed, May 6, 2020 at 21:40, Brett Cannon <brett@python.org> wrote:
Can we wait until after 3.10 development opens up? And could it be a -X flag?
python-committers mailing list -- python-committers@python.org To unsubscribe send an email to python-committers-leave@python.org https://mail.python.org/mailman3/lists/python-committers.python.org/ Message archived at https://mail.python.org/archives/list/python-committers@python.org/message/V... Code of Conduct: https://www.python.org/psf/codeofconduct/
I am moving home this week and do not have time to review.
But I want to point out the difference between --enable-X and --with-X. See https://autotools.io/autoconf/arguments.html
Since isolated subinterpreters are not an external dependency, "--enable" should be used here.
(I know we already abuse "--with-lto".)
Regards,
-- Inada Naoki <songofacandy@gmail.com>
Hi Victor,
I say no. Why the sudden urgency?
I think that you are making too many assumptions about how inter-interpreter communication is going to work, and about sharing of objects.
On 06/05/2020 6:49 pm, Victor Stinner wrote:
Hi,
tl;dr: I'm asking for your permission to merge the following PR :-)
https://github.com/python/cpython/pull/19958
In bpo-40514, I added a new --with-experimental-isolated-subinterpreters configuration option. I chose a very long option name and deliberately left it undocumented, to prevent users from enabling it "by mistake" without understanding its purpose. The option is related to "per-interpreter GIL":
https://bugs.python.org/issue40512
It's a practical way to experiment quickly with a per-interpreter GIL without having to fix all issues at once. For example, it disables Unicode interned strings, which are unsafe with multiple interpreters running in parallel.
If the interning is done on a per-interpreter basis, then it is safe.
I added this #ifdef to encourage other core developers to work on this project, and to let early adopters test this experimental feature and give us their feedback.
You say this is to let other core developers work on this, but has anyone actually asked for these changes?
I hope that we will soon discover all the places that need to be fixed, which will help to better estimate how much work is needed to finish the implementation of per-interpreter GIL.
Currently, the special build changes:
- Per-interpreter GIL
- Store the current Python thread state in a TLS
- Disable dict, frame, tuple and list free lists
- Disable type method cache
- Disable pymalloc: force usage of libc malloc
- Disable the GC in subinterpreters
- _xxsubinterpreters.run_string() releases the GIL
These changes are going to have such a large impact on robustness and performance as to make any comparisons meaningless.
I consider that a reasonably small number of changes to get a cool feature (per-interpreter GIL), compared to the size of the CPython code base (637K lines).
I'm not concerned about the size of the change. What concerns me is that it is spread all over the code and introduces assumptions that are not made explicit.
--
All these "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" are temporary workarounds until a proper fix is designed. For example, some caches should be made "per-interpreter".
EXPERIMENTAL_ISOLATED_SUBINTERPRETERS seems to have already appeared in many places. AFAICT, the changes adding it represent a change of over 250 lines without any review.
Most changes are easy to write, but some are non-trivial. For example, I modified _PyThreadState_GET() and _PyThreadState_Swap() to use Thread Local Storage (TLS) to get and set the current Python thread state.
Does that mean you are going to remove all the PyThreadState *tstate parameters that have been added lately?
Currently, the "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" blocks are not "visible" to users: they are only used in .c files and a few internal header files. But the following PR modifies the public Include/object.h header file to make PyObject.ob_refcnt atomic:
https://github.com/python/cpython/pull/19958
I would like to make sure that you are OK with putting a few more temporary "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" in CPython to speed up the development of subinterpreters running in parallel (per-interpreter GIL), or to hear whether you consider that it has gone too far and a Git fork would be better.
A fork would be much better.
Rather than sprinkling #ifdefs everywhere, could you continue consolidating all "global" objects into a single data structure? Once that has been done, experimentation becomes much easier and the changes more localized.
Cheers, Mark.
Victor
Hi Mark,
On Thu, May 7, 2020 at 15:53, Mark Shannon <mark@hotpy.org> wrote:
I say no. Why the sudden urgency?
My urgency was to be able to quickly see whether per-interpreter GIL would be doable for Python 3.9 or not. The answer is no: there is too much work to be done to write a "correct" implementation. I'm talking about fixing subinterpreter issues the proper way, not the idea of a special build.
I think that you are making too many assumptions about how inter-interpreter communication is going to work, and about sharing of objects.
I'm not sure what you mean here. My changes rely on the assumption that objects are not shared between two interpreters.
I didn't work at all on the inter-interpreter communication.
On 06/05/2020 6:49 pm, Victor Stinner wrote:
It's a practical way to experiment quickly with a per-interpreter GIL without having to fix all issues at once. For example, it disables Unicode interned strings, which are unsafe with multiple interpreters running in parallel.
If the interning is done on a per-interpreter basis, then it is safe.
Sure. My idea was to quickly list the areas of the code that should be reworked to be "compatible" with subinterpreters. As I wrote, the long-term plan is to fix these issues. For example, I made small integer singletons per-interpreter. The idea would be the same for interned strings. It's not hard to do, but I didn't want to make this change right now, since we are close to the 3.9 release.
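The "per interpreter" idea can be pictured with a toy model in Python. Everything here is hypothetical: ToyInterpreter, its attributes and its intern() method are invented for illustration and do not mirror PyInterpreterState or the actual C changes.

```python
class ToyInterpreter:
    """Hypothetical interpreter that owns its caches instead of sharing them."""

    def __init__(self, name):
        self.name = name
        # Private intern table: strings interned here stay in this interpreter.
        self.interned = {}
        # Private small-integer singletons, modelled as placeholder objects.
        self.small_ints = {value: object() for value in range(-5, 257)}

    def intern(self, text):
        # Return this interpreter's canonical copy, adding it on first use.
        return self.interned.setdefault(text, text)


main = ToyInterpreter("main")
sub = ToyInterpreter("sub")

main.intern("spam")

print("spam" in main.interned)
print("spam" in sub.interned)
print(main.small_ints[0] is sub.small_ints[0])
```

Because nothing in one interpreter's caches is reachable from the other, neither needs a lock shared with the other to use them.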
I think that I will remove my changes from the future 3.9 branch and only keep them in the master branch, to make it clear that 3.9 is now out of scope.
I added this #ifdef to encourage other core developers to work on this project, and to let early adopters test this experimental feature and give us their feedback.
You say this is to let other core developers work on this, but has anyone actually asked for these changes?
I do want these changes, and so does Eric Snow. Another core dev also asked me how to contribute to this project.
Most of the past changes cleaned up Python internals, for example by properly releasing resources at exit. This fixes old issues with Py_Initialize()/Py_Finalize() being called multiple times when Python is embedded. It's not only about subinterpreters. See for example: https://bugs.python.org/issue1635741
Currently, the special build changes:
- Per-interpreter GIL
- Store the current Python thread state in a TLS
- Disable dict, frame, tuple and list free lists
- Disable type method cache
- Disable pymalloc: force usage of libc malloc
- Disable the GC in subinterpreters
- _xxsubinterpreters.run_string() releases the GIL
These changes are going to have such a large impact on robustness and performance as to make any comparisons meaningless.
Oh sure, my benchmark on per-interpreter GIL was run on the same Python binary where all these caches were disabled.
I'm aware that "regular" Python has all these caches. But I didn't care about the absolute timing. I only wanted to check whether subinterpreters actually "scale" with the number of CPUs. My PoC benchmark says that yes, they do. A CPU-bound workload is faster in subinterpreters than with threads. It also shows that subinterpreters have basically the same speed as multiprocessing, which is an interesting data point.
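The scaling question can be framed with a small timing harness. This is not the PoC benchmark mentioned above, whose code is not shown in this thread; cpu_task(), run_sequential() and run_threads() are hypothetical names, and the sketch only covers the threads baseline on a regular build.

```python
import threading
import time


def cpu_task(n):
    # Pure-Python arithmetic loop: CPU-bound, no blocking I/O.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs, n):
    # Run the task `jobs` times in the calling thread.
    start = time.perf_counter()
    results = [cpu_task(n) for _ in range(jobs)]
    return time.perf_counter() - start, results


def run_threads(jobs, n):
    # Run the task once in each of `jobs` threads.
    results = [None] * jobs

    def worker(index):
        results[index] = cpu_task(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    jobs, n = 4, 200_000
    seq_time, _ = run_sequential(jobs, n)
    thr_time, _ = run_threads(jobs, n)
    print(f"sequential: {seq_time:.3f}s  threads: {thr_time:.3f}s")
```

With a single GIL the threaded time is expected to stay close to the sequential one. The interesting comparison is to run the same task through multiprocessing or subinterpreters and check whether the elapsed time drops as workers are added; only relative timings matter for that check.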
As I wrote in my email, they are only temporary changes using #ifdef, but I plan to fix all these issues (make the code compatible with subinterpreters).
Most changes are easy to write, but some are non-trivial. For example, I modified _PyThreadState_GET() and _PyThreadState_Swap() to use Thread Local Storage (TLS) to get and set the current Python thread state.
Does that mean you are going to remove all the PyThreadState *tstate parameters that have been added lately?
I don't plan to make tstate implicit again soon. I like to see where Python states are coming from in functions. But it's an open question.
Rather than sprinkling #ifdefs everywhere, could you continue consolidating all "global" objects into a single data structure?
Most of my work in 3.9 was to move things from _PyRuntimeState to PyInterpreterState.
Moving global variables into these structures is the simple solution, but I am trying to find a way to avoid declaring all structures in PyInterpreterState.
#include "pycore_interp.h" pulls in condition variables, which in turn includes <windows.h> on Windows. It also pulls in tons of other things, since PyInterpreterState has become quite large.
Maybe there is a way to have "per-interpreter" variables, something similar to thread local storage (TLS). But I'm not sure what the API would look like.
Victor
participants (4)
- Brett Cannon
- Inada Naoki
- Mark Shannon
- Victor Stinner