Hi Mark,
On Thu, May 7, 2020 at 15:53, Mark Shannon <mark@hotpy.org> wrote:
I say no. Why the sudden urgency?
My urgency was to be able to quickly see whether a per-interpreter GIL would be doable for Python 3.9 or not. The answer is no: there is too much work to be done to write a "correct" implementation. I'm talking about fixing subinterpreter issues in the proper way, not the idea of a special build.
I think that you are making too many assumptions about how inter-interpreter communication is going to work, and about sharing of objects.
I'm not sure what you mean here. My changes rely on the assumption that objects are not shared between two interpreters.
I didn't work at all on the inter-interpreter communication.
On 06/05/2020 6:49 pm, Victor Stinner wrote:
It's a practical solution to be able to experiment quickly per-interpreter GIL without having to fix all issues at once. For example, it disables Unicode interned strings which is unsafe with multiple interpreters running in parallel.
If the interning is done on a per-interpreter basis, then it is safe.
Sure. My idea was to quickly list the areas of the code that should be reworked to be "compatible" with subinterpreters. As I wrote, the long term plan is to fix these issues. For example, I made small integer singletons per interpreter. The idea would be the same for interned strings. It's not hard to do, but I didn't want to make this change right now, since we are close to the 3.9 release.
I think that I will remove my changes from the future 3.9 branch and only keep them in the master branch, to make it clear that 3.9 is now out of scope.
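A minimal sketch of the "small integer singletons per interpreter" idea mentioned above. This is not CPython's actual code: `Interp`, `IntObject`, `interp_new()` and `get_small_int()` are hypothetical names used only for illustration, and the cached range is an assumption chosen for the example. The point it shows is that each interpreter owns its own cache, so no cached object is shared between two interpreters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins: these are NOT CPython's real types. */
#define SMALL_INT_MIN (-5)   /* assumed range, for illustration only */
#define SMALL_INT_MAX 256
#define N_SMALL_INTS (SMALL_INT_MAX - SMALL_INT_MIN + 1)

typedef struct {
    long value;
} IntObject;

typedef struct {
    /* Each interpreter owns a private cache, so no object is shared. */
    IntObject small_ints[N_SMALL_INTS];
} Interp;

/* Allocate an interpreter and pre-populate its private cache.
   Returns NULL if the allocation fails. */
static Interp *interp_new(void)
{
    Interp *interp = malloc(sizeof(*interp));
    if (interp == NULL) {
        return NULL;
    }
    for (int i = 0; i < N_SMALL_INTS; i++) {
        interp->small_ints[i].value = SMALL_INT_MIN + i;
    }
    return interp;
}

/* Return the cached object for a small value, looked up in `interp` only. */
static IntObject *get_small_int(Interp *interp, long value)
{
    assert(SMALL_INT_MIN <= value && value <= SMALL_INT_MAX);
    return &interp->small_ints[value - SMALL_INT_MIN];
}
```

Within one interpreter, `get_small_int(interp, 7)` always returns the same address, while two interpreters return distinct objects for the same value. The same layout would presumably carry over to an interned-string table, with a per-interpreter dictionary instead of an array.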
I added this #ifdef to encourage other core developers to work on this project, and to let early adopters test this experimental feature and give us their feedback.
You say this is to let other core developers work on this, but has anyone actually asked for these changes?
I do want these changes, and so does Eric Snow. Another core dev also asked me how to contribute to this project.
Most of the past changes cleaned up Python internals, for example by properly releasing resources at exit. This fixes old issues with Py_Initialize()/Py_Finalize() being called multiple times when Python is embedded. It's not only about subinterpreters. See for example: https://bugs.python.org/issue1635741
Currently, the special build changes:
- Per-interpreter GIL
- Store the current Python thread state in a TLS
- Disable dict, frame, tuple and list free lists
- Disable type method cache
- Disable pymalloc: force usage of libc malloc
- Disable the GC in subinterpreters
- _xxsubinterpreters.run_string() releases the GIL
These changes are going to have such a large impact on robustness and performance as to make any comparisons meaningless.
Oh sure, my benchmark on per-interpreter GIL was run on the same Python binary where all these caches were disabled.
I'm aware that "regular" Python has all these caches. But I didn't care about the absolute timing. I only wanted to check whether subinterpreters actually "scale" with the number of CPUs. My PoC benchmark says that yes, they do. A CPU-bound workload is faster in subinterpreters than in threads. It also shows that subinterpreters have basically the same speed as multiprocessing, which is an interesting data point.
As I wrote in my email, they are only temporary changes using #ifdef, but I plan to fix all these issues (make the code compatible with subinterpreters).
Most changes are easy to write, but some others are non-trivial. For example, I modified _PyThreadState_GET() and _PyThreadState_Swap() to use Thread Local Storage (TLS) to get and set the current Python thread state.
Does that mean you are going to remove all the
PyThreadState *tstate
parameters that have been added lately?
I don't plan to make tstate implicit again soon. I like to see where Python states are coming from in functions. But it's an open question.
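To make the two styles discussed above concrete, here is a toy sketch, assuming a C11 compiler that supports `_Thread_local`. It is not the real implementation: `ThreadState`, `tstate_get()`, `tstate_swap()` and `tstate_id()` are hypothetical stand-ins for PyThreadState, _PyThreadState_GET() and _PyThreadState_Swap(). The first two functions show the implicit TLS lookup; the last one shows the explicit style where `tstate` is passed as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the real structure. */
typedef struct {
    int id;
} ThreadState;

/* One slot per OS thread: with a per-interpreter GIL there is no longer
   a single process-wide "current thread state". */
static _Thread_local ThreadState *current_tstate = NULL;

/* Toy analogue of _PyThreadState_GET(): read the calling thread's slot. */
static ThreadState *tstate_get(void)
{
    return current_tstate;
}

/* Toy analogue of _PyThreadState_Swap(): install the new state for the
   calling thread and return the previous one. */
static ThreadState *tstate_swap(ThreadState *new_tstate)
{
    ThreadState *old = current_tstate;
    current_tstate = new_tstate;
    return old;
}

/* Explicit style: the caller passes tstate, so its origin is visible
   at the call site and no TLS lookup is needed. */
static int tstate_id(ThreadState *tstate)
{
    assert(tstate != NULL);
    return tstate->id;
}
```

A caller would typically fetch the state once with `tstate_get()` at an API boundary and then pass it down explicitly, which keeps the TLS access out of inner functions.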
Rather than sprinkling #ifdefs everywhere, could you continue consolidating all "global" objects into a single data structure?
Most of my work in 3.9 was to move things from _PyRuntimeState to PyInterpreterState.
Moving global variables into these structures is the simple solution, but I am trying to find a way to avoid declaring all structures in PyInterpreterState.
#include "pycore_interp.h" pulls in condition variables, which in turn include <windows.h> on Windows. It also pulls in tons of other things, since PyInterpreterState has become quite large.
Maybe there is a way to have "per-interpreter" variables, something similar to thread local storage (TLS). But I'm not sure what the API would look like.
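Purely as an illustration of what "per-interpreter variables similar to TLS" could mean, here is one possible shape, modelled on key-based TLS APIs. Everything in it is an assumption: `Interp`, `interp_key_create()`, `interp_key_set()` and `interp_key_get()` are hypothetical names, not an existing or proposed CPython API. The interpreter structure only holds an opaque slot table, so code that needs per-interpreter state would not have to declare its structures inside it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INTERP_KEYS 16   /* arbitrary limit for the sketch */

/* Hypothetical interpreter structure: only an opaque slot table, so
   users of the API do not declare their own state inside it. */
typedef struct {
    void *slots[MAX_INTERP_KEYS];
} Interp;

/* Number of keys handed out so far. Not thread-safe in this sketch:
   keys are assumed to be created once, before interpreters run. */
static int next_key = 0;

/* Reserve a new key, valid in every interpreter.
   Returns -1 when the table is full. */
static int interp_key_create(void)
{
    if (next_key >= MAX_INTERP_KEYS) {
        return -1;
    }
    return next_key++;
}

/* Store a value for `key` in this interpreter only. */
static void interp_key_set(Interp *interp, int key, void *value)
{
    assert(0 <= key && key < next_key);
    interp->slots[key] = value;
}

/* Fetch this interpreter's value for `key` (NULL if never set,
   provided the Interp was zero-initialized). */
static void *interp_key_get(Interp *interp, int key)
{
    assert(0 <= key && key < next_key);
    return interp->slots[key];
}
```

The same key yields a different value in each interpreter, much as a TLS key yields a different value in each thread. The trade-off is an extra indirection and the loss of static typing compared with declaring fields directly in the interpreter structure.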
Victor
Night gathers, and now my watch begins. It shall not end until my death.