On 10 May 2021, at 20:23, Barry Scott firstname.lastname@example.org wrote:
On 10 May 2021, at 15:30, Sophist email@example.com wrote:
I don't know how many people will remember some work that David Beazley did about a decade ago on how the GIL impacts multithreading performance - essentially he instrumented the Python interpreter to log how multiple threads competed for the GIL and gave several presentations over the space of 2-3 years. A couple of years ago I reached out to him with an idea on how to significantly improve the way that Python handles multi-threading hand-off of the GIL, but unfortunately he was not interested in pursuing this further. I am raising it here in the hope that someone else would be interested in implementing this.
In essence my idea is to stop Python handing off the GIL through a competition between threads that are ready to run, and instead for Python to implement a scheduler for the GIL which decides which thread should get the GIL next and directly hands it over.
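To make the difference concrete, here is a minimal sketch of direct hand-off written with ordinary Python threading primitives. It is a toy model, not the interpreter's GIL code: the class name `HandoffLock`, the `demo` helper and the FIFO policy are all assumptions made for illustration, and a real scheduler could pick the next thread by any policy it liked.

```python
import threading
import time
from collections import deque


class HandoffLock:
    """Toy lock (hypothetical): release() hands ownership straight to the
    longest-waiting thread instead of letting all waiters race for it."""

    def __init__(self):
        self._mutex = threading.Lock()   # protects the fields below
        self._waiters = deque()          # FIFO queue of per-thread Events
        self._held = False

    def acquire(self):
        with self._mutex:
            if not self._held:
                self._held = True
                return
            ticket = threading.Event()
            self._waiters.append(ticket)
        # Block until a releasing thread hands the lock to us.
        ticket.wait()

    def release(self):
        with self._mutex:
            if self._waiters:
                # Direct hand-off: _held stays True, so no other thread
                # can sneak in before the chosen waiter runs.
                self._waiters.popleft().set()
            else:
                self._held = False


def demo(n_threads=4):
    """Queue workers in a known order and record the order they run in."""
    lock = HandoffLock()
    order = []

    def worker(i):
        lock.acquire()
        order.append(i)
        lock.release()

    lock.acquire()
    threads = []
    for i in range(n_threads):
        t = threading.Thread(target=worker, args=(i,))
        t.start()
        # Wait until worker i is queued so the arrival order is known.
        while len(lock._waiters) < i + 1:
            time.sleep(0.001)
        threads.append(t)
    lock.release()
    for t in threads:
        t.join()
    return order
```

The point of the sketch is the `release()` branch: the releasing thread chooses its successor, so which thread runs next is a scheduling decision rather than the outcome of a race.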
Here are the links to David Beazley's presentations:
2009: Inside the Python GIL - https://www.youtube.com/watch?v=ph374fJqFPE
2010: Understanding the Python GIL - https://speakerdeck.com/dabeaz/understanding-the-python-gil https://www.youtube.com/watch?v=Obt-vMVdM8s
2011: Embracing the Global Interpreter Lock - https://speakerdeck.com/dabeaz/embracing-the-global-interpreter-lock https://www.youtube.com/watch?v=fwzPF2JLoeU
2011: In Search of the Perfect Global Interpreter Lock - https://speakerdeck.com/dabeaz/in-search-of-the-perfect-global-interpreter-l... https://www.youtube.com/watch?v=5jbG7UKT1l4
Given this is very old information I think the first thing needed is to reproduce David's experiments and see if the 3.10 implementation has the same issues.
Have you done this already?
If you turn these slides into benchmark code that would make it easier to experiment with.
Benchmarks will need running on macOS, Windows, Linux at least.
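As a starting point, a benchmark could look something like the sketch below. It is loosely modelled on the CPU-bound countdown comparison from the presentations (the same work run sequentially and then in two threads). The function names, the default loop count and the reporting format are my own assumptions, and it only measures wall-clock time, not the GIL hand-off events that the instrumented interpreter logged.

```python
import platform
import sys
import threading
import time


def count(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def timed(fn):
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n):
    """Run the countdown twice, one after the other, in this thread."""
    count(n)
    count(n)


def threaded(n):
    """Run the same two countdowns in two competing threads."""
    threads = [threading.Thread(target=count, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run(n=2_000_000):
    """Time both variants and tag the result with platform details."""
    return {
        "system": platform.system(),
        "python": sys.version.split()[0],
        "sequential": timed(lambda: sequential(n)),
        "threaded": timed(lambda: threaded(n)),
    }


if __name__ == "__main__":
    for key, value in run().items():
        print(f"{key}: {value}")
```

Running the same script on each OS and Python version gives numbers that can be compared directly. A fuller benchmark would also want an I/O-bound thread competing with a CPU-bound one.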
It looks like the GIL code has not changed in a long time.
But in 3.7 FORCE_SWITCHING is always defined, which changes the GIL behaviour.
This comment in Python/ceval_gil.h explains what that does:
- When a thread releases the GIL and gil_drop_request is set, that thread ensures that another GIL-awaiting thread gets scheduled. It does so by waiting on a condition variable (switch_cond) until the value of last_holder is changed to something else than its own thread state pointer, indicating that another thread was able to take the GIL.
This is meant to prohibit the latency-adverse behaviour on multi-core machines where one thread would speculatively release the GIL, but still run and end up being the first to re-acquire it, making the "timeslices" much longer than expected. (Note: this mechanism is enabled with FORCE_SWITCHING above)
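The mechanism in that comment can be approximated in pure Python as below. This is only a sketch of the idea, not a translation of ceval_gil.h: `ForcedSwitchLock` and `demo` are hypothetical names, and a simple count of waiting threads stands in for gil_drop_request. The attributes `_switch_cond` and `_last_holder` are named after the variables the comment mentions.

```python
import threading


class ForcedSwitchLock:
    """Toy lock (hypothetical) whose release() waits until another
    waiting thread has actually taken the lock."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._cond = threading.Condition(self._mutex)         # lock is free
        self._switch_cond = threading.Condition(self._mutex)  # holder changed
        self._locked = False
        self._last_holder = None
        self._waiting = 0   # assumption: stands in for gil_drop_request

    def acquire(self):
        me = threading.get_ident()
        with self._mutex:
            self._waiting += 1
            while self._locked:
                self._cond.wait()
            self._waiting -= 1
            self._locked = True
            if self._last_holder != me:
                self._last_holder = me
                # Tell a releasing thread that a switch really happened.
                self._switch_cond.notify_all()

    def release(self):
        me = threading.get_ident()
        with self._mutex:
            self._locked = False
            self._cond.notify()
            # Forced switch: if someone is waiting, do not return (and so
            # do not get a chance to re-acquire) until another thread
            # has become the holder.
            while self._waiting > 0 and self._last_holder == me:
                self._switch_cond.wait()


def demo(iterations=200):
    """Two threads contend for the lock; return the order of ownership."""
    lock = ForcedSwitchLock()
    log = []

    def worker(name):
        for _ in range(iterations):
            lock.acquire()
            log.append(name)
            lock.release()

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Without the loop at the end of `release()`, the releasing thread could go straight back into `acquire()` and win again before the waiter has woken up, which is the long "timeslice" behaviour the comment describes.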