On Fri, Oct 8, 2021 at 1:51 PM Sam Gross <colesbury@gmail.com> wrote:
Hi,
I've been working on changes to CPython to allow it to run without the global interpreter lock. I'd like to share a working proof-of-concept that can run without the GIL. The proof-of-concept involves substantial changes to CPython internals, but relatively few changes to the C-API. It is compatible with many C extensions: extensions must be rebuilt, but usually require small or no modifications to source code. I've built compatible versions of packages from the scientific Python ecosystem, and they are installable through the bundled "pip".
Source code: https://github.com/colesbury/nogil
Design overview: https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd...
My goal with the proof-of-concept is to demonstrate that removing the GIL is feasible and worthwhile, and that the technical ideas of the project could serve as a basis of such an effort.
Thanks for doing this, and thanks for the detailed writeup! I'd like to offer a perspective from observing the ongoing project of a brother of mine; he does not have the concurrency experience that I have, and it's been instructive to see what he has trouble with. For reference, the project involves GTK (which only works on the main thread), multiple threads for I/O (e.g. a socket read/parse/process thread), and one thread managed by asyncio using async/await functions.

At no point has he ever had a problem with performance, because the project is heavily I/O based, spending most of its time waiting for events. So this is not going to touch on the question of single-threaded vs multi-threaded performance.

To him, an async function and a thread function are exactly equivalent. He doesn't think in terms of yield points or anything; they are simply two ways of doing parallelism and are, to his code, interchangeable.

Mutable shared state is something to get your head around with *any* sort of parallelism, and nothing will change that. Whether it's asyncio, GUI callbacks, or actual threads, the considerations have been exactly the same. Threads neither gain nor lose compared to other options.

Not being a low-level programmer, he has, I believe, an inherent assumption that any operation on a built-in type will be atomic. He's never stated this, but I suspect he relies on it. It's an assumption that Python is never going to violate.

Concurrency is *hard*. There's no getting around it, there's no sugar-coating it. There are concepts that simply have to be learned, and the failures can be extremely hard to track down. Instantiating an object on the wrong thread can crash GTK, but maybe not immediately. Failing to sleep in one thread results in other threads stalling.
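
A minimal sketch of the atomicity point, not taken from the project itself: the names `run`, `worker` and `state` are made up for illustration. It assumes CPython, where a single `list.append` from several threads is safe on its own, while a read-modify-write such as `+= 1` is a compound operation and gets an explicit lock.

```python
import threading


def run(n_threads=8, n_iters=10_000):
    """Hypothetical demo: several threads mutate shared built-in objects."""
    items = []                   # shared list, mutated by every thread
    state = {"counter": 0}       # shared counter, updated by every thread
    lock = threading.Lock()

    def worker():
        for i in range(n_iters):
            # A single append on a built-in list: this is the kind of
            # operation the "built-ins are atomic" assumption covers.
            items.append(i)
            # "+= 1" is read, add, store; guard the compound update
            # explicitly rather than relying on interpreter internals.
            with lock:
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(items), state["counter"]


if __name__ == "__main__":
    print(run())
```

The same split applies to asyncio and GUI callbacks: individual operations on built-ins can be trusted, but anything spanning more than one step needs its own coordination.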
I don't think any of this is changed by different modes (with the exception of process-based parallelism, which fixes a lot of concurrency problems at the cost of explicit IPC), and the more work programmers want their code to do, the more likely that they'll run into this.

GLib.idle_add is really just a magic incantation to make the GUI work. :)

Spawning a thread for asyncio isn't too hard as long as you don't have to support older Python versions... sadly, not every device gets updated at the same time. But in a few years, we will be able to ignore Python versions pre-3.7.

Most likely, none of his code would be affected by the removal of the GIL, since (as I understand it) the guarantees as seen in Python code won't change.

Will there be impact on lower-memory systems? As small devices go, the Raspberry Pi is one of the largest, but it's still a lot smaller than a full PC, and adding overhead to every object would be costly (I'm not sure what the cost of local reference counting is, but it can't be zero). Threading is perfectly acceptable for a project like this, so I'm hoping that GIL removal won't unnecessarily penalize this kind of thread usage.

Speaking of local refcounting, how does that affect objects that get passed as thread arguments? Initially, an object is owned by the creating thread, which can relinquish ownership if its local refcount drops to zero; does another thread then take it over?

I'm excited by anything that helps parallelism in Python, and very curious to see where this effort will go. If you need a hand with testing, I'd be happy to help out.

ChrisA