[Python-ideas] Remove GIL with CAS instructions?

Antoine Pitrou solipsis at pitrou.net
Wed Oct 21 00:55:04 CEST 2009


> The cooperative 
> multithreading intended by using checkintervals only works properly on 
> single-processor systems. Python threads only work properly on 
> single-processor computers.

I'm trying to work on that. I have an experimental rewrite of the GIL for POSIX
systems which improves latencies, while decreasing the overhead of the GIL
itself (because if checked every 100 opcodes, it can be dropped and re-taken
tens of thousands of times per second...). If people have interesting workloads
for py3k, the Mercurial branch can be found here:

(my measurements, by the way, are based on a small concurrency benchmark for
Python I recently wrote: 
http://svn.python.org/view/sandbox/trunk/ccbench/ )

> There are lock-free data structures of any conceivable sort. There are 
> multithreaded garbage collectors in Java and .NET, that don't need a 
> global lock. I've heard claims that removal of the GIL would require 
> Python to use a garbage collector instead of reference counts. That is 
> not true. Lock-free data structures and garbage collectors have all one 
> thing in common: they use a compare-and-exchange (CAS) instruction 
> present in most modern processors, e.g. CMPXCHG on x86. In fact, CAS is 
> used to implement OS mutexes (sleeplocks) and the less expensive 
> spinlocks. Without a CAS instruction for the platform, there would be no 
> GIL. All platforms that has a working GIL has a working CAS.

That does not mean that CAS (or atomic increment/decrement, for that matter) is
as fast as a regular (non-atomic) increment / decrement.
Besides, reference counting is not the only thing to worry about.
You have to:
- protect (somehow) the GC when collecting, so that things don't get modified
under its feet
- protect INCREFs and DECREFs
- protect all mutable types (dicts being probably the most performance-critical,
since they are used for namespaces and class/object attributes and methods)
- protect all static/global data in C modules which have them
- and probably other things I'm not thinking about

An interesting experiment would be to add all those protections without even
attempting to remove the GIL, and measure the overhead compared to a regular
Python. IMO, if the overhead is less than 10%, then it would probably be
acceptable to go ahead and try removing the GIL.

But someone needs to tackle this concretely for it to happen, because I guess
these ideas have already been suggested a couple of times...



More information about the Python-ideas mailing list