[Python-Dev] GIL removal question

Fri Aug 12 20:36:37 CEST 2011

On 12.08.2011 18:57, Rene Nejsum wrote:
> My two danish kroner on GIL issues….
>
> I think I understand the background and need for GIL. Without it 
> Python programs would have been cluttered with lock/synchronized 
> statements and C-extensions would be harder to write. Thanks to Sturla 
> Molden for his explanation earlier in this thread.

It doesn't seem I managed to explain it :(

Yes, C extensions would be cluttered with synchronization statements, 
and that is annoying. But that was not my point at all!

Even with fine-grained locking in place, a system using reference
counting will not scale on a multi-processor computer. Every change to
a reference count is a write, so the cache lines containing reference
counts keep becoming incoherent between the processors, causing a
traffic jam on the memory bus.

The technical term in parallel computing literature is "false sharing".
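
To see why reference counts are such a hot spot: on CPython, merely
taking another reference to an object writes to that object's memory,
even if the program only ever "reads" it. A minimal sketch, assuming
CPython (refcount_delta is a hypothetical helper made up for this
illustration, not a library function):

```python
import sys

def refcount_delta(obj, n_aliases):
    """Return how much obj's reference count grows when n_aliases
    extra references to it are created (CPython behaviour assumed)."""
    before = sys.getrefcount(obj)
    aliases = [obj] * n_aliases   # no copy: just n_aliases new references
    after = sys.getrefcount(obj)  # 'aliases' is still alive here
    return after - before

if __name__ == "__main__":
    # Use a fresh object: None, small ints etc. may be "immortal" on
    # newer CPython versions and would not show a changing count.
    print(refcount_delta(object(), 1000))
```

Each of those increments is a store to the object's header, so threads
on different processors that touch the same object keep writing to the
same cache line.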

> However, the GIL is also from a time when single-threaded programs
> running on single-core CPUs were the common case.
>
> On a new MacBook Pro I have 8 cores and would expect my multithreaded
> Python program to run significantly faster than on a one-core CPU.
>
> Instead the program slows down to a much worse performance than on a 
> one-core CPU.

A multi-threaded program can be slower on a multi-processor computer as
well, if it suffers from extensive "false sharing" (which Python
programs nearly always will).

That is, instead of doing useful work, the processors are stepping on
each other's toes. So they spend the bulk of the time synchronizing
cache lines with RAM instead of computing.

On a computer with a single processor, there cannot be any false
sharing. So even without a GIL, a multi-threaded program can often run
faster on a single-processor computer. That might seem counter-intuitive
at first. I have seen this "inverse scaling" blamed on the GIL many
times, but that is dead wrong.

Multi-threading is hard to get right, because the programmer must ensure 
that processors don't access the same cache lines. This is one of the 
reasons why numerical programs based on MPI (multiple processes and IPC) 
are likely to perform better than numerical programs based on OpenMP 
(multiple threads and shared memory).
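
What "not accessing the same cache lines" means for data layout can be
sketched with ctypes. This assumes a 64-byte cache line and a structure
that starts on a cache-line boundary; both are assumptions for the
illustration, and Packed, Padded and line_of are hypothetical names:

```python
import ctypes

CACHE_LINE = 64  # bytes; an assumption, typical for x86 but not universal

class Packed(ctypes.Structure):
    # Two per-thread counters placed next to each other.
    _fields_ = [("a", ctypes.c_int64),
                ("b", ctypes.c_int64)]

class Padded(ctypes.Structure):
    # Same counters, with padding so 'b' starts one cache line later.
    _fields_ = [("a", ctypes.c_int64),
                ("_pad", ctypes.c_char * (CACHE_LINE - 8)),
                ("b", ctypes.c_int64)]

def line_of(field):
    """Cache line index of a field, relative to the start of the struct."""
    return field.offset // CACHE_LINE

if __name__ == "__main__":
    print(line_of(Packed.a), line_of(Packed.b))  # both in line 0
    print(line_of(Padded.a), line_of(Padded.b))  # line 0 and line 1
```

In the packed layout, a thread updating a and another updating b fight
over one cache line although they never touch the same variable. With
separate processes this bookkeeping mostly disappears, because each
process works in its own memory.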

As for Python, it means that it is easier to make a program based on
multiprocessing scale well on a multi-processor computer than a program
based on threading and releasing the GIL. And that has nothing to do
with the GIL! Still, I'd estimate 99% of Python programmers would blame
it on the GIL. It has to do with what shared memory does if cache lines
are shared. Intuition about what affects the performance of a
multi-threaded program is very often wrong. If one needs parallel
computing, multiple processes are much more likely to scale correctly.
Threads are better reserved for things like non-blocking I/O.
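
A minimal sketch of the multiprocessing approach, for a CPU-bound job
split into independent chunks. The prime-counting workload and the
names count_primes and chunk_ranges are made up for the example; the
only library API used is multiprocessing.Pool:

```python
import multiprocessing as mp

def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division (CPU-bound, no shared state)."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def chunk_ranges(limit, parts):
    """Split [0, limit) into at most 'parts' contiguous (lo, hi) ranges."""
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]

def main():
    ranges = chunk_ranges(50_000, 4)
    # Each worker is a separate process with its own memory, so the
    # workers share no Python objects and no reference counts.
    with mp.Pool(processes=4) as pool:
        total = sum(pool.starmap(count_primes, ranges))
    print(total)

if __name__ == "__main__":
    main()
```

The only communication is the small tuples sent to the workers and the
integer counts sent back, which is the MPI style of working described
above.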

The problem with the GIL is merely what people think it does -- not what 
it actually does. It is so easy to blame a performance issue on the GIL, 
when it is actually the use of threads and shared memory per se that is 
the problem.

Sturla