[Python-Dev] GIL removal question
Sturla Molden
sturla at molden.no
Fri Aug 12 20:36:37 CEST 2011
On 12.08.2011 18:57, Rene Nejsum wrote:
> My two Danish kroner on GIL issues...
>
> I think I understand the background and need for GIL. Without it
> Python programs would have been cluttered with lock/synchronized
> statements and C-extensions would be harder to write. Thanks to Sturla
> Molden for his explanation earlier in this thread.
It doesn't seem I managed to explain it :(
Yes, C extensions would be cluttered with synchronization statements,
and that is annoying. But that was not my point at all!
Even with fine-grained locking in place, a system using reference
counting will not scale on a multi-processor computer. Cache lines
containing reference counts have to be re-synchronized between the
processors every time a count changes, causing a traffic jam on the
memory bus.
The technical term in the parallel computing literature is "false sharing".
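To make the point about reference counts concrete, here is a minimal
sketch, assuming CPython. The helper name refcount_delta_from_aliasing is
made up for this illustration. It shows that merely holding more
references to an object changes the counter stored with that object, so
code that only "reads" a shared object still writes to its memory.

```python
import sys


def refcount_delta_from_aliasing(n_aliases):
    """Return how much an object's refcount grows when n_aliases
    extra references to it are stored in a list.

    Hypothetical helper for illustration only. The absolute value
    reported by sys.getrefcount() is an implementation detail, so
    only the difference between two readings is compared.
    """
    obj = object()                    # fresh object, not shared with anything
    before = sys.getrefcount(obj)
    aliases = [obj] * n_aliases       # each list slot holds one more reference
    after = sys.getrefcount(obj)
    del aliases                       # dropping the list writes the count again
    return after - before
```

No field of the object was modified here, yet its counter was written
both when the list was built and when it was dropped. With several
threads on several processors doing the same to shared objects, those
writes are what keep the cache lines moving back and forth.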
> However, the GIL is also from a time when single-threaded programs
> running on single-core CPUs were the common case.
>
> On a new MacBook Pro I have 8 cores and would expect my multithreaded
> Python program to run significantly faster than on a one-core CPU.
>
> Instead the program slows down to a much worse performance than on a
> one-core CPU.
A multi-threaded program can be slower on a multi-processor computer as
well, if it suffers from extensive "false sharing" (which Python
programs nearly always will).
That is, instead of doing useful work, the processors are stepping on
each other's toes. So they spend the bulk of the time synchronizing cache
lines with RAM instead of computing.
On a computer with a single processor, there cannot be any false
sharing. So even without a GIL, a multi-threaded program can often run
faster on a single-processor computer. That might seem counter-intuitive
at first. I have seen this "inverse scaling" blamed on the GIL many
times, but it's dead wrong.
Multi-threading is hard to get right, because the programmer must ensure
that processors don't access the same cache lines. This is one of the
reasons why numerical programs based on MPI (multiple processes and IPC)
are likely to perform better than numerical programs based on OpenMP
(multiple threads and shared memory).
As for Python, it means that it is easier to make a program based on
multiprocessing scale well on a multi-processor computer than a program
based on threading and releasing the GIL. And that has nothing to do
with the GIL! Still, I'd estimate 99% of Python programmers would blame
it on the GIL. It has to do with what shared memory does if cache lines
are shared. Intuition about what affects the performance of a
multi-threaded program is very often wrong. If one needs parallel
computing, multiple processes are much more likely to scale correctly.
Threads are better reserved for things like non-blocking I/O.
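As a rough sketch of the multiprocessing approach (not a benchmark),
CPU-bound work can be split into independent chunks and handed to
worker processes. Each worker has its own interpreter and its own
objects, so no reference counts are shared between processors. The
prime-counting workload and the function names below are assumptions
chosen for illustration.

```python
from multiprocessing import Pool


def count_primes_in_range(bounds):
    """Count primes in the half-open range [start, stop) by trial division."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        d = 2
        is_prime = True
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous (start, stop) chunks.

    Assumes parts >= 1.
    """
    step = max(1, -(-limit // parts))   # ceiling division, never zero
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=4):
    """Count primes below `limit` using `workers` separate processes.

    Only the small (start, stop) tuples and the integer results cross
    process boundaries; nothing else is shared.
    """
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_primes_in_range, split_range(limit, workers)))
```

When run as a script, the call to parallel_count_primes() should sit
under an `if __name__ == "__main__":` guard, since some platforms start
worker processes by re-importing the main module.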
The problem with the GIL is merely what people think it does -- not what
it actually does. It is so easy to blame a performance issue on the GIL,
when it is actually the use of threads and shared memory per se that is
the problem.
Sturla