[Python-Dev] The endless GIL debate: why not remove thread support instead?

Fri Dec 12 02:13:13 CET 2008

Last month there was a discussion on Python-Dev regarding removal of
reference counting to remove the GIL. I hope you forgive me for continuing
the debate.

I think reference counting is a good feature. It prevents huge piles of
garbage from building up. It makes the interpreter run more smoothly. It
is not just important for games and multimedia applications, but also
servers under high load. Python does not pause to look for garbage the way
Java or .NET do. It only pauses to look for dead reference cycles. This can
be safely turned off temporarily; it can be turned off completely if you
do not create reference cycles. With Java and .NET, no garbage is ever
reclaimed except by the intermittent garbage collection. Python always
reclaims an object when the reference count drops to zero – whether the GC
is enabled or not. This makes Python programs well-behaved. For this
reason, I think removing reference counting is a genuinely bad idea. Even
if the GIL is evil, this remedy is even worse.
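
The behaviour described above can be checked with a minimal sketch. It
assumes CPython (other implementations may reclaim objects later), and the
names Node, reclaimed_without_gc and cycle_needs_gc are made up for this
illustration. A weak reference is used to observe whether the object is
still alive.

```python
import gc
import weakref


class Node:
    """Trivial object used only to observe when it is reclaimed."""


def reclaimed_without_gc():
    """Return True if an acyclic object is freed as soon as its last
    reference goes away, even with the cycle detector switched off."""
    was_enabled = gc.isenabled()
    gc.disable()  # turns off only the cyclic garbage collector
    try:
        obj = Node()
        ref = weakref.ref(obj)
        del obj  # reference count drops to zero here (CPython assumption)
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


def cycle_needs_gc():
    """Return (alive_before_collect, alive_after_collect) for a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # a <-> b reference cycle
        ref = weakref.ref(a)
        del a, b  # the counts never reach zero because of the cycle
        alive_before = ref() is not None
        gc.collect()  # an explicit collection still works while disabled
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

Under that assumption, the first function shows plain reference counting
at work, and the second shows the one case where the collector is needed.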

I am not a Python core developer; I am a research scientist who uses Python
because Matlab is (or used to be) a bad programming language, albeit a
good computing environment. As most people who have worked with scientific
computing know, there are better paradigms for concurrency than threads.
In particular, there are message-passing systems like MPI and Erlang, and
there are parallelizing and autovectorizing compilers for OpenMP and
Fortran 90/95. There are special LAPACK, BLAS and FFT libraries for
parallel computer architectures. There are fork-join systems like Cilk and
java.util.concurrent. Threads seem to be used only because mediocre
programmers don't know what else to use.

I genuinely think the use of threads should be discouraged. It leads to
code that is full of bugs and difficult to maintain - race conditions,
deadlocks, and livelocks are common pitfalls. Very few developers are
capable of implementing efficient load-balancing by hand. Multi-threaded
programs tend to scale badly because they are badly written. If the GIL
discourages the abuse of threads, it serves a purpose, even though it is
evil in the same way as the Linux kernel's BKL.

Python could be better off doing what Tcl does. Allow each process to
embed multiple interpreters; run each interpreter in its own thread.
Implement a fast message-passing system between the interpreters (e.g.
copy-on-write by making communicated objects immutable), and Python would
be closer to Erlang than Java.

I thus think the main offender is the thread and threading modules - not
the GIL. Without thread support in the interpreter, there would be no
threads. Without threads, there would be no need for a GIL. Both sources
of evil can be removed by just removing thread support from the Python
interpreter. In addition, it would make Python faster at executing linear
code. Just copy the concurrency model of Erlang instead of Java and get
rid of those nasty threads. In the meantime, I'll continue to experiment
with multiprocessing.
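
As a rough sketch of the message-passing style, here is what it can look
like with multiprocessing today. This uses separate processes, not
multiple interpreters in one process, so it only approximates the Tcl or
Erlang model. The names worker and square_all are hypothetical, and the
preference for the "fork" start method is an assumption made so that the
sketch also runs when pasted into an interactive session.

```python
import multiprocessing as mp


def worker(inbox, outbox):
    """Receive numbers until the None sentinel arrives; reply with squares."""
    for item in iter(inbox.get, None):
        outbox.put(item * item)


def square_all(values):
    """Square a list of values in another process, using only messages."""
    # Assumption: use "fork" where the platform offers it, otherwise fall
    # back to the platform default start method.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    inbox, outbox = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=worker, args=(inbox, outbox))
    proc.start()
    for v in values:
        inbox.put(v)  # each message is pickled and copied, never shared
    inbox.put(None)  # sentinel: no more work
    results = [outbox.get() for _ in values]  # drain replies before joining
    proc.join()
    return results


if __name__ == "__main__":
    print(square_all([1, 2, 3]))  # prints [1, 4, 9]
```

No state is shared between the two sides, so no locks appear in user code.
The price is that every message is copied, which is where a faster
transport would pay off.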

Removing reference counting to encourage the use of threads is like
shooting ourselves in the leg twice. That’s my two cents on this issue.

There is another issue to note as well: If you can endure a 200x loss of
efficiency by using Python instead of Fortran, scalability on dual or
quad-core processors may not be that important. Just move the bottlenecks
out of Python and you are much better off.
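
A small stand-in for the idea, using only the standard library: the same
dot product written once as a loop executed by the interpreter, and once
with the loop pushed into C-implemented built-ins. The function names are
made up for this illustration. In real numerical work the compiled side
would more likely be Fortran, C or a BLAS library. No timings are claimed
here; they are easy to measure with timeit.

```python
import operator


def dot_python(xs, ys):
    """Dot product with the loop executed by the bytecode interpreter."""
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """Same result, but the iteration runs inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))
```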

Regards,
Sturla Molden