[Python-ideas] Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures

Wed Sep 19 03:31:00 CEST 2007

Thinking about how Python can better support parallelism and
concurrency is an important topic.  Here is how I see it:  if we don't
address the issue, the Python interpreter 5 or 10 years from now will
run at roughly the same speed as it does today.  This is because
single CPU cores are not getting much faster (power consumption is too
high).  Instead, most of the performance gains in hardware will be due
to increased hardware parallelism, which means multi/many core CPUs.

What to do about this pending crisis is a complicated issue.

There are (at least) two levels that are important:

1.  Language level features that make it possible to build
higher-level libraries/tools for parallelism.

2.  The high-level libraries/tools that most users and developers
would use to express parallelism.

I think it is absolutely critical that we worry about (1) before
jumping to (2).  So, some thoughts about (1).  Does Python itself need
to be changed to better enable people to write libraries for
expressing parallelism?

My answer to this is no.  The dominant languages for parallel
computing (C/C++/Fortran) don't really have any constructs or
features beyond what Python offers in this respect.  Java has more
sophisticated support for threads.  Erlang has concurrency built into
its core.  But Python is not Erlang or Java.  As Twisted
demonstrates, Python as a language is plenty powerful enough to
express concurrency in an elegant way.  I am not saying that
parallelism and concurrency are easy or wonderful today in Python,
just that the language itself is not the problem.  We don't
necessarily need new language features; we simply need bright people
to sit down and think about the right way to express parallelism in
Python and then write libraries (maybe in the stdlib) that implement
those ideas.

But, there is a critical problem in CPython's implementation that
prevents people from really breaking new ground in this area with
Python.  It is the GIL, and here is why:

* For the platforms on which Python runs, threads are what the
hardware+OS people have given us as the finest-grained way of
mapping parallelism onto hardware.  This is true even if you have
philosophical or existential problems with threads.  With the
limitations of the GIL, we can't take advantage of what the hardware
gives us.

* A process-based solution using message passing is simply not
suitable for many parallel algorithms that are communication-bound.
The shared state of threads is needed in many cases, not because
sharing state is a "fantastic idea", but rather because it is fast.
This will only become more true as multicore CPUs gain more
sophisticated memory architectures with higher bandwidths.  Also, the
overhead of managing processes is much greater than with threads.
Many excellent fine-grained parallel approaches like Cilk would not be
possible with processes only.

* There are a number of powerful, high-level Python packages that
already exist (these have been named in the various threads) that
allow parallelism to be expressed.  All of these suffer from a
GIL-related problem even though they are process-based and use message
passing.  Regardless of whether you are using blocking or non-blocking
sockets/IPC, you can't run long-running CPU-bound code, because all
the network-related stuff will stop.  You then think, "OK, I will run
the CPU-intensive stuff in a different thread."  If the CPU-intensive
code is just regular Python, you are fine: the Python interpreter will
switch between the network thread and the CPU-intensive thread every
so often.  But the second you run extension code that doesn't release
the GIL, you are screwed.  The network thread will stall until the
extension code is done.  When it comes to implementing robust
process-based parallelism using sockets, the last thing you can afford
is to have your networking black out like this, and in CPython it
can't be avoided.
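
To make the last point concrete, here is a minimal sketch (mine, not
taken from any of the packages mentioned).  A "heartbeat" thread
stands in for the network thread and records a timestamp every 10 ms,
while the main thread does CPU-bound work.  The names heartbeat,
busy_python and max_gap_during are made up for illustration.  The
sum(range(...)) call is only a stand-in for extension code, on the
assumption that this builtin runs entirely in C without releasing the
GIL on a standard CPython build; I have not included expected numbers
because they depend on the machine and the interpreter build.

```python
import threading
import time


def heartbeat(stop, stamps, interval=0.01):
    """Stand-in for a network thread: record a timestamp every `interval` s."""
    while not stop.is_set():
        stamps.append(time.perf_counter())
        time.sleep(interval)


def busy_python(duration):
    """CPU-bound loop in plain Python; the interpreter can switch threads
    between bytecodes, so other threads still get to run."""
    deadline = time.perf_counter() + duration
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n


def max_gap_during(work, *args):
    """Run work(*args) in the calling thread while a heartbeat thread ticks.

    Returns (ticks recorded while the work ran, largest gap in seconds
    between any two consecutive ticks).
    """
    stop = threading.Event()
    stamps = []
    thread = threading.Thread(target=heartbeat, args=(stop, stamps))
    thread.start()
    time.sleep(0.05)  # let the heartbeat settle before the work starts
    start = time.perf_counter()
    work(*args)
    end = time.perf_counter()
    stop.set()
    thread.join()
    ticks = sum(1 for s in stamps if start <= s <= end)
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    return ticks, max(gaps, default=0.0)


if __name__ == "__main__":
    # Pure-Python work: the heartbeat should keep ticking throughout.
    print("pure Python :", max_gap_during(busy_python, 0.5))
    # Assumption: sum() over a large range stays inside C code and holds
    # the GIL, which is the behavior described for GIL-holding extensions.
    print("C-level call:", max_gap_during(sum, range(30_000_000)))
```

If the assumption about sum() holds on your build, the second line
should report far fewer ticks and a much larger gap than the first,
which is the "networking blackout" described above.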

<disclaimer>
I am not saying that threads are what everyone should be using to
express parallelism.  I am only saying that they are needed to
implement robust higher-level forms of parallelism on multicore
systems, regardless of whether the solution is using processes+threads
or threads alone.
</disclaimer>

The dozen or so "parallel Python" packages that currently exist
_all_ suffer from this problem (some hide it better than others
through clever tricks).  We can run but we can't hide.

Because of these things, I think the current "Exploratory PEP" is
entirely premature.  Let's figure out exactly what to do with the GIL
and _then_ think about the fun stuff.

Brian