
I would like to direct your attention to a presentation that was given by Mr. Herb Sutter on the subject of concurrency support in programming languages. It's a long video, but I would urge you to watch at least the first five minutes of it:

http://video.google.com/videoplay?docid=7625918717318948700&q=herb+sutter

Mr. Sutter is the secretary of the ISO/C++ standards committee. I mention this only so that you'll understand he's not just some random yo-yo with an opinion. (There are a number of other papers by Mr. Sutter on the same topic which you can google for; however, none of them are as compelling or comprehensive as the video presentation, IMHO.)

The gist of the introduction is this (my words, not his): starting from now, and for the rest of your career as a programmer, you had better start programming for massively concurrent systems or you are doomed to irrelevance.

As he points out in the talk, the good news is that Moore's law is going to continue for several decades to come. The bad news is that most people don't understand Moore's law or what they get from it. Moore's law is really about the number of transistors on a chip; it says nothing about processor speed or performance. He calls it "the end of the free lunch", meaning that in the past you could write a program, wait a little while, and it would run faster - because the hardware got faster while you were waiting. This will no longer be true for single-threaded programs. Scalar processors, with all of their pipelining, prefetching and other tricks, have gotten about as fast as they are ever going to get; we've reached the point of diminishing returns. So the only thing left to do is throw more and more cores on the chip. There's no physical reason why the number of cores won't continue to double every 18 months; the only reason why manufacturers might choose not to make 128-core CPUs by the year 2012 is that there won't be enough software to take advantage of that degree of parallelism. (Even with today's transistor count, you could put 100 Pentium-class processors on a single die.)

Moreover, programming languages which have built-in support for scaling to many processors are going to win in the long run, and programming languages that don't have strong, built-in concurrency support will gradually fade away. The reason for this is simple: people don't like using applications which don't get faster when they buy a better computer.

Now, on the server side, all of this is a solved problem. We have a very robust model for handling hundreds of requests in parallel, and a very strong model for dealing with shared state. This model is called "transactions", or more formally, ACID.

On the client side, however, things are much worse. As he puts it, locks are the best we have, and locks suck. Locks are not "logically composable", in the sense that you can't simply take two pieces of software that were written independently and glue them together unless they share a common locking architecture. This ruins a large part of the power of modern software, which is the ability to compose software out of smaller, independently written components. On the client side, you have heterogeneous components accessing shared state at high bandwidth, with myriad pointer-based references to that shared state. Add concurrency, and what you end up with is a system in which it is impossible to make logical inferences about the behavior of the system. (And before you ask - yes, functional languages - and their drawbacks - are addressed in the talk. The same is true for 'lock-free programming' and other techniques du jour.)
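To make the composability point concrete, here is a small sketch of my own (not something from the talk) in present-day Python. The Account class is invented purely for illustration; each instance is internally consistent, guarding its own state with its own lock, and yet composing two of them can deadlock:

    import threading
    import time

    class Account:
        """Each account guards its own state with its own internal lock."""
        def __init__(self, balance):
            self._lock = threading.Lock()
            self.balance = balance

        def transfer_to(self, other, amount):
            # Locally this looks reasonable: take my lock, then the other
            # account's lock. But two concurrent transfers in opposite
            # directions acquire the same pair of locks in opposite orders,
            # which is a classic deadlock.
            with self._lock:
                time.sleep(0.1)      # widen the window so the hazard shows up
                with other._lock:
                    self.balance -= amount
                    other.balance += amount

    a, b = Account(100), Account(100)
    t1 = threading.Thread(target=a.transfer_to, args=(b, 10), daemon=True)
    t2 = threading.Thread(target=b.transfer_to, args=(a, 10), daemon=True)
    t1.start(); t2.start()
    t1.join(timeout=2); t2.join(timeout=2)
    print("deadlocked" if t1.is_alive() or t2.is_alive() else "completed")

The usual fix is a lock-ordering convention shared by every component in the program, which is exactly the "common locking architecture" that independently written components don't have.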
Now, in the talk he proposes some strategies for building concurrency into the language so that client-side programming can be done in a way that isn't rocket science. We won't be able to get rid of locks, he claims, but with the right language support at least we'll be able to reason about them, and maybe push them off to the side so that they mostly stay in their corner.

In any case, what I would like to open up is a discussion of how these ideas might influence the design of the Python language.

One thing that is important to understand is that I'm not talking about "automatic parallelization", where the compiler automatically figures out which parts can be parallelized. That would require so much change to the Python language as to make it no longer Python. Nor am I talking about manually creating "thread" objects and having to carefully place protections around any shared state. That kind of parallelism is too low-level, too coarse-grained, and too hard to do, even for people who think they know how to write concurrent code.

I am not even necessarily talking about changing the Python language (although certainly the internals of the interpreter will need to change). The same language can be used to describe the same kinds of problems and operations, but the implications of those language elements will change. This is analogous to the fact that the massively multicore CPUs of 10 years from now will most likely be using the same instruction sets as today - but that does not mean that programming as a discipline will be anything like what it is now.

As an example of what I mean, suppose the Python 'with' statement could be used to indicate an atomic transaction: any operations that were done within that block would either be committed or rolled back. (A toy sketch of what I mean appears near the end of this post.) This is not a new idea - here's an interesting paper on it:

http://www-sal.cs.uiuc.edu/~zilles/tm.html

I'm sure that there's a lot more like that out there. However, there is also a lot of stuff out there that is *superficially* similar to what I am talking about, and I want to make the distinction clear. For example, any discussion of concurrency in Python will naturally raise the topic of both IPython and Stackless. However, IPython (from what I understand) is really about distributed computing and not so much about fine-grained concurrency; and Stackless (from what I understand) is really about coroutines or continuations, which is a different kind of concurrency. Unless I am mistaken (and I very well could be), neither of these is immediately applicable to the problem of authoring Python programs for multi-core CPUs, but I think that both of them contain valuable ideas that are worth discussing.

Now, I will be the first to admit that I am not an expert in these matters. Don't let my poor, naive ideas be a limit to what is discussed. What I've primarily tried to do in this posting is to get your attention and convince you that this is important and worth talking about. And I hope to be able to learn something along the way.
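Returning to the 'with'-statement idea above: the following is a toy sketch of the semantics I have in mind, not a proposal for how it would be implemented. The atomic() helper is invented for this example, and a real version would need interpreter-level support to give isolation between threads rather than just rollback on error:

    from contextlib import contextmanager
    import copy

    @contextmanager
    def atomic(*objects):
        # Hypothetical: snapshot the participating objects on entry...
        snapshots = [copy.deepcopy(obj.__dict__) for obj in objects]
        try:
            yield                    # ...run the body of the 'with' block...
        except Exception:
            # ...and restore the snapshots if anything in it raised, so the
            # block's effects are all-or-nothing.
            for obj, snap in zip(objects, snapshots):
                obj.__dict__.clear()
                obj.__dict__.update(snap)
            raise

    class Record:
        def __init__(self, value):
            self.value = value

    r1, r2 = Record(100), Record(100)
    try:
        with atomic(r1, r2):
            r1.value -= 150
            if r1.value < 0:
                raise ValueError("overdrawn")   # triggers the rollback
            r2.value += 150
    except ValueError:
        pass
    print(r1.value, r2.value)   # both still 100: the partial update was undone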
My overall goal here is to be able to continue writing programs in Python 10 years from now, not just as a hobby, but as part of my professional work. If Python is able to leverage the power of the CPUs that are being created at that time, I will be able to make a strong case for doing so. On the other hand, if I have a 128-core CPU on my desk, and Python is only able to utilize 1/128th of that power without resorting to tedious and error-prone reasoning about race conditions and deadlocks, then it's likely that my Python programming will be relegated to the role of a hobby.

-- Talin