
Thinking about how Python can better support parallelism and concurrency is an important topic. Here is how I see it: if we don't address the issue, the Python interpreter 5 or 10 years from now will run at roughly the same speed as it does today. This is because single CPU cores are not getting much faster (power consumption is too high). Instead, most of the performance gains in hardware will come from increased hardware parallelism, which means multi/many-core CPUs.

What to do about this pending crisis is a complicated issue. There are (at least) two levels that are important:

1. Language-level features that make it possible to build higher-level libraries/tools for parallelism.

2. The high-level libraries/tools that most users and developers would use to express parallelism.

I think it is absolutely critical that we worry about (1) before jumping to (2). So, some thoughts about (1).

Does Python itself need to be changed to better enable people to write libraries for expressing parallelism? My answer to this is no. The dominant languages for parallel computing (C/C++/Fortran) don't really have any additional constructs or features above Python in this respect. Java has more sophisticated support for threads. Erlang has concurrency built into its core. But Python is not Erlang or Java. As Twisted demonstrates, Python as a language is plenty powerful enough to express concurrency in an elegant way. I am not saying that parallelism and concurrency are easy or wonderful today in Python, just that the language itself is not the problem. We don't necessarily need new language features; we simply need bright people to sit down and think about the right way to express parallelism in Python and then write libraries (maybe in the stdlib) that implement those ideas.

But there is a critical problem in CPython's implementation that prevents people from really breaking new ground in this area with Python.
It is the GIL, and here is why:

* For the platforms on which Python runs, threads are what the hardware+OS people have given us as the most fine-grained way of mapping parallelism onto hardware. This is true even if you have philosophical or existential problems with threads. With the limitations of the GIL, we can't take advantage of what the hardware gives us.

* A process-based solution using message passing is simply not suitable for many parallel algorithms that are communication-bound. The shared state of threads is needed in many cases, not because sharing state is a "fantastic idea", but rather because it is fast. This will only become more true as multicore CPUs gain more sophisticated memory architectures with higher bandwidths. Also, the overhead of managing processes is much greater than with threads. Many excellent fine-grained parallel approaches like Cilk would not be possible with processes only.

* There are a number of powerful, high-level Python packages that already exist (these have been named in the various threads) that allow parallelism to be expressed. All of these suffer from a GIL-related problem even though they are process-based and use message passing. Regardless of whether you are using blocking/non-blocking sockets/IPC, you can't run long-running CPU-bound code, because all the network-related stuff will stop. You then think, "OK, I will run the CPU-intensive stuff in a different thread." If the CPU-intensive code is just regular Python, you are fine: the Python interpreter will switch between the network thread and the CPU-intensive thread every so often. But the second you run extension code that doesn't release the GIL, you are screwed. The network thread will stall until the extension code is done. When it comes to implementing robust process-based parallelism using sockets, the last thing you can afford is to have your networking black out like this, and in CPython it can't be avoided.
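To make the last point concrete, here is a minimal sketch of the benign case only, where the CPU-intensive code is regular Python. The names `heartbeat`, `busy_python`, `max_gap` and `run_demo` are made up for illustration, and the heartbeat thread is just a stand-in for a real network thread. It records how often the heartbeat gets to run while the main thread spins in a pure-Python loop.

```python
import threading
import time


def heartbeat(stop, ticks):
    """Stand-in for the network thread: record a timestamp every 10 ms."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        time.sleep(0.01)


def busy_python(seconds):
    """CPU-bound work written in plain Python, so the interpreter
    can switch to other threads between bytecodes."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n


def max_gap(ticks):
    """Longest pause between consecutive heartbeat ticks."""
    return max((b - a for a, b in zip(ticks, ticks[1:])), default=0.0)


def run_demo(seconds=0.3):
    """Run the busy loop in the main thread while the heartbeat ticks."""
    stop = threading.Event()
    ticks = []
    worker = threading.Thread(target=heartbeat, args=(stop, ticks))
    worker.start()
    iterations = busy_python(seconds)
    stop.set()
    worker.join()
    return iterations, ticks


if __name__ == "__main__":
    iterations, ticks = run_demo()
    print(f"{iterations} loop iterations, {len(ticks)} heartbeats, "
          f"longest gap {max_gap(ticks) * 1000:.1f} ms")
```

With pure-Python work the heartbeat should keep ticking throughout. If `busy_python` were swapped for a long call into extension code that holds the GIL for its whole duration, I would expect the longest gap to grow to roughly the length of that call, which is the blackout described above.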
<disclaimer> I am not saying that threads are what everyone should be using to express parallelism. I am only saying that they are needed to implement robust higher-level forms of parallelism on multicore systems, regardless of whether the solution uses processes+threads or threads alone. </disclaimer>

The dozen or so "parallel Python" packages that currently exist _all_ suffer from this problem (though some hide it better than others using clever tricks). We can run but we can't hide.

Because of these things, I think the current "Exploratory PEP" is entirely premature. Let's figure out exactly what to do with the GIL and _then_ think about the fun stuff.

Brian