[Python-ideas] solving multi-core Python

Sun Jun 21 13:41:30 CEST 2015

On 20/06/15 23:42, Eric Snow wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.
>
> This proposal is meant to be a shot over the bow, so to speak.  I plan
> on putting together a more complete PEP some time in the future, with
> content that is more refined along with references to the appropriate
> online resources.
>
> Feedback appreciated!  Offers to help even more so! :)

From the perspective of software design, it would be good if the 
CPython interpreter provided an environment instead of using global 
objects. It would mean that all functions in the C API would need to 
take the environment pointer as their first argument, which would be a 
major rewrite. It would also allow the "one interpreter per thread" 
design similar to Tcl and .NET application domains.

However, from the perspective of multi-core parallel computing, I am not 
sure what this offers over using multiple processes.

Yes, you avoid the process startup time, but on POSIX systems a fork is 
very fast. And certainly, forking is much more efficient than serializing 
Python objects. It then boils down to a workaround for the fact that 
Windows cannot fork, which makes it particularly bad for running 
CPython. You also have to start up a subinterpreter and a thread, which 
is not instantaneous. So I am not sure there is a lot to gain here over 
calling os.fork.
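
As a rough illustration of the fork point, here is a minimal sketch, 
assuming a POSIX system. The names data and child_sum_via_fork are 
hypothetical. The child reads a list built by the parent straight from 
inherited (copy-on-write) memory; only the small result travels back 
through a pipe, and nothing is pickled.

```python
import os

# Hypothetical data set, built in the parent before forking.
data = list(range(100_000))


def child_sum_via_fork(values):
    """Fork a child that sums `values` from inherited memory (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: `values` is already in our address space, nothing was serialized.
        try:
            os.close(read_fd)
            os.write(write_fd, str(sum(values)).encode())
            os.close(write_fd)
        finally:
            # Leave immediately so the child never runs the parent's remaining code.
            os._exit(0)
    # Parent: close our copy of the write end so read() sees EOF, then collect.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        payload = reader.read()
    os.waitpid(pid, 0)
    return int(payload)


total = child_sum_via_fork(data)
```

Note that the input costs nothing to hand over here; only the result 
has to be sent back explicitly.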

An invalid argument for this kind of design is that only code which 
uses threads for parallel computing is "real" multi-core code. So Python 
does not support multi-cores because multiprocessing or os.fork is just 
faking it. This is an argument that belongs in the intellectual junk 
yard. It stems from the abuse of threads among Windows and Java 
developers, and is rooted in the absence of fork on Windows and the 
formerly slow fork on Solaris. And thus they are only able to think in 
terms of threads. If threading.Thread does not scale the way they want, 
they think multicores are out of reach.

So the question is, how do you want to share objects between 
subinterpreters? And why is it better than IPC, when your idea is to 
isolate subinterpreters like application domains?

If you think avoiding IPC is clever, you are wrong. IPC is very fast, in 
fact programs written to use MPI tend to perform and scale better than 
programs written to use OpenMP in parallel computing. Not only is IPC 
fast, but you also avoid an issue called "false sharing", which can be 
even more detrimental than the GIL: You have parallel code, but it seems 
to run in serial, even though there is no explicit serialization 
anywhere. And since Murphy's law is working against us, Python 
reference counts will be falsely shared unless we use multiple processes.

The reason IPC in multiprocessing is slow is the call to pickle, not 
the IPC itself. A pipe or a Unix socket (named pipe on 
Windows) has the overhead of a memcpy in the kernel, which is equal to 
a memcpy plus some tiny constant overhead. And if you need two processes 
to share memory, there is something called shared memory. Thus, we can 
send data between processes just as fast as between subinterpreters.
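
To make the shared memory remark concrete, here is a minimal sketch, 
again assuming a POSIX system with os.fork. The name fill_in_child is 
hypothetical. It relies on mmap.mmap(-1, size) creating an anonymous 
mapping that stays shared between parent and child after the fork, so 
the child's writes are visible to the parent without pickle and 
without a pipe.

```python
import mmap
import os
import struct


def fill_in_child(count):
    """Let a forked child write `count` doubles into memory the parent reads."""
    size = count * 8  # one native double is 8 bytes
    # Anonymous mapping; on Unix the default MAP_SHARED flag keeps it
    # shared across fork rather than copy-on-write.
    shared = mmap.mmap(-1, size)
    pid = os.fork()
    if pid == 0:
        # Child: write raw doubles directly into the shared buffer.
        try:
            for i in range(count):
                struct.pack_into("d", shared, i * 8, float(i * i))
        finally:
            os._exit(0)
    # Parent: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    values = list(struct.unpack_from(f"{count}d", shared, 0))
    shared.close()
    return values


squares = fill_in_child(5)
```

This only moves raw bytes. Sharing arbitrary Python objects this way 
is the harder problem the next paragraph refers to.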

All in all, I think we are better off finding a better way to share 
Python objects between processes.

P.S. Another thing to note is that with sub-interpreters, you can forget 
about using ctypes or anything else that uses the simplified GIL API 
(e.g. certain Cython-generated extensions).

Sturla