
On 20/06/15 23:42, Eric Snow wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters, exposing them in Python, and adding a mechanism to safely share objects between them.
>
> This proposal is meant to be a shot over the bow, so to speak. I plan on putting together a more complete PEP some time in the future, with content that is more refined along with references to the appropriate online resources.
>
> Feedback appreciated! Offers to help even more so! :)
From the perspective of software design, it would be good if the CPython interpreter provided an environment instead of using global objects. It would mean that all functions in the C API would need to take the environment pointer as their first argument, which would be a major rewrite. It would also allow a "one interpreter per thread" design similar to Tcl and to .NET application domains.

However, from the perspective of multi-core parallel computing, I am not sure what this offers over using multiple processes. Yes, you avoid the process startup time, but on POSIX systems a fork is very fast. And certainly, forking is much more efficient than serializing Python objects. It then boils down to a workaround for the fact that Windows cannot fork, a limitation that already makes Windows a particularly bad platform for running CPython. You also have to start up a subinterpreter and a thread, which is not instantaneous. So I am not sure there is a lot to gain here over calling os.fork (see the first sketch after my signature).

An invalid argument for this kind of design is that only code which uses threads for parallel computing is "real" multi-core code, and that Python therefore does not support multiple cores because multiprocessing and os.fork are just faking it. This is an argument that belongs in the intellectual junk yard. It stems from the abuse of threads among Windows and Java developers, and is rooted in the absence of fork on Windows and the formerly slow fork on Solaris. Thus they are only able to think in terms of threads, and if threading.Thread does not scale the way they want, they conclude that multiple cores are out of reach.

So the question is, how do you want to share objects between subinterpreters? And why is it better than IPC, when your idea is to isolate subinterpreters like application domains? If you think avoiding IPC is clever, you are wrong. IPC is very fast; in fact, programs written to use MPI tend to perform and scale better than programs written to use OpenMP in parallel computing. Not only is IPC fast, but with separate processes you also avoid an issue called "false sharing": different cores writing to different data that happen to sit on the same cache line, so the hardware keeps invalidating that line back and forth. False sharing can be even more detrimental than the GIL: you have parallel code, but it seems to run in serial, even though there is no explicit serialization anywhere. And since Murphy's law works against us, Python reference counts will be falsely shared unless we use multiple processes.

The reason IPC in multiprocessing is slow is the call to pickle, not the IPC itself. A pipe or a Unix socket (a named pipe on Windows) costs a memcpy through the kernel plus some tiny constant overhead. And if you need two processes to share memory, there is something called shared memory (a small sketch of that also follows after my signature). Thus, we can send data between processes just as fast as between subinterpreters.

All in all, I think we are better off finding a better way to share Python objects between processes.

P.S. Another thing to note is that with subinterpreters, you can forget about using ctypes or anything else that relies on the simplified GIL state API (e.g. certain Cython-generated extensions), because PyGILState_Ensure and friends assume there is only one interpreter.

Sturla
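
To make the fork point concrete, here is a minimal sketch of my own (the worker function and sizes are only illustrative), assuming Python 3.4+ on POSIX with the 'fork' start method; the object built in the parent is visible in the workers with no serialization at all:

    import multiprocessing as mp

    # Built once in the parent; under the 'fork' start method the children get
    # a copy-on-write view of the parent's heap, so 'big' is never pickled.
    big = list(range(10 ** 6))

    def worker():
        return sum(big)

    if __name__ == '__main__':
        mp.set_start_method('fork')   # POSIX only; it is the default there
        with mp.Pool(4) as pool:
            print(pool.apply(worker))  # the child reads 'big' directly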
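
And to illustrate the shared memory point, a sketch using multiprocessing.Array (again, the names, worker count and array size are just placeholders): each worker writes its own disjoint slice of a shared double array in place, nothing is pickled per element, and the parent reads the results from the same memory.

    import multiprocessing as mp

    def fill(buf, start, stop):
        # Each worker writes its own disjoint slice of the shared buffer;
        # all processes map the same memory, so nothing is copied or pickled.
        for i in range(start, stop):
            buf[i] = float(i)

    if __name__ == '__main__':
        n = 1000000
        buf = mp.Array('d', n, lock=False)  # an array of C doubles in shared memory
        chunk = n // 4
        workers = [mp.Process(target=fill, args=(buf, k * chunk, (k + 1) * chunk))
                   for k in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(buf[0], buf[n - 1])           # 0.0 999999.0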