On 21 June 2015 at 21:41, Sturla Molden <sturla.molden@gmail.com> wrote:
On 20/06/15 23:42, Eric Snow wrote:
tl;dr Let's exploit multiple cores by fixing up subinterpreters, exposing them in Python, and adding a mechanism to safely share objects between them.
This proposal is meant to be a shot over the bow, so to speak. I plan on putting together a more complete PEP some time in the future, with content that is more refined along with references to the appropriate online resources.
Feedback appreciated! Offers to help even more so! :)
From the perspective of software design, it would be good if the CPython interpreter provided an environment instead of using global objects. It would mean that all functions in the C API would need to take the environment pointer as their first argument, which would be a major rewrite. It would also allow the "one interpreter per thread" design similar to Tcl and .NET application domains.
However, from the perspective of multi-core parallel computing, I am not sure what this offers over using multiple processes.
Yes, you avoid the process startup time, but on POSIX systems a fork is very fast. And certainly, forking is much more efficient than serializing Python objects. It then boils down to a workaround for the fact that Windows cannot fork, which makes it particularly bad for running CPython. You also have to start up a subinterpreter and a thread, which is not instantaneous. So I am not sure there is a lot to gain here over calling os.fork.
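To make the os.fork point concrete, here is a minimal sketch, assuming a POSIX system. The helper name `sum_in_child` is made up for illustration. The child reads the parent's list directly from its inherited address space, and only the small result travels back over a pipe.

```python
import os


def sum_in_child(data):
    """Fork, let the child read `data` directly, and return the child's result.

    Hypothetical helper for illustration; requires os.fork (POSIX only).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: `data` is already present in the forked address space,
        # so nothing was pickled to get it here.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(sum(data)).encode("ascii"))
            status = 0
        finally:
            # Exit immediately, skipping the parent's cleanup handlers.
            os._exit(status)
    # Parent: only the short result string crosses the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        result = int(reader.read())
    os.waitpid(pid, 0)
    return result
```

Note that this only covers data that exists before the fork. Anything produced afterwards still has to be sent back explicitly, which is where serialisation cost returns.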
Please give Eric and me the courtesy of assuming we know how CPython works. This article, which is an update of a Python 3 Q&A answer I wrote some time ago, goes into more detail on the background of this proposed investigation: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python...
An invalid argument for this kind of design is that only code which uses threads for parallel computing is "real" multi-core code. So Python does not support multi-cores because multiprocessing or os.fork is just faking it. This is an argument that belongs in the intellectual junkyard. It stems from the abuse of threads among Windows and Java developers, and is rooted in the absence of fork on Windows and the formerly slow fork on Solaris. And thus they are only able to think in terms of threads. If threading.Thread does not scale the way they want, they think multicores are out of reach.
Sturla, expressing out and out contempt for entire communities of capable, competent developers (both the creators of Windows and Java, and the users of those platforms) has no place on the core Python mailing lists. Please refrain from casually insulting entire groups of people merely because you don't approve of their technical choices.
The reason IPC in multiprocessing is slow is the call to pickle, not the IPC itself. A pipe or a Unix socket (named pipe on Windows) has the overhead of a memcpy in the kernel, which is equal to a memcpy plus some tiny constant overhead. And if you need two processes to share memory, there is something called shared memory. Thus, we can send data between processes just as fast as between subinterpreters.
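As a rough sketch of the shared memory route: the example below uses `multiprocessing.shared_memory`, which was only added in Python 3.8, well after this thread, so treat it as an illustration rather than something available at the time. The function name is hypothetical. To stay short it attaches to the segment by name from the same process, the way a second process would.

```python
from multiprocessing import shared_memory


def roundtrip_via_shared_memory(payload: bytes) -> bytes:
    """Write `payload` to a named segment and read it back through a second handle.

    Hypothetical helper; `payload` must be non-empty because a segment
    cannot have size zero.
    """
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # Raw bytes go in directly; no pickle is involved.
        owner.buf[:len(payload)] = payload
        # A peer needs only the segment name to attach to the same memory.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            # The segment may be rounded up to a page, so slice to the real length.
            return bytes(peer.buf[:len(payload)])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # remove the segment once nobody needs it
```

This works for flat buffers of bytes. It does not by itself share arbitrary Python objects, which still need some encoding into and out of the buffer.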
Avoiding object serialisation is indeed the main objective. With subinterpreters, we have a lot more options for that than we do with any form of IPC, including shared references to immutable objects, and the PEP 3118 buffer API.
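For readers unfamiliar with the PEP 3118 buffer API, the following small sketch shows the kind of copy-free access it gives within a single interpreter today. The function name is made up, and nothing here uses a subinterpreter API, since none is exposed at the Python level yet.

```python
def overwrite_through_view(backing: bytearray, start: int, stop: int, fill: int) -> None:
    """Fill backing[start:stop] with `fill` through a memoryview slice.

    Hypothetical helper: the slice is a window onto the same memory,
    so no bytes are copied and writes land in `backing` itself.
    """
    window = memoryview(backing)[start:stop]
    try:
        for i in range(len(window)):
            window[i] = fill
    finally:
        # Release the view so the bytearray can be resized again later.
        window.release()
```

For example, calling `overwrite_through_view(buf, 2, 4, ord("X"))` on `bytearray(b"abcdefgh")` leaves `buf` equal to `bytearray(b"abXXefgh")`.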
All in all, I think we are better off finding a better way to share Python objects between processes.
This is not an either/or question, as other folks remain free to work on improving multiprocessing's IPC efficiency if they want to. We don't seem to have folks clamouring at the door to work on that, though.
P.S. Another thing to note is that with sub-interpreters, you can forget about using ctypes or anything else that uses the simplified GIL API (e.g. certain Cython generated extensions).
Those aren't fundamental conceptual limitations, they're incidental limitations of the current design and implementation of the simplified GIL state API. One of the benefits of introducing a Python level API for subinterpreters is that it makes it easier to start testing, and hence fixing, some of those limitations (I actually just suggested to Eric off list that adding subinterpreter controls to _testcapi might be a good place to start, as that's beneficial regardless of what, if anything, ends up happening from a public API perspective).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia