<div dir="ltr"><div>On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith <<a href="mailto:njs@pobox.com">njs@pobox.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan <<a href="mailto:ncoghlan@gmail.com" target="_blank">ncoghlan@gmail.com</a>> wrote:<br>> The gist of the idea is that with subinterpreters, your starting point<br>

> is multiprocessing-style isolation (i.e. you have to use pickle to<br>

> transfer data between subinterpreters), but you're actually running in<br>

> a shared-memory threading context from the operating system's<br>

> perspective, so you don't need to rely on mmap to share memory over a<br>

> non-streaming interface.<br>

<br>

The challenge is that streaming bytes between processes is actually<br>

really fast -- you don't really need mmap for that. (Maybe this was<br>

important for X11 back in the 1980s, but a lot has changed since then<br>

:-).) And if you want to use pickle and multiprocessing to send, say,<br>

a single big numpy array between processes, that's also really fast,<br>

because it's basically just a few memcpy's. The slow case is passing<br>

complicated objects between processes, and it's slow because pickle<br>

has to walk the object graph to serialize it, and walking the object<br>

graph is slow. Copying object graphs between subinterpreters has the<br>

same problem.<br></blockquote><div><br></div><div>This doesn't match up with my (somewhat limited) experience. For example, in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is about 10x slower than a memory copy:</div><a href="http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth">http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth</a><br><br>This makes a considerable difference when building a system to do parallel data analytics in Python (e.g., on NumPy arrays), which is exactly what Matthew has been working on for the past few years.</div><div class="gmail_quote"><br></div><div class="gmail_quote">I'm sure there are other ways to avoid this expensive IPC without using sub-interpreters, e.g., by using a tool like Plasma (<a href="http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/">http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/</a>). But I'm skeptical of your assessment that the current multiprocessing approach is fast enough.</div></div>