
On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
> On 7 September 2017 at 15:48, Nathaniel Smith <njs@pobox.com> wrote:
>> I've actually argued with the PyPy devs to try to convince them to add subinterpreter support as part of their experiments with GIL-removal, because I think the semantics would genuinely be nicer to work with than raw threads, but they're convinced that it's impossible to make this work. Or more precisely, they think you could make it work in theory, but that it would be impossible to make it meaningfully more efficient than using multiple processes. I want them to be wrong, but I have to admit I can't see a way to make it work either...
>
> The gist of the idea is that with subinterpreters, your starting point is multiprocessing-style isolation (i.e. you have to use pickle to transfer data between subinterpreters), but you're actually running in a shared-memory threading context from the operating system's perspective, so you don't need to rely on mmap to share memory over a non-streaming interface.
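
For concreteness, the multiprocessing-style baseline being compared against looks roughly like this minimal sketch -- everything crossing the Pipe gets pickled by send() and unpickled by recv() on the other side:

# Minimal multiprocessing baseline: every object crossing the Pipe is
# serialized with pickle in one process and deserialized in the other.
import multiprocessing

def worker(conn):
    obj = conn.recv()                  # unpickles whatever the parent sent
    conn.send(sum(obj["payload"]))     # the result is pickled on the way back

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=worker, args=(child_conn,))
    proc.start()
    parent_conn.send({"payload": list(range(10000))})   # pickled here
    print(parent_conn.recv())                           # 49995000
    proc.join()
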
The challenge is that streaming bytes between processes is actually really fast -- you don't really need mmap for that. (Maybe this was important for X11 back in the 1980s, but a lot has changed since then :-).)

And if you want to use pickle and multiprocessing to send, say, a single big numpy array between processes, that's also really fast, because it's basically just a few memcpy's. The slow case is passing complicated objects between processes, and it's slow because pickle has to walk the object graph to serialize it, and walking the object graph is slow. Copying object graphs between subinterpreters has the same problem.

So the only case I can see where I'd expect subinterpreters to make communication dramatically more efficient is if you have a "deeply immutable" type: one where not only are its instances immutable, but all objects reachable from those instances are also guaranteed to be immutable. So like, a tuple except that when you instantiate it, it validates that all of its elements are also marked as deeply immutable, and errors out if not. (There's a rough sketch of what I mean at the end of this message.) Then when you go to send this between subinterpreters, you can tell by checking the type of the root object that the whole graph is immutable, so you don't need to walk it yourself.

However, it seems impossible to support user-defined deeply-immutable types in Python: types and functions are themselves mutable and hold tons of references to other potentially mutable objects via __mro__, __globals__, __weakref__, etc. etc., so even if a user-defined instance can be made logically immutable, it's still going to hold references to mutable things.

So the one case where subinterpreters win is if you have a really big and complicated set of nested pseudo-tuples of ints and strings and you're bottlenecked on passing it between interpreters. Maybe frozendicts too. Is that enough to justify the whole endeavor? It seems dubious to me.

I guess the other case where subprocesses lose to "real" threads is startup time on Windows. But starting a subinterpreter is also much more expensive than starting a thread, once you take into account the cost of loading the application's modules into the new interpreter. In both cases you end up needing some kind of process/subinterpreter pool or cache to amortize that cost.

Obviously I'm committing the cardinal sin of trying to guess about performance based on theory instead of measurement, so maybe I'm wrong. Or maybe there's some deviously clever trick I'm missing. I hope so -- a really useful subinterpreter multi-core story would be awesome.

-n

--
Nathaniel J. Smith -- https://vorpus.org
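
P.S. Here's the rough sketch of the "deeply immutable" idea mentioned above. It's purely illustrative -- FrozenTuple and its whitelist of element types are invented for this example, and nothing like them exists in CPython today:

# Hypothetical "deeply immutable" container: construction fails unless
# every element is itself known to be deeply immutable, so a receiving
# interpreter could trust the whole object graph after checking only
# the type of the root object.

DEEPLY_IMMUTABLE = (int, float, complex, str, bytes, bool, type(None))

class FrozenTuple(tuple):
    def __new__(cls, items):
        items = tuple(items)
        for item in items:
            if not isinstance(item, DEEPLY_IMMUTABLE + (FrozenTuple,)):
                raise TypeError("%s is not deeply immutable"
                                % type(item).__name__)
        return super().__new__(cls, items)

# Fine: everything reachable from the root is immutable.
ok = FrozenTuple([1, "two", FrozenTuple([3.0, b"four"])])

# Rejected up front: lists are mutable, so the graph couldn't be handed
# to another interpreter without walking/copying it.
try:
    FrozenTuple([1, [2, 3]])
except TypeError as exc:
    print(exc)   # list is not deeply immutable
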