PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
Hi all,

As part of the multi-core work I'm proposing the addition of the "interpreters" module to the stdlib. This will expose the existing subinterpreters C-API to Python code. I've purposefully kept the API simple. Please let me know what you think.

-eric

https://www.python.org/dev/peps/pep-0554/
https://github.com/python/peps/blob/master/pep-0554.rst
https://github.com/python/cpython/pull/1748
https://github.com/python/cpython/pull/1802
https://github.com/ericsnowcurrently/cpython/tree/high-level-interpreters-mo...

**********************************************

PEP: 554
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow <ericsnowcurrently@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-05
Python-Version: 3.7
Post-History:

Abstract
========

This proposal introduces the stdlib "interpreters" module. It exposes the basic functionality of subinterpreters that already exists in the C-API.

Rationale
=========

Running code in multiple interpreters provides a useful level of isolation within the same process. This can be leveraged in a number of ways. Furthermore, subinterpreters provide a well-defined framework in which such isolation may be extended.

CPython has supported subinterpreters, with increasing levels of support, since version 1.5. While the feature has the potential to be a powerful tool, subinterpreters have suffered from neglect because they are not available directly from Python. Exposing the existing functionality in the stdlib will help reverse the situation.

Proposal
========

The "interpreters" module will be added to the stdlib. It will provide a high-level interface to subinterpreters and wrap the low-level "_interpreters" module. The proposed API is inspired by the threading module.

The module provides the following functions:

enumerate():
    Return a list of all existing interpreters.

get_current():
    Return the currently running interpreter.

get_main():
    Return the main interpreter.

create():
    Initialize a new Python interpreter and return it. The interpreter will be created in the current thread and will remain idle until something is run in it.

The module also provides the following class:

Interpreter(id):

    id:
        The interpreter's ID (read-only).

    is_running():
        Return whether or not the interpreter is currently running.

    destroy():
        Finalize and destroy the interpreter.

    run(code):
        Run the provided Python code in the interpreter, in the current OS thread. Supported code: source text.

Copyright
=========

This document has been placed in the public domain.
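For illustration only, here is roughly how the API sketched above might be used once the module exists. This is a sketch against the draft names, which may still change:

    # Hedged sketch; the "interpreters" module does not exist yet and
    # the names below follow the draft PEP, so details may change.
    import interpreters

    main = interpreters.get_main()
    print(interpreters.enumerate())        # e.g. a list containing main

    interp = interpreters.create()         # new interpreter, idle until used
    assert not interp.is_running()

    # Only source text is supported; it runs in the current OS thread.
    interp.run("print('hello from a subinterpreter')")

    interp.destroy()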
On 7 September 2017 at 19:26, Eric Snow <ericsnowcurrently@gmail.com> wrote:
As part of the multi-core work I'm proposing the addition of the "interpreters" module to the stdlib. This will expose the existing subinterpreters C-API to Python code. I've purposefully kept the API simple. Please let me know what you think.
Looks good. I agree with the idea of keeping the interface simple in the first instance - we can easily add extra functionality later, but removing things (or worse, discovering that something we thought was fine is actually broken in corner cases we'd missed) is much harder.
run(code):
Run the provided Python code in the interpreter, in the current OS thread. Supported code: source text.
The only quibble I have is that I'd prefer it if we had a run(callable, *args, **kwargs) method, either instead of, or as well as, the run(string) one here.

Is there any reason why passing a callable and args is unsafe, and/or difficult? Naively, I'd assume that

    interp.call('f(a)')

would be precisely as safe as

    interp.call(f, a)

Am I missing something? Name visibility or scoping issues come to mind as possible complications I'm not seeing. At the least, if we don't want a callable-and-args form yet, a note in the PEP explaining why it's been omitted would be worthwhile.

Paul
On Thu, Sep 7, 2017 at 11:52 AM, Paul Moore <p.f.moore@gmail.com> wrote:
The only quibble I have is that I'd prefer it if we had a run(callable, *args, **kwargs) method. Either instead of, or as well as, the run(string) one here.
Is there any reason why passing a callable and args is unsafe, and/or difficult? Naively, I'd assume that
interp.call('f(a)')
would be precisely as safe as
interp.call(f, a)
The problem for now is with sharing objects between interpreters. The simplest safe approach currently is to restrict execution to source strings; then there are no complications. Interpreter.call() makes sense, but I'd like to wait until we get a feel for how subinterpreters get used and until we address some of the issues with object passing.

FWIW, here is what I see as the next steps for subinterpreters in the stdlib:

1. add a basic queue class for passing objects between interpreters
   * only support strings at first (though Nick pointed out we could fall back to pickle or marshal for unsupported objects)
2. implement CSP on top of subinterpreters
3. expand the queue's supported types
4. add something like Interpreter.call()

I didn't include such a queue in this proposal because I wanted to keep it as focused as possible. I'll add a note to the PEP about this. A rough sketch of the queue interface I have in mind follows below.
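For illustration, the step-1 queue might expose an interface roughly like the following. Nothing of the sort exists yet, and the eventual design may differ substantially:

    # Hypothetical interface sketch only -- not an implementation.
    class InterpreterQueue:
        """FIFO for passing objects between interpreters.

        Step 1 above: only str is supported.  Step 3 would widen the
        set of supported types (perhaps falling back to pickle or
        marshal for anything else, as Nick suggested).
        """

        def put(self, obj):
            """Make obj receivable by other interpreters.  Raises
            TypeError for unsupported types (anything but str, at
            first)."""

        def get(self):
            """Remove and return the next object, blocking if the
            queue is empty."""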
Am I missing something? Name visibility or scoping issues come to mind as possible complications I'm not seeing. At the least, if we don't want a callable-and-args form yet, a note in the PEP explaining why it's been omitted would be worthwhile.
I'll add a note to the PEP. Thanks for pointing this out. :) -eric
On 7 September 2017 at 20:14, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Sep 7, 2017 at 11:52 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Is there any reason why passing a callable and args is unsafe, and/or difficult? Naively, I'd assume that
interp.call('f(a)')
would be precisely as safe as
interp.call(f, a)
The problem for now is with sharing objects between interpreters. The simplest safe approach currently is to restrict execution to source strings. Then there are no complications. Interpreter.call() makes sense but I'd like to wait until we get feel for how subinterpreters get used and until we address some of the issues with object passing.
Ah, OK. So if I create a new interpreter, none of the classes, functions, or objects defined in my calling code will exist within the target interpreter? That makes sense, but I'd missed that nuance from the description. Again, this is probably worth noting in the PEP.

And for the record, based on that one fact, I'm perfectly OK with the initial API being string-only.
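To make that isolation concrete, here is a hedged sketch using the proposed (not yet existing) module; the key point is that definitions have to travel as source text:

    import interpreters

    def f(a):
        return a * 2

    interp = interpreters.create()

    # This would fail: 'f' was defined in the calling interpreter and
    # is not visible inside the new one.
    # interp.run("f(21)")           # NameError in the subinterpreter

    # This works: the definition travels as part of the source text.
    interp.run("""
    def f(a):
        return a * 2
    print(f(21))
    """)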
FWIW, here are what I see as the next steps for subinterpreters in the stdlib:
1. add a basic queue class for passing objects between interpreters
   * only support strings at first (though Nick pointed out we could fall back to pickle or marshal for unsupported objects)
2. implement CSP on top of subinterpreters
3. expand the queue's supported types
4. add something like Interpreter.call()
I didn't include such a queue in this proposal because I wanted to keep it as focused as possible. I'll add a note to the PEP about this.
This all sounds very reasonable. Thanks for the clarification. Paul
On Thu, Sep 7, 2017 at 12:44 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Ah, OK. so if I create a new interpreter, none of the classes, functions, or objects defined in my calling code will exist within the target interpreter? That makes sense, but I'd missed that nuance from the description. Again, this is probably worth noting in the PEP.
I'll make sure the PEP is more clear about this.
And for the record, based on that one fact, I'm perfectly OK with the initial API being string-only.
Great! :) -eric
On Thu, Sep 7, 2017 at 12:44 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 September 2017 at 20:14, Eric Snow <ericsnowcurrently@gmail.com> wrote:
I didn't include such a queue in this proposal because I wanted to keep it as focused as possible. I'll add a note to the PEP about this.
This all sounds very reasonable. Thanks for the clarification.
Hmm. Now I'm starting to think some form of basic queue would be important enough to include in the PEP. I'll see if that feeling holds tomorrow. -eric
Eric Snow <ericsnowcurrently@gmail.com> wrote:
1. add a basic queue class for passing objects between interpreters
   * only support strings at first (though Nick pointed out we could fall back to pickle or marshal for unsupported objects)
2. implement CSP on top of subinterpreters
3. expand the queue's supported types
4. add something like Interpreter.call()
How is the GIL situation with subinterpreters these days? Is the long-term goal still "solving multi-core Python", i.e. using multiple CPU cores from within the same process? Or is it mainly used for isolation?

Sebastian
On Thu, Sep 7, 2017 at 1:14 PM, Sebastian Krause <sebastian@realpath.org> wrote:
How is the GIL situation with subinterpreters these days? Is the long-term goal still "solving multi-core Python", i.e. using multiple CPU cores from within the same process? Or is it mainly used for isolation?
The GIL is still process-global. The goal is indeed to change this to support actual multi-core parallelism. However, the benefits of interpreter isolation are certainly a win otherwise. :) -eric
On Thu, Sep 7, 2017 at 11:26 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Hi all,
As part of the multi-core work I'm proposing the addition of the "interpreters" module to the stdlib. This will expose the existing subinterpreters C-API to Python code. I've purposefully kept the API simple. Please let me know what you think.
My concern about this is the same as it was last time -- the work looks neat, but right now, almost no-one uses subinterpreters (basically it's Jep and mod_wsgi and that's it?), and therefore many packages get away with ignoring subinterpreters. Numpy is the one I'm most familiar with: when we get subinterpreter bugs we close them wontfix, because supporting subinterpreters properly would require non-trivial auditing, add overhead for non-subinterpreter use cases, and benefit a tiny tiny fraction of our users.

If we add a friendly python-level API like this, then we're committing to this being a part of Python for the long term and encouraging people to use it, which puts pressure on downstream packages to do that work... but it's still not clear whether any benefits will actually materialize.

I've actually argued with the PyPy devs to try to convince them to add subinterpreter support as part of their experiments with GIL-removal, because I think the semantics would genuinely be nicer to work with than raw threads, but they're convinced that it's impossible to make this work. Or more precisely, they think you could make it work in theory, but that it would be impossible to make it meaningfully more efficient than using multiple processes. I want them to be wrong, but I have to admit I can't see a way to make it work either...

If this is being justified by the multicore use case, and specifically by the theory that having two interpreters in the same process will allow for more efficient communication than two interpreters in two different processes, then... why should we believe that that's actually possible? I want your project to succeed, but if it's going to fail then it seems better if it fails before we commit to exposing new APIs.

-n

--
Nathaniel J. Smith -- https://vorpus.org
On 7 September 2017 at 15:48, Nathaniel Smith <njs@pobox.com> wrote:
I've actually argued with the PyPy devs to try to convince them to add subinterpreter support as part of their experiments with GIL-removal, because I think the semantics would genuinely be nicer to work with than raw threads, but they're convinced that it's impossible to make this work. Or more precisely, they think you could make it work in theory, but that it would be impossible to make it meaningfully more efficient than using multiple processes. I want them to be wrong, but I have to admit I can't see a way to make it work either...
The gist of the idea is that with subinterpreters, your starting point is multiprocessing-style isolation (i.e. you have to use pickle to transfer data between subinterpreters), but you're actually running in a shared-memory threading context from the operating system's perspective, so you don't need to rely on mmap to share memory over a non-streaming interface.

It's also definitely the case that to make this viable, we'd need to provide fast subinterpreter friendly alternatives to C globals for use by extension modules (otherwise adding subinterpreter compatibility will be excessively painful), and PEP 550 is likely to be helpful there.

Personally, I think it would make sense to add the module under PEP 411 provisional status, and make its continued existence as a public API contingent on actually delivering on the "lower overhead multi-core support than multiprocessing" goal (even if it only delivers on that front on Windows, where process creation is more expensive and there's no fork() equivalent).

However, I'd also be entirely happy with our adding it as a private "_subinterpreters" API for testing & experimentation purposes (see https://bugs.python.org/issue30439 ), and reconsidering introducing it as a public API after there's more concrete evidence as to what can actually be achieved based on it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 September 2017 at 15:48, Nathaniel Smith <njs@pobox.com> wrote:
I've actually argued with the PyPy devs to try to convince them to add subinterpreter support as part of their experiments with GIL-removal, because I think the semantics would genuinely be nicer to work with than raw threads, but they're convinced that it's impossible to make this work. Or more precisely, they think you could make it work in theory, but that it would be impossible to make it meaningfully more efficient than using multiple processes. I want them to be wrong, but I have to admit I can't see a way to make it work either...
The gist of the idea is that with subinterpreters, your starting point is multiprocessing-style isolation (i.e. you have to use pickle to transfer data between subinterpreters), but you're actually running in a shared-memory threading context from the operating system's perspective, so you don't need to rely on mmap to share memory over a non-streaming interface.
The challenge is that streaming bytes between processes is actually really fast -- you don't really need mmap for that. (Maybe this was important for X11 back in the 1980s, but a lot has changed since then :-).) And if you want to use pickle and multiprocessing to send, say, a single big numpy array between processes, that's also really fast, because it's basically just a few memcpy's. The slow case is passing complicated objects between processes, and it's slow because pickle has to walk the object graph to serialize it, and walking the object graph is slow. Copying object graphs between subinterpreters has the same problem.

So the only case I can see where I'd expect subinterpreters to make communication dramatically more efficient is if you have a "deeply immutable" type: one where not only are its instances immutable, but all objects reachable from those instances are also guaranteed to be immutable. So like, a tuple except that when you instantiate it it validates that all of its elements are also marked as deeply immutable, and errors out if not. Then when you go to send this between subinterpreters, you can tell by checking the type of the root object that the whole graph is immutable, so you don't need to walk it yourself.

However, it seems impossible to support user-defined deeply-immutable types in Python: types and functions are themselves mutable and hold tons of references to other potentially mutable objects via __mro__, __globals__, __weakrefs__, etc. etc., so even if a user-defined instance can be made logically immutable it's still going to hold references to mutable things. So the one case where subinterpreters win is if you have a really big and complicated set of nested pseudo-tuples of ints and strings and you're bottlenecked on passing it between interpreters. Maybe frozendicts too. Is that enough to justify the whole endeavor? It seems dubious to me.

I guess the other case where subprocesses lose to "real" threads is startup time on Windows. But starting a subinterpreter is also much more expensive than starting a thread, once you take into account the cost of loading the application's modules into the new interpreter. In both cases you end up needing some kind of process/subinterpreter pool or cache to amortize that cost.

Obviously I'm committing the cardinal sin of trying to guess about performance based on theory instead of measurement, so maybe I'm wrong. Or maybe there's some deviously clever trick I'm missing. I hope so -- a really useful subinterpreter multi-core store would be awesome.

-n

--
Nathaniel J. Smith -- https://vorpus.org
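For what it's worth, here is a rough Python-level illustration of the "deeply immutable" construction check described above. It is a sketch only -- a real version would have to live in the interpreter, not in pure Python, to be of any use for lock-free sharing:

    # Illustrative only: validate deep immutability at construction time
    # so the graph never needs to be walked again when sending it.
    DEEPLY_IMMUTABLE = (int, float, complex, str, bytes, bool, type(None))

    class FrozenTuple(tuple):
        def __new__(cls, items):
            items = tuple(items)
            for item in items:
                if not isinstance(item, DEEPLY_IMMUTABLE + (FrozenTuple,)):
                    raise TypeError("not deeply immutable: %r" % (item,))
            return super().__new__(cls, items)

    data = FrozenTuple([1, "two", FrozenTuple([3.0, b"four"])])   # ok
    # FrozenTuple([1, [2, 3]]) would raise TypeError: lists are mutable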
On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The gist of the idea is that with subinterpreters, your starting point is multiprocessing-style isolation (i.e. you have to use pickle to transfer data between subinterpreters), but you're actually running in a shared-memory threading context from the operating system's perspective, so you don't need to rely on mmap to share memory over a non-streaming interface.
The challenge is that streaming bytes between processes is actually really fast -- you don't really need mmap for that. (Maybe this was important for X11 back in the 1980s, but a lot has changed since then :-).) And if you want to use pickle and multiprocessing to send, say, a single big numpy array between processes, that's also really fast, because it's basically just a few memcpy's. The slow case is passing complicated objects between processes, and it's slow because pickle has to walk the object graph to serialize it, and walking the object graph is slow. Copying object graphs between subinterpreters has the same problem.
This doesn't match up with my (somewhat limited) experience. For example, in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is about 10x slower than a memory copy: http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth

This makes a considerable difference when building a system to do parallel data analytics in Python (e.g., on NumPy arrays), which is exactly what Matthew has been working on for the past few years.

I'm sure there are other ways to avoid this expensive IPC without using sub-interpreters, e.g., by using a tool like Plasma (http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/). But I'm skeptical of your assessment that the current multiprocessing approach is fast enough.
Those numbers were for common use in Python tools and reflected my anecdotal experience at the time with normal Python tools. I'm sure that there are mechanisms to achieve faster speeds than what I experienced. That being said, here is a small example.

In [1]: import multiprocessing
In [2]: data = b'0' * 100000000  # 100 MB
In [3]: from toolz import identity
In [4]: pool = multiprocessing.Pool()
In [5]: %time _ = pool.apply_async(identity, (data,)).get()
CPU times: user 76 ms, sys: 64 ms, total: 140 ms
Wall time: 252 ms

This is about 400MB/s for a roundtrip.

On Thu, Sep 7, 2017 at 9:00 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The gist of the idea is that with subinterpreters, your starting point is multiprocessing-style isolation (i.e. you have to use pickle to transfer data between subinterpreters), but you're actually running in a shared-memory threading context from the operating system's perspective, so you don't need to rely on mmap to share memory over a non-streaming interface.
The challenge is that streaming bytes between processes is actually really fast -- you don't really need mmap for that. (Maybe this was important for X11 back in the 1980s, but a lot has changed since then :-).) And if you want to use pickle and multiprocessing to send, say, a single big numpy array between processes, that's also really fast, because it's basically just a few memcpy's. The slow case is passing complicated objects between processes, and it's slow because pickle has to walk the object graph to serialize it, and walking the object graph is slow. Copying object graphs between subinterpreters has the same problem.
This doesn't match up with my (somewhat limited) experience. For example, in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is about 10x slower than a memory copy: http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth
This makes a considerable difference when building a system to do parallel data analytics in Python (e.g., on NumPy arrays), which is exactly what Matthew has been working on for the past few years.
I'm sure there are other ways to avoid this expensive IPC without using sub-interpreters, e.g., by using a tool like Plasma ( http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/). But I'm skeptical of your assessment that the current multiprocessing approach is fast enough.
On Thu, Sep 7, 2017 at 6:14 PM, Matthew Rocklin <mrocklin@gmail.com> wrote:
Those numbers were for common use in Python tools and reflected my anecdotal experience at the time with normal Python tools. I'm sure that there are mechanisms to achieve faster speeds than what I experienced. That being said, here is a small example.
In [1]: import multiprocessing
In [2]: data = b'0' * 100000000  # 100 MB
In [3]: from toolz import identity
In [4]: pool = multiprocessing.Pool()
In [5]: %time _ = pool.apply_async(identity, (data,)).get()
CPU times: user 76 ms, sys: 64 ms, total: 140 ms
Wall time: 252 ms
This is about 400MB/s for a roundtrip
Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).

On my laptop I actually get a worse result from your benchmark: 531 ms for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah, transferring data between processes with multiprocessing is slow.

This is odd, though, because on the same machine, using socat to send 1 GiB between processes using a unix domain socket runs at 2 GB/s:

# terminal 1
~$ rm -f /tmp/unix.sock && socat -u -b32768 UNIX-LISTEN:/tmp/unix.sock "SYSTEM:pv -W > /dev/null"
1.00GiB 0:00:00 [1.89GiB/s] [<=>  ]

# terminal 2
~$ socat -u -b32768 "SYSTEM:dd if=/dev/zero bs=1M count=1024" UNIX:/tmp/unix.sock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.529814 s, 2.0 GB/s

(Notice that the pv output is in GiB/s and the dd output is in GB/s. 1.89 GiB/s = 2.03 GB/s, so they actually agree.)

On my system, Python allocates + copies memory at 2.2 GB/s, so bulk byte-level IPC is within 10% of within-process bulk copying:

# same 100 MB bytestring as above
In [7]: bytearray_data = bytearray(data)
In [8]: %timeit bytearray_data.copy()
45.3 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [9]: 0.100 / 0.0453  # GB / seconds
Out[9]: 2.207505518763797

I don't know why multiprocessing is so slow -- maybe there's a good reason, maybe not. But the reason isn't that IPC is intrinsically slow, and subinterpreters aren't going to automatically be 5x faster because they can use memcpy.

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Thu, 7 Sep 2017 21:08:48 -0700 Nathaniel Smith <njs@pobox.com> wrote:
Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).
On my laptop I actually get a worse result from your benchmark: 531 ms for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah, transferring data between processes with multiprocessing is slow.
This is odd, though, because on the same machine, using socat to send 1 GiB between processes using a unix domain socket runs at 2 GB/s:
When using local communication, the raw IPC cost is often minor compared to whatever Python does with the data (parse it, dispatch tasks around, etc.) except when the data is really huge. Local communications on Linux can easily reach several GB/s (even using TCP to localhost). Here is a Python script with reduced overhead to measure it -- as opposed to e.g. a full-fledged event loop: https://gist.github.com/pitrou/d809618359915967ffc44b1ecfc2d2ad
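For anyone who wants a quick (and crude) number without the gist, something along these lines gives a rough one-way figure over a multiprocessing pipe. This is a stand-in sketch, not Antoine's script:

    # Crude one-way bandwidth estimate over a multiprocessing Pipe.
    import multiprocessing as mp
    import time

    CHUNK = b"x" * (16 * 1024 * 1024)   # 16 MB per message
    COUNT = 64                          # ~1 GB total

    def sender(conn):
        for _ in range(COUNT):
            conn.send_bytes(CHUNK)
        conn.close()

    if __name__ == "__main__":
        recv_end, send_end = mp.Pipe(duplex=False)
        proc = mp.Process(target=sender, args=(send_end,))
        proc.start()
        start = time.perf_counter()
        total = 0
        for _ in range(COUNT):
            total += len(recv_end.recv_bytes())
        elapsed = time.perf_counter() - start
        proc.join()
        print("%.2f GB/s one-way" % (total / elapsed / 1e9))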
I don't know why multiprocessing is so slow -- maybe there's a good reason, maybe not.
Be careful to measure actual bandwidth, not round-trip latency, however.
But the reason isn't that IPC is intrinsically slow, and subinterpreters aren't going to automatically be 5x faster because they can use memcpy.
What could improve performance significantly would be to share objects without any form of marshalling; but it's not obvious it's possible in the subinterpreters model *if* it also tries to remove the GIL. You can see it readily with concurrent.futures, when comparing ThreadPoolExecutor and ProcessPoolExecutor:
import concurrent.futures as cf
tp = cf.ThreadPoolExecutor(4)
pp = cf.ProcessPoolExecutor(4)
x = b"x" * (100 * 1024**2)
def identity(x): return x

y = list(tp.map(identity, [x] * 10))  # warm up
len(y)
10
y = list(pp.map(identity, [x] * 10))  # warm up
len(y)
10
%timeit y = list(tp.map(identity, [x] * 10))
638 µs ± 71.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit y = list(pp.map(identity, [x] * 10))
1.99 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In this trivial case you're really gaining a lot by using a thread pool...

Regards,

Antoine.
On Sun, Sep 10, 2017 at 12:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
What could improve performance significantly would be to share objects without any form of marshalling; but it's not obvious it's possible in the subinterpreters model *if* it also tries to remove the GIL.
Yep. This is one of the main challenges relative to the goal of fully utilizing multiple cores. -eric
First of all, thanks for the feedback and encouragement! Responses in-line below.

-eric

On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith <njs@pobox.com> wrote:
My concern about this is the same as it was last time -- the work looks neat, but right now, almost no-one uses subinterpreters (basically it's Jep and mod_wsgi and that's it?), and therefore many packages get away with ignoring subinterpreters.
My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available.
Numpy is the one I'm most familiar with: when we get subinterpreter bugs we close them wontfix, because supporting subinterpreters properly would require non-trivial auditing, add overhead for non-subinterpreter use cases, and benefit a tiny tiny fraction of our users.
The main problem of which I'm aware is C globals in libraries and extension modules. PEPs 489 and 3121 are meant to help but I know that there is at least one major situation which is still a blocker for multi-interpreter-safe module state. Other than C globals, is there some other issue?
If we add a friendly python-level API like this, then we're committing to this being a part of Python for the long term and encouraging people to use it, which puts pressure on downstream packages to do that work... but it's still not clear whether any benefits will actually materialize.
I'm fine with Nick's idea about making this a "provisional" module. Would that be enough to ease your concern here?
I've actually argued with the PyPy devs to try to convince them to add subinterpreter support as part of their experiments with GIL-removal, because I think the semantics would genuinely be nicer to work with than raw threads, but they're convinced that it's impossible to make this work. Or more precisely, they think you could make it work in theory, but that it would be impossible to make it meaningfully more efficient than using multiple processes. I want them to be wrong, but I have to admit I can't see a way to make it work either...
Yikes! Given the people involved I don't find that to be a good sign. Nevertheless, I still consider my ultimate goals to be tractable and will press forward.

At each step thus far, the effort has led to improvements that extend beyond subinterpreters and multi-core, and I see that trend continuing for the entirety of the project. Even if my final goal is not realized, the result will still be significantly net positive...and I still think it will work out. :)
If this is being justified by the multicore use case, and specifically by the theory that having two interpreters in the same process will allow for more efficient communication than two interpreters in two different processes, then... why should we believe that that's actually possible? I want your project to succeed, but if it's going to fail then it seems better if it fails before we commit to exposing new APIs.
The project is partly about performance. However, it's also particularly about offering an alternative concurrency model with an implementation that can run in multiple threads simultaneously in the same process.

On Thu, Sep 7, 2017 at 5:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
The slow case is passing complicated objects between processes, and it's slow because pickle has to walk the object graph to serialize it, and walking the object graph is slow. Copying object graphs between subinterpreters has the same problem.
The initial goal is to support passing only strings between interpreters. Later efforts will involve investigating approaches to efficiently and safely passing other objects.
So the only case I can see where I'd expect subinterpreters to make communication dramatically more efficient is if you have a "deeply immutable" type [snip] However, it seems impossible to support user-defined deeply-immutable types in Python: [snip]
I agree that it is currently not an option. That is part of the exercise. There are a number of possible solutions to explore once we get to that point. However, this PEP isn't about that. I'm confident enough about the possibilities that I'm comfortable with moving forward here.
I guess the other case where subprocesses lose to "real" threads is startup time on Windows. But starting a subinterpreter is also much more expensive than starting a thread, once you take into account the cost of loading the application's modules into the new interpreter. In both cases you end up needing some kind of process/subinterpreter pool or cache to amortize that cost.
Interpreter startup costs (and optimization strategies) are another aspect of the project which deserve attention. However, we'll worry about that after the core functionality has been achieved.
Obviously I'm committing the cardinal sin of trying to guess about performance based on theory instead of measurement, so maybe I'm wrong. Or maybe there's some deviously clever trick I'm missing.
:) I'd certainly be interested in more data regarding the relative performance of fork/multiprocess+IPC vs. subinterpreters. However, it's going to be hard to draw any conclusions until the work is complete. :)
I hope so -- a really useful subinterpreter multi-core store would be awesome.
Agreed! Thanks for the encouragement. :)
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
First of all, thanks for the feedback and encouragement! Responses in-line below.
I hope it's helpful! More responses in-line as well.
On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith <njs@pobox.com> wrote:
My concern about this is the same as it was last time -- the work looks neat, but right now, almost no-one uses subinterpreters (basically it's Jep and mod_wsgi and that's it?), and therefore many packages get away with ignoring subinterpreters.
My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available.
Okay, but you're assuming that "more libraries work well with subinterpreters" is in fact an improvement. I'm asking you to convince me of that :-). Are there people saying "oh, if only subinterpreters had a Python API and less weird interactions with C extensions, I could do <something awesome>"? So far they haven't exactly taken the world by storm...
Numpy is the one I'm most familiar with: when we get subinterpreter bugs we close them wontfix, because supporting subinterpreters properly would require non-trivial auditing, add overhead for non-subinterpreter use cases, and benefit a tiny tiny fraction of our users.
The main problem of which I'm aware is C globals in libraries and extension modules. PEPs 489 and 3121 are meant to help but I know that there is at least one major situation which is still a blocker for multi-interpreter-safe module state. Other than C globals, is there some other issue?
That's the main one I'm aware of, yeah, though I haven't looked into it closely.
If we add a friendly python-level API like this, then we're committing to this being a part of Python for the long term and encouraging people to use it, which puts pressure on downstream packages to do that work... but it's still not clear whether any benefits will actually materialize.
I'm fine with Nick's idea about making this a "provisional" module. Would that be enough to ease your concern here?
Potentially, yeah -- basically I'm fine with anything that doesn't end up looking like python-dev telling everyone "subinterpreters are the future! go forth and yell at any devs who don't support them!". What do you think the criteria for graduating to non-provisional status should be, in this case? [snip]
So the only case I can see where I'd expect subinterpreters to make communication dramatically more efficient is if you have a "deeply immutable" type [snip] However, it seems impossible to support user-defined deeply-immutable types in Python: [snip]
I agree that it is currently not an option. That is part of the exercise. There are a number of possible solutions to explore once we get to that point. However, this PEP isn't about that. I'm confident enough about the possibilities that I'm comfortable with moving forward here.
I guess I would be much more confident in the possibilities here if you could give:

- some hand-wavy sketch for how subinterpreter A could call a function that was originally defined in subinterpreter B without the GIL, which seems like a precondition for sharing user-defined classes
- some hand-wavy sketch for how refcounting will work for objects shared between multiple subinterpreters without the GIL, without majorly impacting single-thread performance (I actually forgot about this problem in my last email, because PyPy has already solved this part!)

These are the two problems where I find it most difficult to have faith.

[snip]
I hope so -- a really useful subinterpreter multi-core stor[y] would be awesome.
Agreed! Thanks for the encouragement. :)
Thanks for attempting such an ambitious project :-). -n -- Nathaniel J. Smith -- https://vorpus.org
On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available.
Okay, but you're assuming that "more libraries work well with subinterpreters" is in fact an improvement. I'm asking you to convince me of that :-). Are there people saying "oh, if only subinterpreters had a Python API and less weird interactions with C extensions, I could do <something awesome>"? So far they haven't exactly taken the world by storm...
The problem is that most people don't know about the feature. And even if they do, using it requires writing a C-extension, which most people aren't comfortable doing.
Other than C globals, is there some other issue?
That's the main one I'm aware of, yeah, though I haven't looked into it closely.
Oh, good. I haven't missed something. :) Do you know how often subinterpreter support is a problem for users? I was under the impression from your earlier statements that this is a recurring issue but my understanding from mod_wsgi is that it isn't that common.
I'm fine with Nick's idea about making this a "provisional" module. Would that be enough to ease your concern here?
Potentially, yeah -- basically I'm fine with anything that doesn't end up looking like python-dev telling everyone "subinterpreters are the future! go forth and yell at any devs who don't support them!".
Great! I'm also looking at the possibility of adding a mechanism for extension modules to opt out of subinterpreter support (using PEP 489 ModuleDef slots). However, I'd rather wait on that if making the PEP provisional is sufficient.
What do you think the criteria for graduating to non-provisional status should be, in this case?
Consensus among the (Dutch?) core devs that subinterpreters are worth keeping in the stdlib and that we've smoothed out any rough parts in the module.
I guess I would be much more confident in the possibilities here if you could give:
- some hand-wavy sketch for how subinterpreter A could call a function that was originally defined in subinterpreter B without the GIL, which seems like a precondition for sharing user-defined classes
(Before I respond, note that this is way outside the scope of the PEP. The merit of subinterpreters extends beyond any benefits of running sans-GIL, though that is my main goal. I've been updating the PEP to (hopefully) better communicate the utility of subinterpreters.)

Code objects are immutable so that part should be relatively straight-forward. There's the question of closures and default arguments that would have to be resolved. However, those are things that would need to be supported anyway in a world where we want to pass functions and user-defined types between interpreters. Doing so will be a gradual process of starting with immutable non-container builtin types and expanding out from there to other immutable types, including user-defined ones.

Note that sharing mutable objects between interpreters would be a pretty advanced usage (i.e. opt-in shared state vs. threading's share-everything). If it proves desirable then we'd sort that out then. However, I don't see that as more than an esoteric feature relative to subinterpreters.

In my mind, the key advantage of being able to share more (immutable) objects, including user-defined types, between interpreters is in the optimization opportunities. It would allow us to avoid instantiating the same object in each interpreter. That said, the way I imagine it I wouldn't consider such an optimization to be very user-facing, so it doesn't impact the PEP. The user-facing part would be the expanded set of immutable objects interpreters could pass back and forth, and expanding that set won't require any changes to the API in the PEP.
- some hand-wavy sketch for how refcounting will work for objects shared between multiple subinterpreters without the GIL, without majorly impacting single-thread performance (I actually forgot about this problem in my last email, because PyPy has already solved this part!)
(same caveat as above)

There are a number of approaches that may work. One is to give each interpreter its own allocator and GC. Another is to mark shared objects such that they never get GC'ed. Another is to allow objects to exist only in one interpreter at a time. Similarly, object ownership (per interpreter) could help. Asynchronous refcounting could be an option. That's only some of the possible approaches. I expect that at least one of them will be suitable.

However, the first step is to get the multi-interpreter support out there. Then we can tackle the problem of optimization and multi-core utilization.

FWIW, the biggest complexity is actually in synchronizing the sharing strategy across the inter-interpreter boundary (e.g. FIFO). We should expect the relative time spent passing objects between interpreters to be very small. So not only does that provide us with a good target for our refcount-resolving strategy, we can afford some performance wiggle room in that solution. (Again, we're looking way ahead here.)
Thanks for attempting such an ambitious project :-).
Hey, I'm learning a lot and feel like every step along the way is making Python better in some stand-alone way. :) -eric
On Tue, Sep 12, 2017 at 1:46 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available.
Okay, but you're assuming that "more libraries work well with subinterpreters" is in fact an improvement. I'm asking you to convince me of that :-). Are there people saying "oh, if only subinterpreters had a Python API and less weird interactions with C extensions, I could do <something awesome>"? So far they haven't exactly taken the world by storm...
The problem is that most people don't know about the feature. And even if they do, using it requires writing a C-extension, which most people aren't comfortable doing.
Other than C globals, is there some other issue?
That's the main one I'm aware of, yeah, though I haven't looked into it closely.
Oh, good. I haven't missed something. :) Do you know how often subinterpreter support is a problem for users? I was under the impression from your earlier statements that this is a recurring issue but my understanding from mod_wsgi is that it isn't that common.
It looks like we've been averaging one bug report every ~6 months for the last 3 years: https://github.com/numpy/numpy/issues?utf8=%E2%9C%93&q=is%3Aissue%20subinterpreter%20OR%20subinterpreters

They mostly come from Jep, not mod_wsgi. (Possibly because Jep has some built-in numpy integration.) I don't know how many people file bugs versus just living with it or finding some workaround. I suspect for mod_wsgi in particular they probably switch to something else -- it's not like there's any shortage of WSGI servers that avoid these problems. And for Jep there are prominent warnings to expect problems and suggesting workarounds: https://github.com/ninia/jep/wiki/Workarounds-for-CPython-Extensions
I guess I would be much more confident in the possibilities here if you could give:
- some hand-wavy sketch for how subinterpreter A could call a function that as originally defined in subinterpreter B without the GIL, which seems like a precondition for sharing user-defined classes
(Before I respond, note that this is way outside the scope of the PEP. The merit of subinterpreters extends beyond any benefits of running sans-GIL, though that is my main goal. I've been updating the PEP to (hopefully) better communicate the utility of subinterpreters.)
Subinterpreters are basically an attempt to reimplement the OS's process isolation in user-space, right? Classic trade-off where we accept added complexity and fragility in the hopes of gaining some speed?

I just looked at the PEP again, and I'm afraid I still don't understand what the benefits are unless we can remove the GIL and somehow get a speedup over processes. Implementing CSP is a neat idea, but you could do it with subprocesses too. AFAICT you could implement the whole subinterpreters module API with subprocesses on 3.6, and it'd be multi-core and have perfect extension module support.
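To illustrate that last point, a minimal (and deliberately naive) stand-in for the string-only run() API on top of subprocesses -- ignoring startup cost, error propagation, and everything else that makes the real proposal interesting:

    import subprocess
    import sys

    class SubprocessInterpreter:
        """Toy stand-in: each run() spawns a fresh CPython process,
        unlike the proposal, where the interpreter persists and run()
        executes in the current OS thread."""

        def run(self, source):
            subprocess.run([sys.executable, "-c", source], check=True)

    interp = SubprocessInterpreter()
    interp.run("print('hello from another process')")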
Code objects are immutable so that part should be relatively straight-forward. There's the question of closures and default arguments that would have to be resolved. However, those are things that would need to be supported anyway in a world where we want to pass functions and user-defined types between interpreters. Doing so will be a gradual process of starting with immutable non-container builtin types and expanding out from there to other immutable types, including user-defined ones.
I tried arguing that code objects were immutable to the PyPy devs too :-). The problem is that to call a function you need both its __code__, which is immutable, and its __globals__, which is emphatically not. The __globals__ thing means that if you start from an average function you can often follow pointers to reach every other global object (e.g. if the function uses regular expressions, you can probably reach any module by doing func.__globals__["re"].sys.modules[...]). You might hope that you could somehow restrict this, but I can't think of any way that's really useful :-(.
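A small, self-contained version of that point (using this snippet's own globals rather than re):

    import sys

    def f(s):
        return len(s)

    print(type(f.__code__))    # <class 'code'> -- the immutable part

    # __globals__ is the defining module's (mutable) namespace...
    print(f.__globals__ is sys.modules[__name__].__dict__)   # True

    # ...and from there you can reach interpreter-wide mutable state:
    print(f.__globals__["sys"].modules is sys.modules)       # True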
Note that sharing mutable objects between interpreters would be a pretty advanced usage (i.e. opt-in shared state vs. threading's share-everything). If it proves desirable then we'd sort that out then. However, I don't see that as a more than an esoteric feature relative to subinterpreters.
In my mind, the key advantage of being able to share more (immutable) objects, including user-defined types, between interpreters is in the optimization opportunities.
But even if we can add new language features for "freezing" user-defined objects, then their .__class__ will still be mutable, their methods will still have mutable .__globals__, etc. And even if we somehow made it so their methods only got a read-only view on the original interpreter's state, that still wouldn't protect against race conditions and memory corruption, because the original interpreter might be mutating things while the subinterpreter is looking at them.
It would allow us to avoid instantiating the same object in each interpreter. That said, the way I imagine it I wouldn't consider such an optimization to be very user-facing so it doesn't impact the PEP. The user-facing part would be the expanded set of immutable objects interpreters could pass back and forth, and expanding that set won't require any changes to the API in the PEP.
- some hand-wavy sketch for how refcounting will work for objects shared between multiple subinterpreters without the GIL, without majorly impacting single-thread performance (I actually forgot about this problem in my last email, because PyPy has already solved this part!)
(same caveat as above)
There are a number of approaches that may work. One is to give each interpreter its own allocator and GC.
This makes sense to me if the subinterpreters aren't going to share any references, but obviously that wasn't the question :-). And I don't see how this makes it easier to work with references that cross between different GC domains. If anything it seems like it would make things harder.
Another is to mark shared objects such that they never get GC'ed.
I don't think just leaking everything all the time is viable :-(. And even so this requires traversing the whole object reference graph on every communication operation, which defeats the purpose; the whole idea here was to find something that doesn't have to walk the object graph, because that's what makes pickle slow.
Another is to allow objects to exist only in one interpreter at a time.
Yeah, like Rust -- very neat if we can do it! If I want to give you this object, I can't have it anymore myself. But... I haven't been able to think of any way we could actually enforce this efficiently.

When passing an object between subinterpreters you could require that the root object has refcount 1, and then do like a little mini-mark-and-sweep to make sure all the objects reachable from it only have references that are within the local object graph. But then we're traversing the object graph again. This would also require new syntax, because you need something like a simultaneous send-and-del, and you can't fake a del with a function call.

(It also adds another extension module incompatibility, because it would require every extension type to implement tp_traverse -- previously this was only mandatory for objects that could indirectly reference themselves. But maybe the extension type thing doesn't matter because this would have to be restricted to builtin immutable types anyway, as per above.)
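A tiny illustration of the "can't fake a del with a function call" part -- the caller's reference survives the call no matter what the callee does:

    def send(channel, obj):
        # Nothing here can take ownership away from the caller; after
        # we return, the caller's name still refers to obj.
        channel.append(obj)

    channel = []
    data = ["payload"]
    send(channel, data)
    print(data is channel[0])   # True -- both sides still see the object

    # A real send-and-del would need language support, i.e. something
    # that transfers the reference and unbinds 'data' atomically.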
Similarly, object ownership (per interpreter) could help.
How?
Asynchronous refcounting could be an option.
Right, like in the GILectomy branch. This is the only potentially viable solution I can think of (short of dropping refcounting altogether, which I think is how most multi-core languages solve this, and why PyPy has a head start on GIL removal). Do we know yet how much single-thread overhead this adds? It makes me nervous too -- a lot of the attraction of subinterpreters is that the minimal shared state is supposed to make GIL removal easier and less risky than if we were attempting a full GILectomy. But in this case I don't see how to exploit their isolation at all.
That's only some of the possible approaches. I expect that at least one of them will be suitable.
The reason I'm pushing on this is exactly because I don't expect that; I think it's very likely that you'll spend a bunch of time on the fun easier parts, and then discover that actually the hard parts are impossible. If you want me to shut up and leave you to it, say the word :-).
However, the first step is to get the multi-interpreter support out there. Then we can tackle the problem of optimization and multi-core utilization.
FWIW, the biggest complexity is actually in synchronizing the sharing strategy across the inter-interpreter boundary (e.g. FIFO). We should expect the relative time spent passing objects between interpreters to be very small. So not only does that provide us will a good target for our refcount resolving strategy, we can afford some performance wiggle room in that solution. (again, we're looking way ahead here)
But if the only advantage of subinterpreters over subprocesses is that the communication costs are lower, then surely you should be targeting cases where the communication costs are high? -n -- Nathaniel J. Smith -- https://vorpus.org
On 13 September 2017 at 14:10, Nathaniel Smith <njs@pobox.com> wrote:
Subinterpreters are basically an attempt to reimplement the OS's process isolation in user-space, right?
Not really, they're more an attempt to make something resembling Rust's memory model available to Python programs - having the default behaviour be "memory is not shared", but having the choice to share when you want to be entirely an application level decision, without getting into the kind of complexity needed to deliberately break operating system level process isolation.

The difference is that where Rust was able to do that on a per-thread basis and rely on their borrow checker for enforcement of memory ownership, for PEP 554, we're proposing to do it on a per-interpreter basis, and rely on runtime object space partitioning (where Python objects and the memory allocators are *not* shared between interpreters) to keep things separated from each other.

That's why memoryview is such a key part of making the proposal interesting: it's what lets us relatively easily poke holes in the object level partitioning between interpreters and provide zero-copy message passing without having to share any regular reference counts between interpreters (which in turn is what makes it plausible that we may eventually be able to switch to a true GIL-per-interpreter model, with only a few cross-interpreter locks for operations like accessing the list of interpreters itself).

Right now, the closest equivalent to this programming model that Python offers is to combine threads with queue.Queue, and it requires a lot of programming discipline to ensure that you don't access an object again once you've submitted it to a queue.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
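For reference, that closest-current-equivalent pattern looks roughly like this -- note that dropping the sender's reference after the put() is purely a matter of discipline; nothing enforces it:

    import queue
    import threading

    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            if item is None:            # sentinel: stop the worker
                break
            item.append("processed")    # worker treats the list as its own

    t = threading.Thread(target=worker)
    t.start()

    data = ["payload"]
    q.put(data)
    data = None     # discipline only: nothing stops us keeping the reference
    q.put(None)
    t.join()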
On 14 September 2017 at 08:46, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 September 2017 at 14:10, Nathaniel Smith <njs@pobox.com> wrote:
Subinterpreters are basically an attempt to reimplement the OS's process isolation in user-space, right?
Not really, they're more an attempt to make something resembling Rust's memory model available to Python programs - having the default behaviour be "memory is not shared", but having the choice to share when you want to be entirely an application level decision, without getting into the kind of complexity needed to deliberately break operating system level process isolation.
I should also clarify: *Eric* still has hopes of sharing actual objects between subinterpreters without copying them. *I* think that's a forlorn hope, and expect that communicating between subinterpreters is going to end up looking an awful lot like communicating between subprocesses via shared memory.

The trade-off between the two models will then be that one still just looks like a single process from the point of view of the outside world, and hence doesn't place any extra demands on the underlying OS beyond those required to run CPython with a single interpreter, while the other gives much stricter isolation (including isolating C globals in extension modules), but also demands much more from the OS when it comes to its IPC capabilities.

The security risk profiles of the two approaches will also be quite different, since using subinterpreters won't require deliberately poking holes in the process isolation that operating systems give you by default.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 Sep 2017, at 05:11, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith <njs@pobox.com> wrote:
Numpy is the one I'm most familiar with: when we get subinterpreter bugs we close them wontfix, because supporting subinterpreters properly would require non-trivial auditing, add overhead for non-subinterpreter use cases, and benefit a tiny tiny fraction of our users.
The main problem of which I'm aware is C globals in libraries and extension modules. PEPs 489 and 3121 are meant to help but I know that there is at least one major situation which is still a blocker for multi-interpreter-safe module state. Other than C globals, is there some other issue?
There’s also the PyGILState_* API that doesn't support multiple interpreters. The issue there is that callbacks from external libraries back into Python need to use the correct subinterpreter.

Ronald
Yep. See http://bugs.python.org/issue10915 and http://bugs.python.org/issue15751.

The issue of C-extension support for subinterpreters is, of course, a critical one here. At the very least, incompatible modules should be able to opt out of subinterpreter support. I've updated the PEP to discuss this.

-eric

On Sun, Sep 10, 2017 at 3:18 AM, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
On 8 Sep 2017, at 05:11, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith <njs@pobox.com> wrote:
Numpy is the one I'm most familiar with: when we get subinterpreter bugs we close them wontfix, because supporting subinterpreters properly would require non-trivial auditing, add overhead for non-subinterpreter use cases, and benefit a tiny tiny fraction of our users.
The main problem of which I'm aware is C globals in libraries and extension modules. PEPs 489 and 3121 are meant to help but I know that there is at least one major situation which is still a blocker for multi-interpreter-safe module state. Other than C globals, is there some other issue?
There’s also the PyGILState_* API that doesn't support multiple interpreters.
The issue there is that callbacks from external libraries back into python need to use the correct subinterpreter.
Ronald
On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote: [...]
get_main():
Return the main interpreter.
I assume the concept of a main interpreter is inherited from the previous levels of support in the C API, but what exactly is the significance of being "the main interpreter"? Instead, could they just all be subinterpreters of the same Python process (or whatever the right wording would be)?

It might also be helpful if the PEP had a short description of what are considered subinterpreters and how they differ from threads of the same interpreter [*]. Currently, the PEP seems to rely heavily on knowledge of the previously available concepts. However, as this would be a new module, I don't think there's any need to blindly copy the previous design, regardless of how well the design may have served its purpose at the time.

-- Koos

[*] For instance regarding the role of the glo... local interpreter locks (LILs) ;)

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
On 11 September 2017 at 00:52, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote: [...]
get_main():
Return the main interpreter.
I assume the concept of a main interpreter is inherited from the previous levels of support in the C API, but what exactly is the significance of being "the main interpreter"? Instead, could they just all be subinterpreters of the same Python process (or whatever the right wording would be)?
The main interpreter is ultimately responsible for the actual process global state: standard streams, signal handlers, dynamically linked libraries, __main__ module, etc. The line between it and the "CPython Runtime" is fuzzy for both practical and historical reasons, but the regular Python CLI will always have a "first created, last destroyed" main interpreter, simply because we don't really gain anything significant from eliminating it as a concept. By contrast, embedding applications that *don't* have a __main__ module, and already manage most process global state themselves without the assistance of the CPython Runtime can already get pretty close to just having a pool of peer subinterpreters, and will presumably be able to get closer over time as the subinterpreter support becomes more robust. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
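For illustration, a minimal sketch of what that distinction could look like under the draft API as posted (get_main() itself is questioned later in this thread, and the configure_process_globals/run_isolated_work names are purely hypothetical application functions):

import interpreters

current = interpreters.get_current()
main = interpreters.get_main()

if current.id == main.id:
    # Only the main interpreter should touch process-global state such as
    # signal handlers or the standard streams.
    configure_process_globals()   # hypothetical application function
else:
    run_isolated_work()           # hypothetical application function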
On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 11 September 2017 at 00:52, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote: [...]
get_main():
Return the main interpreter.
I assume the concept of a main interpreter is inherited from the previous levels of support in the C API, but what exactly is the significance of being "the main interpreter"? Instead, could they just all be subinterpreters of the same Python process (or whatever the right wording would be)?
The main interpreter is ultimately responsible for the actual process global state: standard streams, signal handlers, dynamically linked libraries, __main__ module, etc.
Hmm. It is not clear, for instance, why a signal handler could not be owned by an interpreter that wasn't the first one started. Or, if a non-main interpreter imports a module from a dynamically linked library, does it delegate that to the main interpreter? And do sys.stdout et al. not exist in the other interpreters?
The line between it and the "CPython Runtime" is fuzzy for both practical and historical reasons, but the regular Python CLI will always have a "first created, last destroyed" main interpreter, simply because we don't really gain anything significant from eliminating it as a concept.
I fear that emphasizing the main interpreter will lead to all kinds of libraries/programs that somehow unnecessarily rely on some or all tasks being performed in the main interpreter. Then you'll have a hard time running two of them in parallel in the same process, because you don't have two main interpreters. -- Koos PS. There's a saying... something like "always say never" ;)
By contrast, embedding applications that *don't* have a __main__ module, and already manage most process global state themselves without the assistance of the CPython Runtime can already get pretty close to just having a pool of peer subinterpreters, and will presumably be able to get closer over time as the subinterpreter support becomes more robust.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 11 September 2017 at 18:02, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The line between it and the "CPython Runtime" is fuzzy for both practical and historical reasons, but the regular Python CLI will always have a "first created, last destroyed" main interpreter, simply because we don't really gain anything significant from eliminating it as a concept.
I fear that emphasizing the main interpreter will lead to all kinds of libraries/programs that somehow unnecessarily rely on some or all tasks being performed in the main interpreter. Then you'll have a hard time running two of them in parallel in the same process, because you don't have two main interpreters.
You don't need to fear this scenario, since it's a description of the status quo (and it's the primary source of overstated claims about subinterpreters being "fundamentally broken"). So no, not everything will be subinterpreter-friendly, just as not everything in Python is thread-safe, and not everything is portable across platforms. That's OK - it just means we'll aim to make as many things as possible implicitly subinterpreter-friendly, and for everything else, we'll aim to minimise the adjustments needed to *make* things subinterpreter friendly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Sep 12, 2017 at 1:40 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 11 September 2017 at 18:02, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The line between it and the "CPython Runtime" is fuzzy for both practical and historical reasons, but the regular Python CLI will always have a "first created, last destroyed" main interpreter, simply because we don't really gain anything significant from eliminating it as a concept.
I fear that emphasizing the main interpreter will lead to all kinds of libraries/programs that somehow unnecessarily rely on some or all tasks being performed in the main interpreter. Then you'll have a hard time running two of them in parallel in the same process, because you don't have two main interpreters.
You don't need to fear this scenario, since it's a description of the status quo (and it's the primary source of overstated claims about subinterpreters being "fundamentally broken").
Well, if that's true, it's hardly a counter-argument to what I said. Anyway, there is no status quo about what is proposed in the PEP. And as long as the existing APIs are preserved, why not make the new one less susceptible to overstated fundamental brokenness?
So no, not everything will be subinterpreter-friendly, just as not everything in Python is thread-safe, and not everything is portable across platforms.
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
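As a rough sketch of that interpreter-per-thread idea, assuming create()/run()/destroy() behave as the draft describes (run() executes source text in the target interpreter, in the calling OS thread; everything is kept in a single run() call since the draft doesn't say whether state persists across calls):

import threading
import interpreters

def worker(name):
    interp = interpreters.create()
    # Globals created by this source text exist only inside 'interp', so the
    # two workers never observe each other's 'counter'.
    interp.run(f"counter = 0\ncounter += 1\nprint({name!r}, counter)")
    interp.destroy()

threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()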
That's OK - it just means we'll aim to make as many things as possible implicitly subinterpreter-friendly, and for everything else, we'll aim to minimise the adjustments needed to *make* things subinterpreter friendly.
And that's exactly what I'm after here! I'm mostly just worried about the `get_main()` function. Maybe it should be called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first running interpreter just introduce itself to its children? And if that's too much to ask, maybe there could be a `get_parent()` function, which would give you the interpreter that spawned the current subinterpreter. Well OK, perhaps the current implementation only allows the "main interpreter" to spawn new interpreters (I have no idea). In that case, `get_parent()` will just be another, more future-proof name for `get_main()`. Then it would just need a clear documentation of the differences between the parent and its children. If the author of user code is not being too lazy, they might even read the docs and figure out if they *really* need to make a big deal out of which interpreter is the main/parent one. Still, I'm not convinced that there needs to be a get_main or get_parent. It shouldn't be too hard for users to make a wrapper around the API that provides this functionality. And if they do that––and use it to make their code "fundamentally broken"––then... at least we tried. ––Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
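One rough way to read "introduce itself to its children" under the draft API (create(), run(), get_current(), and the read-only id attribute; PARENT_ID is a made-up name, and this assumes names set by one run() call remain visible to later ones):

import interpreters

child = interpreters.create()
me = interpreters.get_current()

# Inject the creator's identity rather than relying on a global
# get_main()/get_parent() lookup.
child.run(f"PARENT_ID = {me.id!r}")

# Code run later in 'child' can consult PARENT_ID to find its parent.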
On Tue, Sep 12, 2017 at 05:35:34PM +0300, Koos Zevenhoven wrote:
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
You could have a privileged C extension that is only imported in the main interpreter:

if get_current_interp() is main_interp():
    from _decimal import *
else:
    from _pydecimal import *

This is of course only attractive if importing the interpreters module and calling these functions has minimal overhead. Stefan Krah
On Tue, Sep 12, 2017 at 5:53 PM, Stefan Krah <stefan@bytereef.org> wrote:
On Tue, Sep 12, 2017 at 05:35:34PM +0300, Koos Zevenhoven wrote:
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
You could have a privileged C extension that is only imported in the main interpreter:
if get_current_interp() is main_interp():
    from _decimal import *
else:
    from _pydecimal import *
Or it could be first-come first-served:

if is_imported_by_other_process("_decimal"):
    from _pydecimal import *
else:
    from _decimal import *

––Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Tue, Sep 12, 2017 at 6:30 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Tue, Sep 12, 2017 at 5:53 PM, Stefan Krah <stefan@bytereef.org> wrote:
On Tue, Sep 12, 2017 at 05:35:34PM +0300, Koos Zevenhoven wrote:
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
You could have a privileged C extension that is only imported in the main interpreter:
if get_current_interp() is main_interp():
    from _decimal import *
else:
    from _pydecimal import *
Oops.. it should of course be "by_this_process", not "by_other_process" (fixed below).
Or it could be first-come first-served:
if is_imported_by_this_process("_decimal"):
    from _pydecimal import *
else:
    from _decimal import *
––Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 13 September 2017 at 00:35, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Tue, Sep 12, 2017 at 1:40 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 11 September 2017 at 18:02, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The line between it and the "CPython Runtime" is fuzzy for both practical and historical reasons, but the regular Python CLI will always have a "first created, last destroyed" main interpreter, simply because we don't really gain anything significant from eliminating it as a concept.
I fear that emphasizing the main interpreter will lead to all kinds of libraries/programs that somehow unnecessarily rely on some or all tasks being performed in the main interpreter. Then you'll have a hard time running two of them in parallel in the same process, because you don't have two main interpreters.
You don't need to fear this scenario, since it's a description of the status quo (and it's the primary source of overstated claims about subinterpreters being "fundamentally broken").
Well, if that's true, it's hardly a counter-argument to what I said. Anyway, there is no status quo about what is proposed in the PEP.
Yes, there is, since subinterpreters are an existing feature of the CPython implementation. What's new in the PEP is the idea of giving that feature a Python level API so that it's available to regular Python programs, rather than only being available to embedding applications that choose to use it (e.g. mod_wsgi).
And as long as the existing APIs are preserved, why not make the new one less susceptible to overstated fundamental brokenness?
Having a privileged main interpreter isn't fundamentally broken, since you aren't going to run __main__ in more than one interpreter, just as you don't run __main__ in more than one thread (and multiprocessing deliberately avoids running the "if __name__ == '__main__'" sections of it in more than one process).
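For reference, the guard in question is the familiar one; the analogous expectation for subinterpreters is simply that only one interpreter ever acts as the script entry point, while the others merely import the module:

def main():
    print("doing the real work")

if __name__ == "__main__":
    # Skipped when the module is imported by a multiprocessing worker, and by
    # analogy when it is imported in another interpreter.
    main()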
So no, not everything will be subinterpreter-friendly, just as not everything in Python is thread-safe, and not everything is portable across platforms.
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
"The interpreter that runs __main__" is never going to go away as a concept for the regular CPython CLI. Right now, its also a restriction even for applications like mod_wsgi, since the GIL state APIs always register C created threads with the main interpreter.
That's OK - it just means we'll aim to make as many things as possible implicitly subinterpreter-friendly, and for everything else, we'll aim to minimise the adjustments needed to *make* things subinterpreter friendly.
And that's exactly what I'm after here!
No, you're after deliberately making the proposed API non-representative of how the reference implementation actually works because of a personal aesthetic preference rather than asking yourself what the practical benefit of hiding the existence of the main interpreter would be. The fact is that the main interpreter *is* special (just as the main thread is special), and your wishing that things were otherwise won't magically make it so.
I'm mostly just worried about the `get_main()` function. Maybe it should be called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first running interpreter just introduce itself to its children? And if that's too much to ask, maybe there could be a `get_parent()` function, which would give you the interpreter that spawned the current subinterpreter.
If the embedding application never calls "_Py_ConfigureMainInterpreter", then get_main() could conceivably return None. However, we don't expose that as a public API yet, so for the time being, Py_Initialize() will always call it, and hence there will always be a main interpreter (even in things like mod_wsgi). Whether we invest significant effort in making configuring the main interpreter genuinely optional is still an open question - since most applications are free to just not use the main interpreter for code execution if they don't want to, we haven't found a real world use case that would benefit meaningfully from its non-existence (just as the vast majority of applications don't care about the various ways in which the main thread that runs Py_Initialize() and Py_Finalize() is given special treatment, and for those that do, they're free to avoid using it). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Sep 13, 2017 at 6:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 September 2017 at 00:35, Koos Zevenhoven <k7hoven@gmail.com> wrote:
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
"The interpreter that runs __main__" is never going to go away as a concept for the regular CPython CLI.
It's still just *an* interpreter that happens to run __main__. And who says it even needs to be the only one?
Right now, its also a restriction even for applications like mod_wsgi, since the GIL state APIs always register C created threads with the main interpreter.
That's OK - it just means we'll aim to make as many things as possible implicitly subinterpreter-friendly, and for everything else, we'll aim to minimise the adjustments needed to *make* things subinterpreter friendly.
And that's exactly what I'm after here!
No, you're after deliberately making the proposed API non-representative of how the reference implementation actually works because of a personal aesthetic preference rather than asking yourself what the practical benefit of hiding the existence of the main interpreter would be.
The fact is that the main interpreter *is* special (just as the main thread is special), and your wishing that things were otherwise won't magically make it so.
I'm not questioning whether the main interpreter is special, or whether the interpreters may differ from each other. I'm questioning the whole concept of "main interpreter". People should not care about which interpreter is "the main ONE". They should care about what properties an interpreter has. That's not aesthetics. Just look at, e.g. the _decimal/_pydecimal examples in this thread.
I'm mostly just worried about the `get_main()` function. Maybe it should be called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first running interpreter just introduce itself to its children? And if that's too much to ask, maybe there could be a `get_parent()` function, which would give you the interpreter that spawned the current subinterpreter.
If the embedding application never calls "_Py_ConfigureMainInterpreter", then get_main() could conceivably return None. However, we don't expose that as a public API yet, so for the time being, Py_Initialize() will always call it, and hence there will always be a main interpreter (even in things like mod_wsgi).
You don't need to remove _Py_ConfigureMainInterpreter. Just make sure you don't try to smuggle it into the status quo of the possibly upcoming new stdlib module. Who knows what the function does anyway, let alone what it might or might not do in the future. Of course that doesn't mean that there couldn't be ways to configure an interpreter, but coupling that with a concept of a "main interpreter", as you suggest, doesn't seem to make any sense. And surely the code that creates a new interpreter should know if it wants the new interpreter to start with `__name__ == "__main__"` or `__name__ == "__just_any__"`, if there is a choice. ––Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
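A sketch of that last point using nothing but the draft run() method (whether __name__ can really be reassigned this way in the child's namespace is an assumption, and "__just_any__" is only illustrative):

import interpreters

child = interpreters.create()
# The creating code, not a global "main interpreter" concept, decides what
# the child should believe about itself.
child.run('__name__ = "__just_any__"')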
On 13 September 2017 at 20:45, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Wed, Sep 13, 2017 at 6:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 September 2017 at 00:35, Koos Zevenhoven <k7hoven@gmail.com> wrote:
I don't see how the situation benefits from calling something the "main interpreter". Subinterpreters can be a way to take something non-thread-safe and make it thread-safe, because in an interpreter-per-thread scheme, most of the state, like module globals, are thread-local. (Well, this doesn't help for async concurrency, but anyway.)
"The interpreter that runs __main__" is never going to go away as a concept for the regular CPython CLI.
It's still just *an* interpreter that happens to run __main__. And who says it even needs to be the only one?
Koos, I've asked multiple times now for you to describe the practical user benefits you believe will come from dispensing with the existing notion of a main interpreter (which is *not* something PEP 554 has created - the main interpreter already exists at the implementation level, PEP 554 just makes that fact visible at the Python level). If you can't come up with a meaningful user benefit that would arise from removing it, then please just let the matter drop. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 14 September 2017 at 08:25, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 September 2017 at 20:45, Koos Zevenhoven <k7hoven@gmail.com> wrote:
It's still just *an* interpreter that happens to run __main__. And who says it even needs to be the only one?
Koos, I've asked multiple times now for you to describe the practical user benefits you believe will come from dispensing with the existing notion of a main interpreter (which is *not* something PEP 554 has created - the main interpreter already exists at the implementation level, PEP 554 just makes that fact visible at the Python level).
Eric addressed this in the latest update, and took the view that since it's a question that can be deferred, it's one that should be deferred, in line with the overall "minimal enabling infrastructure" philosophy of the PEP. On thinking about it further, I believe this may also intersect with some open questions I have around the visibility of *thread* objects across interpreters - the real runtime constraint at the implementation level is the fact that we need a main *thread* in order to sensibly manage the way signal handling works across different platforms, and that's where we may get into trouble if we allow arbitrary subinterpreters to run in the main thread, and accept and process signals directly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, Sep 10, 2017 at 7:52 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
I assume the concept of a main interpreter is inherited from the previous levels of support in the C API, but what exactly is the significance of being "the main interpreter"? Instead, could they just all be subinterpreters of the same Python process (or whatever the right wording would be)?
It might also be helpful if the PEP had a short description of what are considered subinterpreters and how they differ from threads of the same interpreter [*]. Currently, the PEP seems to rely heavily on knowledge of the previously available concepts. However, as this would be a new module, I don't think there's any need to blindly copy the previous design, regardless of how well the design may have served its purpose at the time.
I've updated the PEP to be more instructive. I've also dropped the "get_main()" function from the PEP. -eric
participants (11):
- Antoine Pitrou
- Eric Snow
- Koos Zevenhoven
- Matthew Rocklin
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Ronald Oussoren
- Sebastian Krause
- Stefan Krah
- Stephan Hoyer