
Thanks for the feedback, Antoine. Sorry for the delay; it's been a busy week for me. I just pushed an updated PEP to the repo. Once I've sorted out the question of passing bytes through channels I plan on posting the PEP to the list again for another round of discussion. In the meantime, I've replied inline below.

-eric

On Mon, Sep 18, 2017 at 4:46 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
First, my high-level opinion about the PEP: the CSP model can probably already be implemented using queues. To me, the interesting promise of subinterpreters is whether they allow removing the GIL while sharing memory for big objects (such as Numpy arrays). This means the PEP should probably focus on potential concurrency improvements rather than try to faithfully follow the CSP model.
Please elaborate. I'm interested in understanding what you mean here. Do you have some subinterpreter-based concurrency improvements in mind? What aspect of CSP is the PEP following too faithfully?
``list_all()``::
Return a list of all existing interpreters.
See my naming proposal in the previous thread.
Sorry, your previous comment slipped through the cracks. You suggested:

    As for the naming, let's make it both unconfusing and explicit?
    How about three functions: `all_interpreters()`,
    `running_interpreters()` and `idle_interpreters()`, for example?

As to "all_interpreters()", I suppose it's the difference between "interpreters.all_interpreters()" and "interpreters.list_all()". To me the latter looks better. As to "running_interpreters()" and "idle_interpreters()", I'm not sure what the benefit would be. You can compose either list manually with a simple comprehension:

    [interp for interp in interpreters.list_all() if interp.is_running()]
    [interp for interp in interpreters.list_all() if not interp.is_running()]
``run(source_str, /, **shared)``::
Run the provided Python source code in the interpreter. Any keyword arguments are added to the interpreter's execution namespace.
"Execution namespace" specifically means the __main__ module in the target interpreter, right?
Right. It's explained in more detail a little further down and elsewhere in the PEP. I've updated the PEP to explicitly mention __main__ here too.
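As a rough analogy (not the actual implementation, and `run_analogy` is a made-up helper), the intended semantics are close to executing the source against a fresh __main__-style namespace seeded with the keyword arguments:

```python
# Rough analogy only: the real run() executes in a separate interpreter's
# __main__ module; here we just mimic the namespace-seeding behavior.
def run_analogy(source, **shared):
    main_ns = {"__name__": "__main__"}
    main_ns.update(shared)       # shared values become module-level names
    exec(source, main_ns)        # execute the source in that namespace
    return main_ns

ns = run_analogy("result = x + 1", x=41)
print(ns["result"])  # 42
```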
If any of the values are not supported for sharing between interpreters then RuntimeError gets raised. Currently only channels (see "create_channel()" below) are supported.
This may not be called on an already running interpreter. Doing so results in a RuntimeError.
I would distinguish between both error cases: RuntimeError for calling run() on an already running interpreter, ValueError for values which are not supported for sharing.
Good point.
Likewise, if there is any uncaught exception, it propagates into the code where "run()" was called.
That makes it a bit harder to differentiate from errors raised by run() itself (see above), though how much of an annoyance this is remains unclear. The more contentious implication, though, is that it forces the interpreter to support migration of arbitrary objects from one interpreter to another (since a traceback keeps all local variables alive).
Yeah, the proposal to propagate exceptions out of the subinterpreter is still rather weak. I've added some notes to the PEP about this open issue.
The mechanism for passing objects between interpreters is through channels. A channel is a simplex FIFO similar to a pipe. The main difference is that channels can be associated with zero or more interpreters on either end.
So it seems channels have become more complicated now? Is it important to support multi-producer multi-consumer channels?
To me it made the API simpler. The change did introduce the "close()" method, which I suppose could be confusing. However, I'm sure that in practice it won't be. In contrast, the FIFO/pipe-based API that I had before required passing names around, required more calls, required managing the channel/interpreter relationship more carefully, and made it hard to follow that relationship.
Unlike queues, which are also many-to-many, channels have no buffer.
How does it work? Does send() block until someone else calls recv()? That does not sound like a good idea to me.
Correct: "send()" blocks until the other end receives (if ever). Likewise "recv()" blocks until the other end sends. This specific behavior is probably the main thing I borrowed from CSP. It is *the* synchronization mechanism. Given the isolated nature of subinterpreters, I consider using this concept from CSP to be a good fit.
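To make the rendezvous semantics concrete, here is a toy simulation using stdlib threads. This only illustrates the blocking behavior; it is not the proposed implementation, and `RendezvousChannel` is a made-up name:

```python
import threading

class RendezvousChannel:
    """Toy unbuffered channel: send() blocks until a receiver takes the item."""
    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False
        self._taken = False

    def send(self, item):
        with self._cond:
            while self._full:            # wait out any in-flight item
                self._cond.wait()
            self._item, self._full, self._taken = item, True, False
            self._cond.notify_all()
            while not self._taken:       # block until a receiver picks it up
                self._cond.wait()

    def recv(self):
        with self._cond:
            while not self._full:        # block until a sender arrives
                self._cond.wait()
            item = self._item
            self._full, self._taken = False, True
            self._cond.notify_all()
            return item

ch = RendezvousChannel()
out = []
t = threading.Thread(target=lambda: out.append(ch.recv()))
t.start()
ch.send("hello")   # returns only once the receiver has the item
t.join()
print(out)  # ['hello']
```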
I don't think it's a coincidence that the most varied kinds of I/O (from socket or file IO to threading Queues to multiprocessing Pipes) have non-blocking send().
Interestingly, you can set sockets to blocking mode, in which case send() will block until there is room in the kernel buffer. Likewise, queue.Queue.put() supports blocking, in addition to the put_nowait() alternative. Note that the PEP provides "recv_nowait()" and "send_nowait()" (names inspired by queue.Queue), allowing for a non-blocking send. It's just not the default.

I deliberated for a little while on which one to make the default. In the end I went with blocking-by-default to stick to the CSP model. However, I want to do what's most practical for users. I can imagine folks at first not expecting a blocking send by default, but otherwise it isn't clear yet which default is better for interpreter channels. I'll add an "open question" about switching to non-blocking-by-default for send().
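For comparison, this is how queue.Queue's *_nowait methods (which inspired the proposed names) behave: they raise immediately rather than block.

```python
import queue

q = queue.Queue(maxsize=1)
q.put_nowait("x")          # succeeds: there is room in the buffer
try:
    q.put_nowait("y")      # buffer full -> raises queue.Full instead of blocking
except queue.Full:
    print("queue full")

print(q.get_nowait())      # 'x'
try:
    q.get_nowait()         # empty -> raises queue.Empty instead of blocking
except queue.Empty:
    print("queue empty")
```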
send() blocking until someone else calls recv() is not only bad for performance,
What is the performance problem?
it also increases the likelihood of deadlocks.
How much of a problem will deadlocks be in practice? (FWIW, CSP provides rigorous guarantees about deadlock detection (which Go leverages), though I'm not sure how much benefit that can offer such a dynamic language as Python.) Regardless, I'll make sure the PEP discusses deadlocks.
EOFError normally means the *other* (sending) side has closed the channel (but it becomes complicated with a multi-producer multi-consumer setup...). When *this* side has closed the channel, we should raise ValueError.
I've fixed this in the PEP.
The Python runtime will garbage collect all closed channels. Note that "close()" is automatically called when it is no longer used in the current interpreter.
"No longer used" meaning it loses all references in this interpreter?
Correct. I've clarified this in the PEP.
Similar remark as above (EOFError vs. ValueError). More generally, send() raising EOFError is unheard of.
Hmm. I've fixed this in the PEP, but perhaps using EOFError here (and even for recv()) isn't right. I was drawing inspiration from pipes, but certainly the semantics aren't exactly the same. So it may make sense to use something less I/O-related, like a new exception type in the "interpreters" module. I'll make a note in the PEP about this.
A sidenote: context manager support (__enter__ / __exit__) on channels would sound more useful to me than iteration support.
Yeah, I can see that. FWIW, I've dropped __next__() from the PEP. I've also added a note about adding context manager support.
An alternative to support for bytes in channels is support for read-only buffers (the PEP 3119 kind).
Probably you mean PEP 3118.
Yep. :)
Then ``recv()`` would return a memoryview to expose the buffer in a zero-copy way.
It will probably not do much if you can only pass buffers and not structured objects, because unserializing (e.g. unpickling) from a buffer will still copy memory around.
To pass a Numpy array, for example, you not only need to pass its contents but also its metadata (its value type -- named "dtype" --, its shape and strides). This may be serialized as simple tuples of atomic types (str, int, bytes, other tuples), but you want to include a memoryview of the data area somewhere in those tuples.
(and, of course, at some point this will feel like reinventing pickle :-) but pickle has no mechanism to avoid memory copies, so it can't readily be reused here -- otherwise you're just reinventing multiprocessing...)
I'm still working through all the passing-buffers-through-channels feedback, so I'll defer on a reply for now. :)
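For reference, here is what zero-copy access through a PEP 3118 buffer looks like with the stdlib memoryview type; this only illustrates the buffer protocol itself, not the proposed channel API:

```python
# PEP 3118 buffers allow zero-copy views over existing memory.
data = bytearray(b"hello world")
view = memoryview(data)      # no copy: view references data's buffer

data[0:5] = b"HELLO"         # mutate the underlying buffer...
print(bytes(view[:5]))       # b'HELLO'  ...and the view sees the change

# Slicing a memoryview is also zero-copy:
sub = view[6:]
print(bytes(sub))            # b'world'
```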
timeout arg to pop() and push()
-------------------------------
pop() and push() don't exist anymore :-)
Fixed! :)
Synchronization Primitives
--------------------------
The ``threading`` module provides a number of synchronization primitives for coordinating concurrent operations. This is especially necessary due to the shared-state nature of threading. In contrast, subinterpreters do not share state. Data sharing is restricted to channels, which do away with the need for explicit synchronization.
I think this rationale confuses Python-level data sharing with process-level data sharing. The main point of subinterpreters (compared to multiprocessing) is that they live in the same OS process. So it's really not true that you can't share a low-level synchronization primitive (say a semaphore) between subinterpreters.
I'm not sure I understand your concern here. Perhaps I used the word "sharing" too ambiguously? By "sharing" I mean that the two actors have read access to something that at least one of them can modify. If they both only have read-only access then it's effectively the same as if they are not sharing. While I can imagine the *possibility* (some day) of an opt-in mechanism to share objects (r/rw or rw/rw), that is definitely not a part of this PEP. I expect that in reality we will only ever pass immutable data between interpreters. So I'm unclear on what need there might be for any synchronization primitives other than what is inherent to channels.
* a ``create()`` arg to indicate resetting ``__main__`` after each ``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
This would all be a false promise. Persistent state lives in other places than __main__ (for example the loaded modules and their respective configurations - think logging or decimal).
I've added a bit more explanation to the PEP to clarify this point.
The main difference between queues and channels is that queues support buffering. This would complicate the blocking semantics of ``recv()`` and ``send()``. Also, queues can be built on top of channels.
But buffering with background threads in pure Python will be orders of magnitude slower than optimized buffering in a custom low-level implementation. It would be a pity if a subinterpreters Queue ended up as slow as a multiprocessing Queue.
I agree. I'm entirely open to supporting other object-passing types, including adding low-level implementations. I've added a note to the PEP to that effect. However, I wanted to start off with the most basic object-passing type, and I felt that channels provide the simplest solution. My goal is to get a basic API landed in 3.7 and then build on it from there for 3.8.

That said, in the interest of enabling extra utility in the near term, I expect that we will be able to design the PyInterpreterState changes (few as they are) in such a way that a C extension could implement an efficient multi-interpreter Queue type that would run under 3.7. Actually, would that be strictly necessary if you can interact with channels without the GIL in the C-API? Regardless, I'll make a note in the PEP about the relationship between the C-API and implementing an efficient multi-interpreter Queue. I suppose that means I need to add C-API changes to the PEP (which I had wanted to avoid).
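As a sketch of the "queues can be built on top of channels" point: a non-blocking put() can be layered over a blocking send by buffering items and having a background feeder thread perform the sends. Everything here is hypothetical -- `BufferedQueue` is made up, and `channel_send` is just a callback standing in for the proposed blocking Channel.send():

```python
import queue
import threading
import time

class BufferedQueue:
    """Sketch: a buffered queue layered over a blocking channel send.

    channel_send stands in for the proposed (blocking) Channel.send();
    here it is just a callback, so the sketch stays self-contained.
    """
    def __init__(self, channel_send):
        self._buffer = queue.Queue()          # unbounded in-process buffer
        self._send = channel_send
        self._feeder = threading.Thread(target=self._feed, daemon=True)
        self._feeder.start()

    def put(self, item):                      # never blocks: just buffers
        self._buffer.put(item)

    def _feed(self):
        while True:                           # background thread performs the
            self._send(self._buffer.get())    # (potentially blocking) sends

received = []
bq = BufferedQueue(received.append)           # pretend append is a blocking send
for i in range(3):
    bq.put(i)                                 # each call returns immediately

time.sleep(0.2)                               # let the feeder drain the buffer
print(received)  # [0, 1, 2]
```

This is also exactly the pure-Python, background-thread approach whose overhead the comment above warns about, which is why a low-level implementation would ultimately be preferable.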