
Hi,

First my high-level opinion about the PEP: the CSP model can probably already be implemented using Queues. To me, the interesting promise of subinterpreters is whether they allow removing the GIL while sharing memory for big objects (such as Numpy arrays). This means the PEP should probably focus on potential concurrency improvements rather than try to follow the CSP model faithfully.

Other than that, a bunch of detailed comments follow:

On Wed, 13 Sep 2017 18:44:31 -0700
Eric Snow <ericsnowcurrently@gmail.com> wrote:
> API for interpreters
> --------------------
>
> The module provides the following functions:
>
> ``list_all()``::
>
>    Return a list of all existing interpreters.
See my naming proposal in the previous thread.
> run(source_str, /, **shared):
>
>    Run the provided Python source code in the interpreter. Any keyword arguments are added to the interpreter's execution namespace.
"Execution namespace" specifically means the __main__ module in the target interpreter, right?
>    If any of the values are not supported for sharing between interpreters then RuntimeError gets raised. Currently only channels (see "create_channel()" below) are supported.
>
>    This may not be called on an already running interpreter. Doing so results in a RuntimeError.
I would distinguish between the two error cases: RuntimeError for calling run() on an already running interpreter, ValueError for values which are not supported for sharing.
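To make the suggested split concrete, here is a minimal sketch of the validation order I have in mind; ``Channel`` and ``Interpreter`` are toy stand-ins for the PEP's objects, not real API:

```python
# Toy sketch of the suggested error split; `Interpreter` and `Channel`
# are stand-ins for the PEP's objects, not real API.
class Channel:
    """Only channels are shareable in the current proposal."""

class Interpreter:
    def __init__(self):
        self.is_running = False

    def run(self, source, **shared):
        # Unsupported *values* are an argument problem -> ValueError.
        for name, value in shared.items():
            if not isinstance(value, Channel):
                raise ValueError(
                    "cannot share %r: only channels are supported" % name)
        # Calling run() on a busy interpreter is a *state* problem
        # -> RuntimeError.
        if self.is_running:
            raise RuntimeError("interpreter is already running")
        # ... actual execution would happen here ...

interp = Interpreter()
try:
    interp.run("pass", data=object())
except ValueError:
    pass  # an argument error, as proposed
```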
>    Likewise, if there is any uncaught exception, it propagates into the code where "run()" was called.
That makes it a bit harder to differentiate from errors raised by run() itself (see above), though how much of an annoyance this is remains unclear. The more contentious implication, though, is that it forces the interpreter to support migration of arbitrary objects from one interpreter to another (since a traceback keeps all local variables alive).
> API for sharing data
> --------------------
>
> The mechanism for passing objects between interpreters is through channels. A channel is a simplex FIFO similar to a pipe. The main difference is that channels can be associated with zero or more interpreters on either end.
So it seems channels have become more complicated now? Is it important to support multi-producer multi-consumer channels?
> Unlike queues, which are also many-to-many, channels have no buffer.
How does it work? Does send() block until someone else calls recv()? That does not sound like a good idea to me. I don't think it's a coincidence that the most varied kinds of I/O (from socket or file IO to threading Queues to multiprocessing Pipes) have non-blocking send(). send() blocking until someone else calls recv() is not only bad for performance, it also increases the likelihood of deadlocks.
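For illustration, here is what such rendezvous semantics look like when sketched with threading primitives (a toy model of my own, not the PEP's implementation). Note that with these semantics, two parties that both call send() before recv() will deadlock forever:

```python
import threading
import time

class RendezvousChannel:
    """Toy unbuffered channel: send() blocks until recv() takes the item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def send(self, obj):
        with self._cond:
            while self._full:           # wait for any previous item to go
                self._cond.wait()
            self._item, self._full = obj, True
            self._cond.notify_all()
            while self._full:           # rendezvous: block until recv()
                self._cond.wait()

    def recv(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            obj, self._item, self._full = self._item, None, False
            self._cond.notify_all()
            return obj

events = []
chan = RendezvousChannel()

def sender():
    events.append("before send")
    chan.send(42)
    events.append("after send")     # only reached once recv() has run

t = threading.Thread(target=sender)
t.start()
time.sleep(0.1)                     # let the sender block inside send()
events.append("before recv")
value = chan.recv()
t.join()
```

The sender cannot get past send() until the receiver shows up, which is exactly the coupling (and deadlock potential) I am worried about.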
> recv_nowait(default=None):
>
>    Return the next object from the channel. If none have been sent then return the default. If the channel has been closed then EOFError is raised.
>
> close():
>
>    No longer associate the current interpreter with the channel (on the receiving end). This is a noop if the interpreter isn't already associated. Once an interpreter is no longer associated with the channel, subsequent (or current) send() and recv() calls from that interpreter will raise EOFError.
EOFError normally means the *other* (sending) side has closed the channel (but it becomes complicated with a multi-producer multi-consumer setup...). When *this* side has closed the channel, we should raise ValueError.
>    The Python runtime will garbage collect all closed channels. Note that "close()" is automatically called when it is no longer used in the current interpreter.
"No longer used" meaning it loses all references in this interpreter?
> send(obj):
>
>    Send the object to the receiving end of the channel. Wait until the object is received. If the channel does not support the object then TypeError is raised. Currently only bytes are supported. If the channel has been closed then EOFError is raised.
Similar remark as above (EOFError vs. ValueError). More generally, send() raising EOFError sounds unheard of. A sidenote: context manager support (__enter__ / __exit__) on channels would sound more useful to me than iteration support.
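For what it's worth, the context manager support I mean is just the usual close-on-exit pattern; a toy stand-in (not the PEP's API):

```python
class ChannelEnd:
    """Toy stand-in for a channel endpoint; only tracks closed state."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # Context manager support as suggested: leaving the block
    # disassociates the current interpreter from the channel.
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False    # don't swallow exceptions

with ChannelEnd() as end:
    assert not end.closed
assert end.closed       # closed automatically on exit
```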
> Initial support for buffers in channels
> ---------------------------------------
>
> An alternative to support for bytes in channels is support for read-only buffers (the PEP 3119 kind).
Probably you mean PEP 3118.
> Then ``recv()`` would return a memoryview to expose the buffer in a zero-copy way.
It will probably not do much if you can only pass buffers and not structured objects, because unserializing (e.g. unpickling) from a buffer will still copy memory around.

To pass a Numpy array, for example, you not only need to pass its contents but also its metadata (its value type -- named "dtype" --, its shape and strides). This may be serialized as simple tuples of atomic types (str, int, bytes, other tuples), but you want to include a memoryview of the data area somewhere in those tuples.

(and, of course, at some point, this will feel like reinventing pickle :) -- but pickle has no mechanism to avoid memory copies, so it can't readily be reused here; otherwise you're just reinventing multiprocessing...)
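A rough sketch of my own of what that could look like, using plain bytes instead of an actual Numpy array to stay self-contained: the metadata goes through regular serialization while the data area crosses as a zero-copy view.

```python
import pickle

def pack_array(dtype, shape, strides, data):
    """Describe an array as (serialized metadata, zero-copy data view)."""
    meta = pickle.dumps((dtype, shape, strides))    # small, copied
    return meta, memoryview(data)                   # payload, not copied

def unpack_array(meta, view):
    dtype, shape, strides = pickle.loads(meta)
    return dtype, shape, strides, view              # still the same buffer

payload = bytearray(24)          # pretend this is a 2x3 float32 data area
meta, view = pack_array("float32", (2, 3), (12, 4), payload)
dtype, shape, strides, out = unpack_array(meta, view)

assert (dtype, shape, strides) == ("float32", (2, 3), (12, 4))
payload[0] = 0xFF
assert out[0] == 0xFF            # the view aliases the original: no copy
```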
> timeout arg to pop() and push()
> -------------------------------
pop() and push() don't exist anymore :-)
> Synchronization Primitives
> --------------------------
>
> The ``threading`` module provides a number of synchronization primitives for coordinating concurrent operations. This is especially necessary due to the shared-state nature of threading. In contrast, subinterpreters do not share state. Data sharing is restricted to channels, which do away with the need for explicit synchronization.
I think this rationale confuses Python-level data sharing with process-level data sharing. The main point of subinterpreters (compared to multiprocessing) is that they live in the same OS process. So it's really not true that you can't share a low-level synchronization primitive (say a semaphore) between subinterpreters. (also see multiprocessing/synchronize.py, which implements all synchronization primitives using basic low-level semaphores)
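As an illustration of that last point, here is a mutex expressed purely in terms of a bare semaphore (with threading.Semaphore standing in for the low-level primitive; multiprocessing builds its primitives on its SemLock in the same fashion):

```python
import threading

class SemaphoreLock:
    """A mutex built on nothing but a binary semaphore."""

    def __init__(self):
        self._sem = threading.Semaphore(1)   # the low-level primitive

    def __enter__(self):
        self._sem.acquire()
        return self

    def __exit__(self, *exc_info):
        self._sem.release()
        return False

counter = 0
lock = SemaphoreLock()

def work():
    global counter
    for _ in range(10_000):
        with lock:
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If subinterpreters can share one such low-level semaphore, the rest of the primitives follow.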
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
This would all be a false promise. Persistent state lives in places other than __main__ (for example the loaded modules and their respective configurations -- think logging or decimal).
> Use queues instead of channels
> ------------------------------
>
> The main difference between queues and channels is that queues support buffering. This would complicate the blocking semantics of ``recv()`` and ``send()``. Also, queues can be built on top of channels.
But buffering with background threads in pure Python will be orders of magnitude slower than optimized buffering in a custom low-level implementation. It would be a pity if a subinterpreters Queue ended up as slow as a multiprocessing Queue.

Regards

Antoine.