
Thanks for the feedback, Antoine. Sorry for the delay; it's been a busy week for me. I just pushed an updated PEP to the repo. Once I've sorted out the question of passing bytes through channels I plan on posting the PEP to the list again for another round of discussion. In the meantime, I've replied inline below.

-eric

On Mon, Sep 18, 2017 at 4:46 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
First, my high-level opinion about the PEP: the CSP model can probably already be implemented using queues. To me, the interesting promise of subinterpreters is whether they allow removing the GIL while sharing memory for big objects (such as Numpy arrays). This means the PEP should probably focus on potential concurrency improvements rather than try to faithfully follow the CSP model.
Please elaborate. I'm interested in understanding what you mean here. Do you have some subinterpreter-based concurrency improvements in mind? What aspect of CSP is the PEP following too faithfully?
``list_all()``::
Return a list of all existing interpreters.
See my naming proposal in the previous thread.
Sorry, your previous comment slipped through the cracks. You suggested:

    As for the naming, let's make it both unconfusing and explicit?
    How about three functions: `all_interpreters()`,
    `running_interpreters()` and `idle_interpreters()`, for example?

As to "all_interpreters()", I suppose it's the difference between "interpreters.all_interpreters()" and "interpreters.list_all()". To me the latter looks better. As to "running_interpreters()" and "idle_interpreters()", I'm not sure what the benefit would be. You can compose either list manually with a simple comprehension:

    [interp for interp in interpreters.list_all() if interp.is_running()]
    [interp for interp in interpreters.list_all() if not interp.is_running()]
``run(source_str, /, **shared)``::
Run the provided Python source code in the interpreter. Any keyword arguments are added to the interpreter's execution namespace.
"Execution namespace" specifically means the __main__ module in the target interpreter, right?
Right. It's explained in more detail a little further down and elsewhere in the PEP. I've updated the PEP to explicitly mention __main__ here too.
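As a rough analogy (not the actual implementation, and `run_analogy` is a made-up helper), the intended semantics are close to executing the source against a fresh __main__-style namespace seeded with the keyword arguments:

```python
# Rough analogy only: the real run() executes in a separate interpreter's
# __main__ module; here we just mimic the namespace-seeding behavior.
def run_analogy(source, **shared):
    main_ns = {"__name__": "__main__"}
    main_ns.update(shared)       # shared values become module-level names
    exec(source, main_ns)        # execute the source in that namespace
    return main_ns

ns = run_analogy("result = x + 1", x=41)
print(ns["result"])  # 42
```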
If any of the values are not supported for sharing between interpreters then RuntimeError gets raised. Currently only channels (see "create_channel()" below) are supported.
This may not be called on an already running interpreter. Doing so results in a RuntimeError.
I would distinguish between both error cases: RuntimeError for calling run() on an already running interpreter, ValueError for values which are not supported for sharing.
Good point.
Likewise, if there is any uncaught exception, it propagates into the code where "run()" was called.
That makes it a bit harder to differentiate from errors raised by run() itself (see above), though how much of an annoyance this is remains unclear. The more contentious implication, though, is that it forces the interpreter to support migration of arbitrary objects from one interpreter to another (since a traceback keeps all local variables alive).
Yeah, the proposal to propagate exceptions out of the subinterpreter is still rather weak. I've added some notes to the PEP about this open issue.
The mechanism for passing objects between interpreters is through channels. A channel is a simplex FIFO similar to a pipe. The main difference is that channels can be associated with zero or more interpreters on either end.
So it seems channels have become more complicated now? Is it important to support multi-producer multi-consumer channels?
To me it made the API simpler. The change did introduce the "close()" method, which I suppose could be confusing. However, I'm sure that in practice it won't be. In contrast, the FIFO/pipe-based API that I had before required passing names around, required more calls, required managing the channel/interpreter relationship more carefully, and made it hard to follow that relationship.
Unlike queues, which are also many-to-many, channels have no buffer.
How does it work? Does send() block until someone else calls recv()? That does not sound like a good idea to me.
Correct: "send()" blocks until the other end receives (if ever). Likewise "recv()" blocks until the other end sends. This specific behavior is probably the main thing I borrowed from CSP. It is *the* synchronization mechanism. Given the isolated nature of subinterpreters, I consider using this concept from CSP to be a good fit.
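To make the rendezvous semantics concrete, here is a toy simulation using stdlib threads. This only illustrates the blocking behavior; it is not the proposed implementation, and `RendezvousChannel` is a made-up name:

```python
import threading

class RendezvousChannel:
    """Toy unbuffered channel: send() blocks until a receiver takes the item."""
    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False
        self._taken = False

    def send(self, item):
        with self._cond:
            while self._full:            # wait out any in-flight item
                self._cond.wait()
            self._item, self._full, self._taken = item, True, False
            self._cond.notify_all()
            while not self._taken:       # block until a receiver picks it up
                self._cond.wait()

    def recv(self):
        with self._cond:
            while not self._full:        # block until a sender arrives
                self._cond.wait()
            item = self._item
            self._full, self._taken = False, True
            self._cond.notify_all()
            return item

ch = RendezvousChannel()
out = []
t = threading.Thread(target=lambda: out.append(ch.recv()))
t.start()
ch.send("hello")   # returns only once the receiver has the item
t.join()
print(out)  # ['hello']
```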
I don't think it's a coincidence that the most varied kinds of I/O (from socket or file IO to threading Queues to multiprocessing Pipes) have non-blocking send().
Interestingly, you can set sockets to blocking mode, in which case send() will block until there is room in the kernel buffer. Likewise, queue.Queue.put() supports blocking, in addition to the put_nowait() alternative. Note that the PEP provides "recv_nowait()" and "send_nowait()" (names inspired by queue.Queue), allowing for a non-blocking send. It's just not the default.

I deliberated for a little while on which one to make the default. In the end I went with blocking-by-default to stick to the CSP model. However, I want to do what's most practical for users. I can imagine folks at first not expecting a blocking send by default, but otherwise it isn't clear yet which default is better for interpreter channels. I'll add an "open question" about switching to non-blocking-by-default for send().
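For comparison, this is how queue.Queue's *_nowait methods (which inspired the proposed names) behave: they raise immediately rather than block.

```python
import queue

q = queue.Queue(maxsize=1)
q.put_nowait("x")          # succeeds: there is room in the buffer
try:
    q.put_nowait("y")      # buffer full -> raises queue.Full instead of blocking
except queue.Full:
    print("queue full")

print(q.get_nowait())      # 'x'
try:
    q.get_nowait()         # empty -> raises queue.Empty instead of blocking
except queue.Empty:
    print("queue empty")
```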
send() blocking until someone else calls recv() is not only bad for performance,
What is the performance problem?
it also increases the likelihood of deadlocks.
How much of a problem will deadlocks be in practice? (FWIW, CSP provides rigorous guarantees about deadlock detection (which Go leverages), though I'm not sure how much benefit that can offer such a dynamic language as Python.) Regardless, I'll make sure the PEP discusses deadlocks.
EOFError normally means the *other* (sending) side has closed the channel (but it becomes complicated with a multi-producer multi-consumer setup...). When *this* side has closed the channel, we should raise ValueError.
I've fixed this in the PEP.
The Python runtime will garbage collect all closed channels. Note that "close()" is automatically called when it is no longer used in the current interpreter.
"No longer used" meaning it loses all references in this interpreter?
Correct. I've clarified this in the PEP.
Similar remark as above (EOFError vs. ValueError). More generally, send() raising EOFError is unheard of.
Hmm. I've fixed this in the PEP, but perhaps using EOFError here (and even for recv()) isn't right. I was drawing inspiration from pipes, but certainly the semantics aren't exactly the same. So it may make sense to use something less I/O-related, like a new exception type in the "interpreters" module. I'll make a note in the PEP about this.
A sidenote: context manager support (__enter__ / __exit__) on channels would sound more useful to me than iteration support.
Yeah, I can see that. FWIW, I've dropped __next__() from the PEP. I've also added a note about adding context manager support.
An alternative to support for bytes in channels is support for read-only buffers (the PEP 3119 kind).
Probably you mean PEP 3118.
Yep. :)
Then ``recv()`` would return a memoryview to expose the buffer in a zero-copy way.
It will probably not do much if you can only pass buffers and not structured objects, because unserializing (e.g. unpickling) from a buffer will still copy memory around.
To pass a Numpy array, for example, you not only need to pass its contents but also its metadata (its value type -- named "dtype" --, its shape and strides). This may be serialized as simple tuples of atomic types (str, int, bytes, other tuples), but you want to include a memoryview of the data area somewhere in those tuples.
(and, of course, at some point this will feel like reinventing pickle :-) but pickle has no mechanism to avoid memory copies, so it can't readily be reused here -- otherwise you're just reinventing multiprocessing...)
I'm still working through all the passing-buffers-through-channels feedback, so I'll defer on a reply for now. :)
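For reference, here is what zero-copy access through a PEP 3118 buffer looks like with the stdlib memoryview type; this only illustrates the buffer protocol itself, not the proposed channel API:

```python
# PEP 3118 buffers allow zero-copy views over existing memory.
data = bytearray(b"hello world")
view = memoryview(data)      # no copy: view references data's buffer

data[0:5] = b"HELLO"         # mutate the underlying buffer...
print(bytes(view[:5]))       # b'HELLO'  ...and the view sees the change

# Slicing a memoryview is also zero-copy:
sub = view[6:]
print(bytes(sub))            # b'world'
```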
timeout arg to pop() and push()
-------------------------------
pop() and push() don't exist anymore :-)
Fixed! :)
Synchronization Primitives
--------------------------
The ``threading`` module provides a number of synchronization primitives for coordinating concurrent operations. This is especially necessary due to the shared-state nature of threading. In contrast, subinterpreters do not share state. Data sharing is restricted to channels, which do away with the need for explicit synchronization.
I think this rationale confuses Python-level data sharing with process-level data sharing. The main point of subinterpreters (compared to multiprocessing) is that they live in the same OS process. So it's really not true that you can't share a low-level synchronization primitive (say a semaphore) between subinterpreters.
I'm not sure I understand your concern here. Perhaps I used the word "sharing" too ambiguously? By "sharing" I mean that the two actors have read access to something that at least one of them can modify. If they both only have read-only access then it's effectively the same as if they are not sharing. While I can imagine the *possibility* (some day) of an opt-in mechanism to share objects (r/rw or rw/rw), that is definitely not a part of this PEP. I expect that in reality we will only ever pass immutable data between interpreters. So I'm unclear on what need there might be for any synchronization primitives other than what is inherent to channels.
* a ``create()`` arg to indicate resetting ``__main__`` after each ``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
This would all be a false promise. Persistent state lives in other places than __main__ (for example the loaded modules and their respective configurations - think logging or decimal).
I've added a bit more explanation to the PEP to clarify this point.
The main difference between queues and channels is that queues support buffering. This would complicate the blocking semantics of ``recv()`` and ``send()``. Also, queues can be built on top of channels.
But buffering with background threads in pure Python will be orders of magnitude slower than optimized buffering in a custom low-level implementation. It would be a pity if a subinterpreters Queue ended up as slow as a multiprocessing Queue.
I agree. I'm entirely open to supporting other object-passing types, including adding low-level implementations. I've added a note to the PEP to that effect. However, I wanted to start off with the most basic object-passing type, and I felt that channels provide the simplest solution. My goal is to get a basic API landed in 3.7 and then build on it from there for 3.8.

That said, in the interest of enabling extra utility in the near term, I expect that we will be able to design the PyInterpreterState changes (few as they are) in such a way that a C extension could implement an efficient multi-interpreter Queue type that would run under 3.7. Actually, would that be strictly necessary if you can interact with channels without the GIL in the C-API? Regardless, I'll make a note in the PEP about the relationship between the C-API and implementing an efficient multi-interpreter Queue. I suppose that means I need to add C-API changes to the PEP (which I had wanted to avoid).
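As a sketch of the "queues can be built on top of channels" point: a non-blocking put() can be layered over a blocking send by buffering items and having a background feeder thread perform the sends. Everything here is hypothetical -- `BufferedQueue` is made up, and `channel_send` is just a callback standing in for the proposed blocking Channel.send():

```python
import queue
import threading
import time

class BufferedQueue:
    """Sketch: a buffered queue layered over a blocking channel send.

    channel_send stands in for the proposed (blocking) Channel.send();
    here it is just a callback, so the sketch stays self-contained.
    """
    def __init__(self, channel_send):
        self._buffer = queue.Queue()          # unbounded in-process buffer
        self._send = channel_send
        self._feeder = threading.Thread(target=self._feed, daemon=True)
        self._feeder.start()

    def put(self, item):                      # never blocks: just buffers
        self._buffer.put(item)

    def _feed(self):
        while True:                           # background thread performs the
            self._send(self._buffer.get())    # (potentially blocking) sends

received = []
bq = BufferedQueue(received.append)           # pretend append is a blocking send
for i in range(3):
    bq.put(i)                                 # each call returns immediately

time.sleep(0.2)                               # let the feeder drain the buffer
print(received)  # [0, 1, 2]
```

This is also exactly the pure-Python, background-thread approach whose overhead the comment above warns about, which is why a low-level implementation would ultimately be preferable.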