Re: [Python-Dev] PEP 554 v3 (new interpreters module)

Sept. 14, 2017

On 15 September 2017 at 12:04, Nathaniel Smith <njs@pobox.com> wrote:
...
On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
...
The reason we're OK with this is that it means that only reading a new
message from a channel (i.e. creating a cross-interpreter view) or
discarding a previously read message (i.e. closing a cross-interpreter
view) will be synchronisation points where the receiving interpreter
necessarily needs to acquire the sending interpreter's GIL.
By contrast, if we allow an actual bytes object to be shared, then
either every INCREF or DECREF on that bytes object becomes a
synchronisation point, or else we end up needing some kind of
secondary per-interpreter refcount where the interpreter doesn't drop
its shared reference to the original object in its source interpreter
until the internal refcount in the borrowing interpreter drops to
zero.
Ah, that makes more sense.
I am nervous that allowing arbitrary memoryviews gives a *little* more
power than we need or want. I like that the current API can reasonably
be emulated using subprocesses -- it opens up the door for backports,
compatibility support on language implementations that don't support
subinterpreters, direct benchmark comparisons between the two
implementation strategies, etc. But if we allow arbitrary memoryviews,
then this requires that you can take (a) an arbitrary object, not
specified ahead of time, and (b) provide two read-write views on it in
separate interpreters such that modifications made in one are
immediately visible in the other. Subprocesses can do one or the other
-- they can copy arbitrary data, and if you warn them ahead of time
when you allocate the buffer, they can do real zero-copy shared
memory. But the combination is really difficult.
One constraint we'd want to impose is that the memory view in the
receiving interpreter should always be read-only - while we don't
currently expose the ability to request that at the Python layer,
memoryviews *do* support the creation of read-only views at the C API
layer (which then gets reported to Python code via the "view.readonly"
attribute).
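
As a small illustration of the "view.readonly" attribute mentioned above (plain
Python in a single interpreter, not part of the PEP 554 draft): a view over an
immutable bytes object already reports itself as read-only, while a view over
a bytearray is writable by default. The variable names are made up for the
example.

```python
# A view over immutable bytes is read-only.
ro_view = memoryview(b"payload")
# A view over a mutable bytearray is writable by default.
rw_view = memoryview(bytearray(b"payload"))

print(ro_view.readonly)  # True
print(rw_view.readonly)  # False

# Writing through the read-only view is rejected with TypeError.
write_error = None
try:
    ro_view[0] = 0
except TypeError as exc:
    write_error = exc

# Writing through the writable view succeeds.
rw_view[0] = ord("P")
```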

While that change alone is enough to preserve the simplex nature of
the channel, it wouldn't be enough to prevent the *sender* from
mutating the buffer contents and having that change be visible in the
recipient.
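
A rough single-interpreter sketch of that remaining problem. It assumes
Python 3.8 or later, where memoryview.toreadonly() exists at the Python layer
(it did not when this thread was written), and uses made-up variable names.
The "receiver" cannot write through its view, but it still observes the
sender's in-place mutation.

```python
# Sender owns a mutable buffer.
buffer = bytearray(b"hello")

# Receiver gets a read-only view of the same memory (no copy).
# Assumption: Python 3.8+, where memoryview.toreadonly() is available.
receiver_view = memoryview(buffer).toreadonly()

before = bytes(receiver_view)

# The sender mutates the buffer in place after "sending" it.
buffer[0] = ord("j")

# The receiver's read-only view reflects the change.
after = bytes(receiver_view)

print(before, after)
```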

In that regard it may make sense to maintain both restrictions
initially (as you suggested below): only accept bytes on the sending
side (to prevent mutation by the sender), and expose that as a
read-only memory view on the receiving side (to allow for zero-copy
data sharing without allowing mutation by the receiver).
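
To make the combined restriction concrete, here is a minimal sketch under
stated assumptions: BytesChannel is a hypothetical single-interpreter
stand-in, not the channel API from the PEP, and it performs no
cross-interpreter synchronisation at all. It only shows the shape of
"bytes in, read-only memoryview out".

```python
from collections import deque


class BytesChannel:
    """Hypothetical stand-in for a simplex channel (not the PEP 554 API)."""

    def __init__(self):
        self._queue = deque()

    def send(self, data):
        # Only accept exact bytes objects, so the sender cannot mutate
        # the buffer after sending it.
        if type(data) is not bytes:
            raise TypeError("only bytes objects can be sent")
        self._queue.append(data)

    def recv(self):
        # A memoryview over bytes is read-only and does not copy the data.
        return memoryview(self._queue.popleft())


channel = BytesChannel()
message = b"spam and eggs"
channel.send(message)
view = channel.recv()

print(view.readonly)        # True
print(view.obj is message)  # True: the view wraps the original object
```

A mutable buffer such as a bytearray is rejected by send() with TypeError
in this sketch, which is the sender-side half of the restriction.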
...
It'd be one thing if this were like a key feature that gave
subinterpreters an advantage over subprocesses, but it seems really
unlikely to me that a library won't know ahead of time when it's
filling in a buffer to be transferred, and if anything it seems like
we'd rather not expose read-write shared mappings in any case. It's
extremely non-trivial to do right [1].
tl;dr: let's not rule out a useful implementation strategy based on a
feature we don't actually need.
Yeah, the description Eric currently has in the PEP is a summary of a
much longer suggestion Yury, Neil Schemenauer and I put together while
waiting for our flights following the core dev sprint, and the full
version had some of these additional constraints on it (most notably
the "read-only in the receiving interpreter" one).
...
One alternative would be your option (3) -- you can put bytes in and
get memoryviews out, and since bytes objects are immutable it's OK.
Indeed, I think that will be a sensible starting point. However, I
genuinely want to allow for zero-copy sharing of NumPy arrays
eventually, as that's where I think this idea gets most interesting:
the potential to allow for multiple parallel read operations on a
given NumPy array *in Python* (rather than Cython or C) without
running afoul of the GIL, and without needing to mess about with the
complexities of operating system level IPC.
...
Handling an exception
That way channels can be a namespace *specifically* for passing in
channels, and can be reported as such on RunResult. If we decide to
allow arbitrary shared objects in the future, or add flag options like
"reraise=True" to reraise exceptions from the subinterpreter in the
current interpreter, we'd have that ability, rather than having the
entire potential keyword namespace taken up for passing shared
objects.
Would channels be a dict, or...?
Yeah, it would be a direct replacement for the way the current draft
is proposing to use the keywords dict - it would just be a separate
dictionary instead.

It does occur to me that if we wanted to align with the way the
`runpy` module spells that concept, we'd call the option
`init_globals`, but I'm thinking it will be better to only allow
channels to be passed through directly, and require that everything
else be sent through a channel.
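
A purely illustrative sketch of the signature shape being discussed. The
run() function here is hypothetical: it uses exec() in the *current*
interpreter as a stand-in for running code in a subinterpreter, and the
deque is a stand-in for a channel. The point is only that a separate
"channels" dict keeps the keyword namespace free for options such as
"reraise".

```python
from collections import deque


def run(source, *, channels=None, reraise=False):
    """Hypothetical stand-in: execute source with channels as its globals.

    Returns the exception raised by the code (or None), unless reraise
    is true, in which case the exception propagates to the caller.
    """
    # Copy so that exec() does not modify the caller's dict.
    namespace = dict(channels or {})
    try:
        exec(source, namespace)
    except Exception as exc:
        if reraise:
            raise
        return exc
    return None


# A channel may even be *named* "reraise" without clashing with the option.
fake_channel = deque()
result = run("reraise.append('hello')",
             channels={"reraise": fake_channel},
             reraise=True)

# Without reraise, the failure is reported back instead of propagating.
failure = run("1 / 0")
```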

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia