[Python-Dev] PEP 554 v3 (new interpreters module)

Nick Coghlan ncoghlan at gmail.com
Thu Oct 5 23:38:23 EDT 2017

On 6 October 2017 at 11:48, Eric Snow <ericsnowcurrently at gmail.com> wrote:

> > And that's the real pay-off that comes from defining this in terms of the
> > memoryview protocol: Py_buffer structs *aren't* Python objects, so it's
> > only a regular C struct that gets passed across the interpreter boundary
> > (the reference to the original objects gets carried along passively as
> > part of the CIV - it never gets *used* in the receiving interpreter).
> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
> ways that are applicable to channels here.  I'm simply reticent to
> lock PEP 554 into such a specific solution as the buffer-specific CIV.
> I'm trying to accommodate anticipated future needs while keeping the
> PEP as simple and basic as possible.  It's driving me nuts! :P  Things
> were *much* simpler before I added Channels to the PEP. :)

Starting with memory-sharing only doesn't lock us into anything, since you
can still add a more flexible kind of channel based on a different protocol
later if it turns out that memory sharing isn't enough.

By contrast, if you make the initial channel semantics incompatible with
multiprocessing by design, you *will* prevent anyone from experimenting
with replicating the shared memory based channel API for communicating
between processes :)

That said, if you'd prefer to keep the "Channel" name available for the
possible introduction of object channels at a later date, you could call
the initial memoryview based channel a "MemChannel".

> > I don't think we should be touching the behaviour of core builtins
> > solely to enable message passing to subinterpreters without a shared
> > GIL.
> Keep in mind that I included the above as a possible solution using
> tp_share() that would work *after* we stop sharing the GIL.  My point
> is that with tp_share() we have a solution that works now *and* will
> work later.  I don't care how we use tp_share to do so. :)  I long to
> be able to say in the PEP that you can pass bytes through the channel
> and get bytes on the other side.

Memory views are a builtin type as well, and they emphasise the practical
benefit we're trying to get relative to typical multiprocessing
arrangements: zero-copy data sharing.
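
To make the zero-copy point concrete, here is a minimal single-interpreter
sketch (illustrative only, not taken from the PEP): a memoryview and its
slices refer to the exporter's memory rather than copying it, so writes on
either side are visible on the other.

```python
# A memoryview exposes the exporter's memory directly; nothing is copied.
data = bytearray(b"hello world")
view = memoryview(data)

# Slicing a memoryview yields another view onto the same memory.
head = view[:5]

# A write through the view is visible in the original object...
head[0] = ord("J")
# ...and a write to the original object is visible through the view.
data[6] = ord("W")

# bytes() copies the data out here purely so it can be inspected.
snapshot = (bytes(data), bytes(head), bytes(view[6:]))
```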

So here's my proposed experimentation-enabling development strategy:

1. Start out with a MemChannel API that accepts any buffer-exporting
object as input, and outputs only a cross-interpreter memoryview subclass
2. Use that as the basis for the work to get to a per-interpreter locking
arrangement that allows subinterpreters to fully exploit multiple CPUs
3. Only then try to design a Channel API that allows for sharing builtin
immutable objects between interpreters (bytes, strings, numbers), at a time
when you can be certain you won't be inadvertently making it harder to make
the GIL a truly per-interpreter lock, rather than the current
process-global runtime lock.
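
As a rough picture of what step 1 could look like from Python code, here is
a single-process stand-in. The class name follows the "MemChannel"
suggestion above, but the send()/recv() signatures and the queue-based
implementation are assumptions made for illustration; a real version would
hand the buffer across the interpreter boundary and return a
cross-interpreter memoryview subclass, whereas this sketch returns a plain
memoryview (memoryview cannot be subclassed from Python code).

```python
import queue


class MemChannel:
    """Hypothetical single-process stand-in for the proposed MemChannel."""

    def __init__(self):
        self._items = queue.Queue()

    def send(self, obj):
        # memoryview() raises TypeError for objects that do not export a
        # buffer, which gives the "any buffer-exporting object" check.
        self._items.put(memoryview(obj))

    def recv(self, timeout=None):
        # The receiver only ever gets a view, never the original object.
        return self._items.get(timeout=timeout)


channel = MemChannel()
payload = bytearray(b"spam")
channel.send(payload)

received = channel.recv()
received[0] = ord("S")  # writes through the view reach the sender's buffer
```

Because only views come out of recv(), the sending side keeps ownership of
the object and the receiving side never touches it directly.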

The key benefit of this approach is that we *know* MemChannel can work: the
buffer protocol already operates at the level of C structs and pointers,
not Python objects, and there are already plenty of interesting
buffer-protocol-supporting objects around, so as long as the CIV switches
interpreters at the right time, there aren't any fundamentally new
runtime-level capabilities needed to implement it.
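
The struct-level nature of the buffer protocol is visible even from Python:
a memoryview's attributes mirror fields of the underlying Py_buffer, and the
exporting object is only carried along as a reference. A small sketch using
array.array as one example exporter (the variable names are illustrative):

```python
import array

samples = array.array("i", [1, 2, 3, 4])
view = memoryview(samples)

# These attributes correspond to fields of the underlying Py_buffer struct.
info = {
    "format": view.format,      # struct-style type code of each item
    "itemsize": view.itemsize,  # bytes per item
    "ndim": view.ndim,          # number of dimensions
    "shape": view.shape,        # items per dimension
    "nbytes": view.nbytes,      # total size of the exported memory
    "readonly": view.readonly,  # whether writes through the view are allowed
}

# The exporter is carried along as a reference; the view does not copy it.
exporter = view.obj
```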

The lower-level MemChannel API could then also be replicated for
multiprocessing, while the higher-level, more speculative object-based
Channel API would be specific to subinterpreters (and probably only ever
designed and implemented if you first succeed in making subinterpreters
sufficiently independent that they don't rely on a process-wide GIL any
more).
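
For a sense of how the same memoryview-based model carries over to
processes, here is a sketch using multiprocessing.shared_memory. That module
was only added in Python 3.8, so it postdates this discussion and is used
here purely as an illustration. Both handles live in one process to keep the
example short; normally the second handle would be opened by name in another
process.

```python
from multiprocessing import shared_memory

# Writer side: create a named block of shared memory.
writer = shared_memory.SharedMemory(create=True, size=16)
try:
    # .buf is a memoryview over the shared block.
    writer.buf[:5] = b"hello"

    # Reader side: attach to the same block by name.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        # bytes() copies the data out here only so it can be inspected.
        received = bytes(reader.buf[:5])
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()  # free the block once every user is done with it
```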

So I'm not saying "Never design an object-sharing protocol specifically for
use with subinterpreters". I'm saying "You don't have a demonstrated need
for that yet, so don't try to define it until you do".

> My mind is drawn to the comparison between that and the question of
> CIV vs. tp_share().  CIV would be more like the post-451 import world,
> where I expect the CIV would take care of the data sharing operations.
> That said, the situation in PEP 554 is sufficiently different that I'm
> not convinced a generic CIV protocol would be better.  I'm not sure
> how much CIV could do for you over helpers+tp_share.
> Anyway, here are the leading approaches that I'm looking at now:
> * adding a tp_share slot
>   + you send() the object directly and recv() the object coming out of
>     tp_share() (which will probably be the same type as the original)
>   + this would eventually require small changes in tp_free for
>     participating types
>   + we would likely provide helpers (eventually), similar to the new
>     buffer protocol, to make it easier to manage sharing data

I'm skeptical about this approach because you'll be designing in a vacuum
against future possible constraints that you can't test yet: the inherent
complexity in the object sharing protocol will come from *not* having a
process-wide GIL, but you'll be starting out with a process-wide GIL still
in place. And that means third parties will inevitably rely on the
process-wide GIL in their tp_share implementations (despite their best
intentions), and you'll end up with the same issue that causes problems for
the rest of the C API.

By contrast, if you delay this step until *after* the GIL has successfully
been shifted to being per-interpreter, then by the time the new protocol is
defined, people will also be able to test their tp_share implementations.

At that point, you'd also presumably have evidence of demand to justify the
introduction of a new core language protocol, as:

* folks will only complain about the limitations of MemChannel if they're
actually using subinterpreters
* the complaints about the limitations of MemChannel would help guide the
object sharing protocol design

> * simulating tp_share via an external global registry (or a registry
>   on the Channel type)
>   + it would still be hard to make work without hooking into tp_free()
> * CIVs hard-coded in Channel (or BufferViewChannel, etc.) for specific
>   types (e.g. buffers)
>   + you send() the object like normal, but recv() the view
> * a CIV protocol on Channel by which you can add support for more types
>   + you send() the object like normal but recv() the view
>   + could work through subclassing or a registry
>   + a lot of conceptual similarity with tp_share+tp_free
> * a CIV-like proxy
>   + you wrap the object, send() the proxy, and recv() a proxy
>   + this is entirely compatible with tp_share()

* Allow for multiple channel types, such that MemChannel is merely the
*first* channel type, rather than the *only* channel type
  + Allows PEP 554 to be restricted to things we already know can be made
to work
  + Doesn't block the introduction of an object-sharing based Channel in
some future release
  + Allows for at least some channel types to be adapted for use with
shared memory and multiprocessing

> Here are what I consider the key metrics relative to the utility of a
> solution (not in any significant order):
> * how hard to understand as a Python programmer?

Not especially important yet - this is more a criterion for the final API
than for the initial experimental platform.

> * how much extra work (if any) for folks calling Channel.send()?
> * how much extra work (if any) for folks calling Channel.recv()?

I don't think either is particularly important yet, although we also don't
want to raise any pointless barriers to experimentation.

> * how complex is the CPython implementation?

This is critical, since we want to minimise any potential for undesirable
side effects on regular single-interpreter code.

> * how hard to understand as a type author (wanting to add support for
> their type)?
> * how hard to add support for a new type?
> * what variety of types could be supported?
> * what breadth of experimentation opens up?

You missed the big one: what risk does the initial channel design pose to
the underlying objective of making the GIL a genuinely per-interpreter lock?

If we don't eventually reach the latter goal, then subinterpreters won't
really offer much in the way of compelling benefits over just using a
thread pool and queue.Queue.
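
For reference, the baseline being compared against looks roughly like this
(a minimal sketch; the worker function and variable names are illustrative).
Objects pass between threads as-is, with no copying or sharing protocol at
all, precisely because every thread runs under the same GIL.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def worker(inbox, outbox):
    # Echo items back until the None sentinel arrives.
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item)


inbox, outbox = queue.Queue(), queue.Queue()
message = {"payload": [1, 2, 3]}

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(worker, inbox, outbox)
    inbox.put(message)
    inbox.put(None)  # tell the worker to stop
    future.result()  # re-raise any exception from the worker

echoed = outbox.get()  # the very same object, not a copy
```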

MemChannel poses zero additional risk to that, since we wouldn't be sharing
actual Python objects between interpreters, only C pointers and structs.

By contrast, introducing an object channel early poses significant new
risks to that goal, since it will force you to solve hard protocol design
and refcount management problems *before* making the switch, rather than
being able to defer the design of the object channel protocol until *after*
you've already enabled the ability to run subinterpreters in completely
independent threads.

> The most important thing to me is keeping things simple for Python
> programmers.  After that is ease-of-use for type authors.  However, I
> also want to put us in a good position in 3.7 to experiment
> extensively with subinterpreters, so that's a big consideration.
> Consequently, for PEP 554 my goal is to find a solution for object
> sharing that keeps things simple in Python while laying a basic
> foundation we can build on at the C level, so we don't get locked in
> but still maximize our opportunities to experiment. :)

I think our priorities are quite different, then, as I believe PEP 554
should be focused on defining a relatively easy to implement API that
nevertheless makes it possible to write interesting programs while working
on the goal of making the GIL per-interpreter, without worrying too much
about whether or not the initial cross-interpreter communication channels
closely resemble the final ones that will be intended for more general use.


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia