[Python-Dev] PEP 554 v3 (new interpreters module)

Koos Zevenhoven k7hoven at gmail.com
Fri Oct 6 12:29:24 EDT 2017


While I'm actually trying not to say much here so that I can avoid this
discussion now, here are just a couple of ideas and thoughts from me at this
point:

(A)
Instead of sending bytes and receiving memoryviews, one could consider
sending *and* receiving memoryviews for now. That could then be extended
into more types of objects in the future without changing the basic concept
of the channel. Probably, the memoryview would need to be copied (but not
the data of course). But I'm guessing copying a memoryview would be quite
fast.
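In pure Python, creating a new memoryview from an existing one copies only the view object, not the underlying data, which is the property idea (A) relies on. A minimal single-process illustration (no interpreters involved):

```python
# Single-process illustration: a new memoryview over an existing memoryview
# shares the same underlying buffer, so no data is copied.
data = bytearray(b"hello world")
original = memoryview(data)

# "Copy" the view: this creates a new view object over the same buffer.
copied = memoryview(original)

# The two views are distinct objects...
assert copied is not original

# ...but a write through one is visible through the other (and in `data`),
# because both refer to the same memory.
copied[0] = ord("J")
print(data)             # bytearray(b'Jello world')
print(bytes(original))  # b'Jello world'
```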

This would hopefully require fewer API changes or additions in the future.
OTOH, giving it a different name like MemChannel or making it 3rd party
will buy some more time to figure out the right API. But maybe that's not
needed.

(B)
We would probably then like to pretend that the object coming out of the other
end of a Channel *is* the original object. As long as these channels are
the only way to directly pass objects between interpreters, there are
essentially only two ways to tell the difference (AFAICT):

1. Calling id(...) and sending it over to the other interpreter and
checking if it's the same.

2. Sending the same object twice to the same interpreter. Then one can
compare the two received objects with id(...) or using the `is` operator.
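To make the two checks concrete: if a channel delivers copies (simulated here with a `pickle` round trip, an assumption standing in for whatever mechanism a real channel would use), both checks reveal that the received object is not the original. `simulated_send` is a hypothetical helper.

```python
import pickle


def simulated_send(obj):
    """Hypothetical stand-in for a channel that delivers a copy."""
    return pickle.loads(pickle.dumps(obj))


original = [1, 2, 3]
first = simulated_send(original)
second = simulated_send(original)

# Check 1: the received object's id() differs from the sender's id().
print(id(first) == id(original))    # False
# Check 2: sending the same object twice yields two distinct objects.
print(first is second)              # False
# The values are equal; only the identity differs.
print(first == second == original)  # True
```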

There are solutions to the problems too:

1. Send the id() from the sending interpreter along with the sent object so
that the receiving interpreter can somehow attach it to the object and then
return it from id(...).

2. When an object is received, do a lookup in an interpreter-wide cache
to see if an object with this id has already been received. If so, use that
one.
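A minimal sketch of both fixes on the receiving side, assuming the sender transmits `id(obj)` alongside the data. `ReceivingSide`, `receive()` and `origin_id()` are hypothetical names; a real implementation would have to live inside the interpreter, since user code cannot change what the builtin `id()` returns.

```python
class ReceivingSide:
    """Hypothetical per-interpreter state on the receiving end."""

    def __init__(self):
        self._by_origin = {}  # sender-side id -> local object (fix 2: cache)
        self._origin_of = {}  # local id -> sender-side id (fix 1: keep the id)

    def receive(self, origin_id, data):
        # Fix 2: reuse the object already received under this sender id.
        obj = self._by_origin.get(origin_id)
        if obj is None:
            obj = memoryview(data)
            self._by_origin[origin_id] = obj
            self._origin_of[id(obj)] = origin_id
        return obj

    def origin_id(self, obj):
        # Fix 1: report the sender's id instead of the local one.
        return self._origin_of[id(obj)]


payload = bytearray(b"shared data")
receiver = ReceivingSide()
a = receiver.receive(id(payload), payload)
b = receiver.receive(id(payload), payload)
print(a is b)                                # True
print(receiver.origin_id(a) == id(payload))  # True
```

Note that this cache keeps every received object alive, and a sender-side id can be reused once the original object is freed, so a real version would need to evict entries when the sender's object goes away.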

Now it should essentially look like the received object is really "the same
one" as in the sending interpreter. This should also work with multiple
interpreters and multiple channels, as long as the id is always preserved.

(C)
One further complication regarding memoryview in general is that .release()
should probably be propagated to the sending interpreter somehow.
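One way to picture propagating `.release()`: a hypothetical receiving-side wrapper that releases its local view and then invokes a callback standing in for a notification to the sending interpreter. `ReleasingView` and `notify_sender` are illustrative names, not an existing API.

```python
class ReleasingView:
    """Hypothetical receiving-side view that reports release() to the sender."""

    def __init__(self, data, notify_sender):
        self.view = memoryview(data)
        self._notify_sender = notify_sender
        self._released = False

    def release(self):
        if self._released:
            return  # releasing twice is a no-op, like memoryview.release()
        self.view.release()
        self._released = True
        self._notify_sender()  # stand-in for a cross-interpreter message

    def __enter__(self):
        return self.view

    def __exit__(self, *exc_info):
        self.release()
        return False


notifications = []
with ReleasingView(b"abc", lambda: notifications.append("released")) as mv:
    print(bytes(mv))  # b'abc'
print(notifications)  # ['released']
```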

(D)
I think someone already mentioned this one, but would it not be better to
start a new interpreter in the background in a new thread by default? I
think this would make things simpler and leave more freedom regarding the
implementation in the future. If you need to run an interpreter within the
current thread, you could perhaps optionally do that too.
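A rough sketch of the suggested default, using a hypothetical `Interpreter` stand-in that simply `exec()`s source in its own namespace (no real subinterpreter is created here). The `run()` method and its `in_current_thread` flag are illustrative: by default it starts a background thread and returns it, and running in the current thread is opt-in.

```python
import threading


class Interpreter:
    """Hypothetical stand-in: exec() in a private namespace, not a subinterpreter."""

    def __init__(self):
        self.namespace = {}

    def _run_source(self, source):
        exec(source, self.namespace)

    def run(self, source, *, in_current_thread=False):
        if in_current_thread:
            self._run_source(source)  # opt-in: block the caller
            return None
        # Default: run in a new background thread and hand it to the caller.
        thread = threading.Thread(target=self._run_source, args=(source,))
        thread.start()
        return thread


interp = Interpreter()
t = interp.run("result = sum(range(10))")
t.join()
print(interp.namespace["result"])  # 45
interp.run("result = 'same thread'", in_current_thread=True)
print(interp.namespace["result"])  # same thread
```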


––Koos


PS. I have lots of thoughts related to this, but I can't afford to engage
in them now. (Anyway, it's probably more urgent to get some stuff with PEP
555 and its spin-off thoughts out of the way).



On Fri, Oct 6, 2017 at 6:38 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 6 October 2017 at 11:48, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>
>> > And that's the real pay-off that comes from defining this in terms of the
>> > memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only
>> > a regular C struct that gets passed across the interpreter boundary (the
>> > reference to the original objects gets carried along passively as part of
>> > the CIV - it never gets *used* in the receiving interpreter).
>>
>> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
>> ways that are applicable to channels here.  I'm simply reticent to
>> lock PEP 554 into such a specific solution as the buffer-specific CIV.
>> I'm trying to accommodate anticipated future needs while keeping the
>> PEP as simple and basic as possible.  It's driving me nuts! :P  Things
>> were *much* simpler before I added Channels to the PEP. :)
>>
>
> Starting with memory-sharing only doesn't lock us into anything, since you
> can still add a more flexible kind of channel based on a different protocol
> later if it turns out that memory sharing isn't enough.
>
> By contrast, if you make the initial channel semantics incompatible with
> multiprocessing by design, you *will* prevent anyone from experimenting
> with replicating the shared memory based channel API for communicating
> between processes :)
>
> That said, if you'd prefer to keep the "Channel" name available for the
> possible introduction of object channels at a later date, you could call
> the initial memoryview based channel a "MemChannel".
>
>
>> > I don't think we should be touching the behaviour of core builtins solely to
>> > enable message passing to subinterpreters without a shared GIL.
>>
>> Keep in mind that I included the above as a possible solution using
>> tp_share() that would work *after* we stop sharing the GIL.  My point
>> is that with tp_share() we have a solution that works now *and* will
>> work later.  I don't care how we use tp_share to do so. :)  I long to
>> be able to say in the PEP that you can pass bytes through the channel
>> and get bytes on the other side.
>>
>
> Memory views are a builtin type as well, and they emphasise the practical
> benefit we're trying to get relative to typical multiprocessing
> arrangements: zero-copy data sharing.
>
> So here's my proposed experimentation-enabling development strategy:
>
> 1. Start out with a MemChannel API, that accepts any buffer-exporting
> object as input, and outputs only a cross-interpreter memoryview subclass
> 2. Use that as the basis for the work to get to a per-interpreter locking
> arrangement that allows subinterpreters to fully exploit multiple CPUs
> 3. Only then try to design a Channel API that allows for sharing builtin
> immutable objects between interpreters (bytes, strings, numbers), at a time
> when you can be certain you won't be inadvertently making it harder to make
> the GIL a truly per-interpreter lock, rather than the current process
> global runtime lock.
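A single-process sketch of step 1, assuming `queue.Queue` as the transport; `MemChannel` here is a hypothetical class, not the PEP's API. `send()` accepts any buffer-exporting object and `recv()` returns a view over the same memory. Because `memoryview` cannot be subclassed in Python, the sketch returns a plain `memoryview` rather than the cross-interpreter subclass described in step 1.

```python
import queue


class MemChannel:
    """Hypothetical single-process model of a buffer-only channel."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, obj):
        # memoryview() raises TypeError for objects that don't export a buffer,
        # so only buffer-protocol objects can be sent.
        self._queue.put(memoryview(obj))

    def recv(self):
        # The receiver gets a view over the sender's memory: zero-copy.
        return self._queue.get()


chan = MemChannel()
buf = bytearray(b"zero-copy")
chan.send(buf)
view = chan.recv()
buf[0] = ord("Z")       # mutate on the "sending" side
print(bytes(view))      # b'Zero-copy'

try:
    chan.send("a str is not a buffer")
except TypeError as exc:
    print("rejected:", type(exc).__name__)  # rejected: TypeError
```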
>
> The key benefit of this approach is that we *know* MemChannel can work:
> the buffer protocol already operates at the level of C structs and
> pointers, not Python objects, and there are already plenty of interesting
> buffer-protocol-supporting objects around, so as long as the CIV switches
> interpreters at the right time, there aren't any fundamentally new runtime
> level capabilities needed to implement it.
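From Python, the metadata a `Py_buffer` struct carries (apart from the raw pointer) is visible through `memoryview` attributes. The illustration below prints it for a 2x3 array of C doubles, built with the stdlib `array` module and `memoryview.cast()` (which requires one side of each cast to be a byte format, hence the detour through `"B"`). What the buffer protocol exchanges is plain data: a pointer plus format, shape and strides, not Python objects.

```python
from array import array

# Six C doubles, reshaped into a 2x3 view without copying.
values = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
view = memoryview(values).cast("B").cast("d", shape=[2, 3])

# These attributes mirror fields of the underlying Py_buffer struct.
print(view.format)    # d
print(view.itemsize)  # 8
print(view.ndim)      # 2
print(view.shape)     # (2, 3)
print(view.strides)   # (24, 8)
print(view.readonly)  # False
print(view.nbytes)    # 48
```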
>
> The lower level MemChannel API could then also be replicated for
> multiprocessing, while the higher level more speculative object-based
> Channel API would be specific to subinterpreters (and probably only ever
> designed and implemented if you first succeed in making subinterpreters
> sufficiently independent that they don't rely on a process-wide GIL any
> more).
>
> So I'm not saying "Never design an object-sharing protocol specifically
> for use with subinterpreters". I'm saying "You don't have a demonstrated
> need for that yet, so don't try to define it until you do".
>
>
>
>> My mind is drawn to the comparison between that and the question of
>> CIV vs. tp_share().  CIV would be more like the post-451 import world,
>> where I expect the CIV would take care of the data sharing operations.
>> That said, the situation in PEP 554 is sufficiently different that I'm
>> not convinced a generic CIV protocol would be better.  I'm not sure
>> how much CIV could do for you over helpers+tp_share.
>>
>> Anyway, here are the leading approaches that I'm looking at now:
>>
>> * adding a tp_share slot
>>   + you send() the object directly and recv() the object coming out of tp_share()
>>      (which will probably be the same type as the original)
>>   + this would eventually require small changes in tp_free for participating types
>>   + we would likely provide helpers (eventually), similar to the new buffer protocol,
>>      to make it easier to manage sharing data
>>
>
> I'm skeptical about this approach because you'll be designing in a vacuum
> against future possible constraints that you can't test yet: the inherent
> complexity in the object sharing protocol will come from *not* having a
> process-wide GIL, but you'll be starting out with a process-wide GIL still
> in place. And that means third parties will inevitably rely on the
> process-wide GIL in their tp_share implementations (despite their best
> intentions), and you'll end up with the same issue that causes problems for
> the rest of the C API.
>
> By contrast, if you delay this step until *after* the GIL has successfully
> been shifted to being per-interpreter, then by the time the new protocol is
> defined, people will also be able to test their tp_share implementations
> properly.
>
> At that point, you'd also presumably have evidence of demand to justify
> the introduction of a new core language protocol, as:
>
> * folks will only complain about the limitations of MemChannel if they're
> actually using subinterpreters
> * the complaints about the limitations of MemChannel would help guide the
> object sharing protocol design
>
>
>> * simulating tp_share via an external global registry (or a registry on the Channel type)
>>   + it would still be hard to make work without hooking into tp_free()
>> * CIVs hard-coded in Channel (or BufferViewChannel, etc.) for specific types (e.g. buffers)
>>   + you send() the object like normal, but recv() the view
>> * a CIV protocol on Channel by which you can add support for more types
>>   + you send() the object like normal but recv() the view
>>   + could work through subclassing or a registry
>>   + a lot of conceptual similarity with tp_share+tp_free
>> * a CIV-like proxy
>>   + you wrap the object, send() the proxy, and recv() a proxy
>>   + this is entirely compatible with tp_share()
>>
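For the registry-based options above, a rough Python model of a per-channel registry that maps types to "share" functions producing what the receiving side gets. `ShareRegistry`, `register()` and `share()` are hypothetical names, and the registered functions are purely illustrative; a real version would be C-level and, as noted, would still need to hook into object deallocation.

```python
class ShareRegistry:
    """Hypothetical per-channel registry: type -> function producing shared data."""

    def __init__(self):
        self._sharers = {}

    def register(self, cls, share_func):
        self._sharers[cls] = share_func

    def share(self, obj):
        # Walk the MRO so subclasses of a registered type are supported too.
        for cls in type(obj).__mro__:
            share_func = self._sharers.get(cls)
            if share_func is not None:
                return share_func(obj)
        raise TypeError(f"{type(obj).__name__} objects cannot be shared")


registry = ShareRegistry()
registry.register(bytes, memoryview)  # bytes: receiver gets a view
registry.register(int, lambda n: n)   # ints: pass the value through

print(bytes(registry.share(b"abc")))  # b'abc'
print(registry.share(True))           # True (bool subclasses int)
try:
    registry.share([1, 2])
except TypeError as exc:
    print(exc)                        # list objects cannot be shared
```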
>
> * Allow for multiple channel types, such that MemChannel is merely the
> *first* channel type, rather than the *only* channel type
>   + Allows PEP 554 to be restricted to things we already know can be made
> to work
>   + Doesn't block the introduction of an object-sharing based Channel in
> some future release
>   + Allows for at least some channel types to be adapted for use with
> shared memory and multiprocessing
>
>
>> Here are what I consider the key metrics relative to the utility of a
>> solution (not in any significant order):
>>
>> * how hard to understand as a Python programmer?
>>
>
> Not especially important yet - this is more a criterion for the final API,
> not the initial experimental platform.
>
>
>> * how much extra work (if any) for folks calling Channel.send()?
>> * how much extra work (if any) for folks calling Channel.recv()?
>>
>
> I don't think either are particularly important yet, although we also
> don't want to raise any pointless barriers to experimentation.
>
>
>> * how complex is the CPython implementation?
>>
>
> This is critical, since we want to minimise any potential for undesirable
> side effects on regular single interpreter code.
>
>
>> * how hard to understand as a type author (wanting to add support for their type)?
>> * how hard to add support for a new type?
>> * what variety of types could be supported?
>> * what breadth of experimentation opens up?
>>
>
> You missed the big one: what risk does the initial channel design pose to
> the underlying objective of making the GIL a genuinely per-interpreter lock?
>
> If we don't eventually reach the latter goal, then subinterpreters won't
> really offer much in the way of compelling benefits over just using a
> thread pool and queue.Queue.
>
> MemChannel poses zero additional risk to that, since we wouldn't be
> sharing actual Python objects between interpreters, only C pointers and
> structs.
>
> By contrast, introducing an object channel early poses significant new
> risks to that goal, since it will force you to solve hard protocol design
> and refcount management problems *before* making the switch, rather than
> being able to defer the design of the object channel protocol until *after*
> you've already enabled the ability to run subinterpreters in completely
> independent threads.
>
>
>> The most important thing to me is keeping things simple for Python
>> programmers.  After that is ease-of-use for type authors.  However, I
>> also want to put us in a good position in 3.7 to experiment
>> extensively with subinterpreters, so that's a big consideration.
>>
>> Consequently, for PEP 554 my goal is to find a solution for object
>> sharing that keeps things simple in Python while laying a basic
>> foundation we can build on at the C level, so we don't get locked in
>> but still maximize our opportunities to experiment. :)
>>
>
> I think our priorities are quite different then, as I believe PEP 554
> should be focused on defining a relatively easy to implement API that
> nevertheless makes it possible to write interesting programs while working
> on the goal of making the GIL per-interpreter, without worrying too much
> about whether or not the initial cross-interpreter communication channels
> closely resemble the final ones that will be intended for more general use.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +

