On Thu, Oct 5, 2017 at 4:57 AM, Nick Coghlan firstname.lastname@example.org wrote:
This would be hard to get to work reliably, because "orig.tp_share()" would be running in the receiving interpreter, but all the attributes of "orig" would have been allocated by the sending interpreter. It gets more reliable if it's *Channel.send* that calls tp_share() though, but moving the call to the sending side makes it clear that a tp_share protocol would still need to rely on a more primitive set of "shareable objects" that were the permitted return values from the tp_share call.
The point of running tp_share() in the receiving interpreter is to force allocation under that interpreter, so that GC applies there. I agree that you basically can't do anything in tp_share() that would affect the sending interpreter, including INCREF and DECREF. Since we INCREFed in send(), we know that we have a safe reference, so we don't have to worry about that part in tp_share(). We would only be able to do low-level things (like the buffer protocol) that don't interact with the original object's interpreter.
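A toy pure-Python model of that split may help. send() keeps the original alive, and a tp_share-style hook runs on the receiving side so the new object is allocated there. `Interpreter`, `SharedBytes`, `Channel`, and the `tp_share` method are hypothetical stand-ins, not PEP 554 or CPython APIs, and no real subinterpreters are involved.

```python
from collections import deque


class Interpreter:
    """Hypothetical stand-in: just records which objects it 'allocated'."""

    def __init__(self, name):
        self.name = name
        self.owned = []

    def allocate(self, obj):
        self.owned.append(obj)
        return obj


class SharedBytes:
    """Hypothetical type implementing a tp_share-like hook."""

    def __init__(self, data):
        self.data = data

    def tp_share(self, receiver):
        # Called on the receiving side: the new object is allocated under the
        # receiver, so the receiver's GC would own it. It only reads the raw
        # payload and never touches the sender's reference counts.
        return receiver.allocate(SharedBytes(self.data))


class Channel:
    def __init__(self):
        self._queue = deque()

    def send(self, obj):
        # Holding the object in the queue plays the role of the INCREF done
        # in send(): the original stays alive until recv() runs.
        self._queue.append(obj)

    def recv(self, receiver):
        orig = self._queue.popleft()
        return orig.tp_share(receiver)
```

The receiver ends up with its own object (owned by its own interpreter) while the sender's original is never touched after send().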
Given that this is a quite low-level tp slot and low-level functionality, I'd expect that a sufficiently clear entry (i.e. warning) in the docs would be enough for the few that dare. <wink>
From my perspective, adding the tp_share slot allows for much more experimentation with object sharing (right now, long before we get to considering how to stop sharing the GIL) by us *and* third parties. None of the alternatives seem to offer the same opportunity while still working out *after* we stop sharing the GIL.
And that's the real pay-off that comes from defining this in terms of the memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only a regular C struct that gets passed across the interpreter boundary (the reference to the original objects gets carried along passively as part of the CIV - it never gets *used* in the receiving interpreter).
Yeah, the (PEP 3118) buffer protocol offers precedent in a number of ways that are applicable to channels here. I'm simply reluctant to lock PEP 554 into such a specific solution as the buffer-specific CIV. I'm trying to accommodate anticipated future needs while keeping the PEP as simple and basic as possible. It's driving me nuts! :P Things were *much* simpler before I added Channels to the PEP. :)
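For reference, the buffer semantics in question can be seen from ordinary single-interpreter Python through memoryview. A view shares memory with its exporter rather than copying it, carries a reference to the exporter (`view.obj`), and pins the exporter's buffer while the export is live. This only illustrates PEP 3118 behaviour; nothing here crosses an interpreter boundary.

```python
data = bytearray(b"hello, channel")
view = memoryview(data)

assert view.obj is data             # the view carries a reference to the exporter
data[:5] = b"HELLO"                 # same-length write: the buffer is not resized
assert bytes(view[:5]) == b"HELLO"  # the view sees the change: memory is shared

# While an export is live, the exporter cannot be resized.
try:
    data.extend(b"!")
except BufferError:
    pinned = True
else:
    pinned = False

view.release()                      # drop the export...
data.extend(b"!")                   # ...and resizing is allowed again
```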
    bytes.tp_share():
        obj = blank_bytes(len(self))
        obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
        return obj
This is effectively reinventing memoryview, while trying to pretend it's an ordinary bytes object. Don't reinvent memoryview :)
    bytes.tp_free():
        # under no-shared-GIL:
        # most of this could be pulled into a macro for re-use
        orig = lookup_shared(self)
        if orig != NULL:
            current = release_LIL()
            interp = lookup_owner(orig)
            acquire_LIL(interp)
            decref(orig)
            release_LIL(interp)
            acquire_LIL(current)
            # clear shared/owner tables
            # clear/release self.ob_sval
        free(self)
I don't think we should be touching the behaviour of core builtins solely to enable message passing to subinterpreters without a shared GIL.
Keep in mind that I included the above as a possible solution using tp_share() that would work *after* we stop sharing the GIL. My point is that with tp_share() we have a solution that works now *and* will work later. I don't care how we use tp_share to do so. :) I long to be able to say in the PEP that you can pass bytes through the channel and get bytes on the other side.
That said, I'm not sure how this could be made to work without involving tp_free(). If that is really off the table (even in the simplest possible ways) then I don't think there is a way to actually share objects of builtin types between interpreters other than through views like CIV. We could still support tp_share() for the sake of third parties, which would facilitate that simplicity I was aiming for in sending data between interpreters, as well as leaving the door open for nearly all the same experimentation. However, I expect that most *uses* of channels will involve builtin types, particularly as we start off, so having to rely on view types for builtins would add not-insignificant awkwardness to using channels.
I'd still like to avoid that if possible, so let's not rush to completely close the door on small modifications to tp_free for builtins. :) Regardless, I still (after a night's rest and a day of not thinking about it) consider tp_share() to be the solution I'd been hoping we'd find, whether or not we can apply it to builtin types.
The simplest possible variant of CIVs that I can think of would be able to avoid that outcome by being a memoryview subclass, since they just need to hold the extra reference to the original interpreter, and include some logic to switch interpreters at the appropriate time.
That said, I think there's definitely a useful design question to ask in this area, not about bytes (which can be readily represented by a memoryview variant in the receiving interpreter), but about *strings*: they have a more complex internal layout than bytes objects, but as long as the receiving interpreter can make sure that the original string continues to exist, then you could usefully implement a "strview" type to avoid having to go through an encode/decode cycle just to pass a string to another subinterpreter.
That would provide a reasonably compelling argument that CIVs *shouldn't* be implemented as memoryview subclasses, but instead defined as *containing* a managed view of an object owned by a different interpreter.
That way, even if the initial implementation only supported CIVs that contained a memoryview instance, we'd have the freedom to define other kinds of views later (such as strview), while being able to reuse the same CIV machinery.
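A minimal sketch of the containment design, assuming the CIV only needs to remember the owning interpreter and hold some view object with a `release()` method (memoryview today, a hypothetical strview later). `CrossInterpreterView` and the `switch_to` callback, which stands in for "switch to the owner, run, switch back", are invented for illustration.

```python
class CrossInterpreterView:
    """Hypothetical CIV that *contains* a view instead of subclassing one."""

    def __init__(self, owner, view):
        self._owner = owner      # interpreter that owns the original object
        self._view = view        # any object with release(), e.g. memoryview

    @property
    def view(self):
        if self._view is None:
            raise ValueError("view has been released")
        return self._view

    def release(self, switch_to):
        # Releasing must happen in the owner's context; switch_to(owner, fn)
        # stands in for "switch to owner, call fn, switch back".
        if self._view is not None:
            view, self._view = self._view, None
            switch_to(self._owner, view.release)
```

Because the view is held rather than inherited, supporting another view type needs no change to the CIV class itself.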
Hmm, so a CIV implementation that accomplishes something similar to tp_share()?
For some reason I'm seeing similarities between CIV-vs.-tp_share and the import machinery before PEP 451. Before we added module specs, import hook authors had to do a bunch of the busy work that the import machinery does for you now by leveraging module specs. Back then we worked to provide a number of helpers to reduce that extra pain of writing an import hook. Now the helpers are irrelevant and the extra burden is gone.
My mind is drawn to the comparison between that and the question of CIV vs. tp_share(). CIV would be more like the post-451 import world, where I expect the CIV would take care of the data sharing operations. That said, the situation in PEP 554 is sufficiently different that I'm not convinced a generic CIV protocol would be better. I'm not sure how much CIV could do for you over helpers+tp_share.
Anyway, here are the leading approaches that I'm looking at now:
* adding a tp_share slot
  + you send() the object directly and recv() the object coming out of tp_share() (which will probably be the same type as the original)
  + this would eventually require small changes in tp_free for participating types
  + we would likely provide helpers (eventually), similar to the new buffer protocol, to make it easier to manage sharing data
* simulating tp_share via an external global registry (or a registry on the Channel type)
  + it would still be hard to make work without hooking into tp_free()
* CIVs hard-coded in Channel (or BufferViewChannel, etc.) for specific types (e.g. buffers)
  + you send() the object like normal, but recv() the view
* a CIV protocol on Channel by which you can add support for more types
  + you send() the object like normal but recv() the view
  + could work through subclassing or a registry
  + a lot of conceptual similarity with tp_share+tp_free
* a CIV-like proxy
  + you wrap the object, send() the proxy, and recv() a proxy
  + this is entirely compatible with tp_share()
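To make the "CIV protocol on Channel" option concrete, here is a rough sketch of the registry variant: send() accepts the object itself, and recv() returns whatever view the registered factory builds. `Channel.register()` and the factory signature are hypothetical, and everything runs in a single interpreter.

```python
from collections import deque


class Channel:
    _view_factories = {}  # type -> callable(obj) returning a view

    @classmethod
    def register(cls, typ, make_view):
        cls._view_factories[typ] = make_view

    @classmethod
    def _lookup(cls, typ):
        for base in typ.__mro__:  # honour subclasses of registered types
            if base in cls._view_factories:
                return cls._view_factories[base]
        return None

    def __init__(self):
        self._queue = deque()

    def send(self, obj):
        make_view = self._lookup(type(obj))
        if make_view is None:
            raise TypeError(f"{type(obj).__name__} is not shareable")
        self._queue.append((obj, make_view))  # keeps the original alive

    def recv(self):
        obj, make_view = self._queue.popleft()
        return make_view(obj)  # in a real design this would run in the receiver


Channel.register(bytes, memoryview)
```

Note the asymmetry: the sender passes bytes, but the receiver gets a memoryview.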
Here are the metrics I consider key to the utility of a solution (in no particular order):
* how hard to understand as a Python programmer?
* how much extra work (if any) for folks calling Channel.send()?
* how much extra work (if any) for folks calling Channel.recv()?
* how complex is the CPython implementation?
* how hard to understand as a type author (wanting to add support for their type)?
* how hard to add support for a new type?
* what variety of types could be supported?
* what breadth of experimentation opens up?
The most important thing to me is keeping things simple for Python programmers. After that is ease-of-use for type authors. However, I also want to put us in a good position in 3.7 to experiment extensively with subinterpreters, so that's a big consideration.
Consequently, for PEP 554 my goal is to find a solution for object sharing that keeps things simple in Python while laying a basic foundation we can build on at the C level, so we don't get locked in but still maximize our opportunities to experiment. :)