On 14 September 2017 at 15:27, Nathaniel Smith <njs@pobox.com> wrote:
On Sep 13, 2017 9:01 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On 14 September 2017 at 11:44, Eric Snow <ericsnowcurrently@gmail.com> wrote:
send(obj):
Send the object to the receiving end of the channel. Wait until the object is received. If the channel does not support the object then TypeError is raised. Currently only bytes are supported. If the channel has been closed then EOFError is raised.
I still expect any form of object sharing to hinder your per-interpreter GIL efforts, so restricting the initial implementation to memoryview-only seems more future-proof to me.
I don't get it. With bytes, you can either share objects or copy them and the user can't tell the difference, so you can change your mind later if you want. But memoryviews require some kind of cross-interpreter strong reference to keep the underlying buffer object alive. So if you want to minimize object sharing, surely bytes are more future-proof.
Not really, because the only way to ensure object separation (i.e. no refcounted objects accessible from multiple interpreters at once) with a bytes-based API would be to either:

1. Always copy (eliminating most of the low overhead communications benefits that subinterpreters may offer over multiple processes)
2. Make the bytes implementation more complicated by allowing multiple bytes objects to share the same underlying storage while presenting as distinct objects in different interpreters
3. Make the output on the receiving side not actually a bytes object, but instead a view onto memory owned by another object in a different interpreter (a "memory view", one might say)

And yes, using memory views for this does mean defining either a subclass or a mediating object that not only keeps the originating object alive until the receiving memoryview is closed, but also retains a reference to the originating interpreter so that it can switch to it when it needs to manipulate the source object's refcount or call one of the buffer methods.

Yury and I are fine with that, since it means that either the sender *or* the receiver can decide to copy the data (e.g. by calling bytes(obj) before sending, or bytes(view) after receiving), and in the meantime, the object holding the cross-interpreter view knows that it needs to switch interpreters (and hence acquire the sending interpreter's GIL) before doing anything with the source object.

The reason we're OK with this is that it means that only reading a new message from a channel (i.e. creating a cross-interpreter view) or discarding a previously read message (i.e. closing a cross-interpreter view) will be synchronisation points where the receiving interpreter necessarily needs to acquire the sending interpreter's GIL.
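As a rough illustration of the copy-or-view choice described above, the existing memoryview type already behaves this way inside a single interpreter. This sketch does not cross an interpreter boundary, and the variable names are only illustrative; it just shows that a view shares the source buffer, that bytes(view) takes an independent copy, and that releasing the view is an explicit step:

```python
# Single-interpreter sketch only: no subinterpreters are involved here.
source = bytearray(b"hello world")

view = memoryview(source)   # no copy; the view keeps the source buffer pinned
copied = bytes(view)        # explicit copy, as a receiver might choose to make

source[0] = ord("J")        # in-place write to the source buffer

view_sees_change = view[0] == ord("J")    # the view tracks the source
copy_unchanged = copied[0] == ord("h")    # the copy is independent

# Releasing the view is the single-interpreter analogue of "closing a
# cross-interpreter view": afterwards the view can no longer be used.
view.release()
try:
    view[0]
    released = False
except ValueError:
    released = True
```

In the cross-interpreter case, creating and releasing the view would be the two points at which the receiver has to touch the sending interpreter.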
By contrast, if we allow an actual bytes object to be shared, then either every INCREF or DECREF on that bytes object becomes a synchronisation point, or else we end up needing some kind of secondary per-interpreter refcount where the interpreter doesn't drop its shared reference to the original object in its source interpreter until the internal refcount in the borrowing interpreter drops to zero.
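The "secondary per-interpreter refcount" alternative can be pictured with a toy model. Everything here is hypothetical (the class name, the counters and the callback are invented for illustration, and nothing like this exists in CPython's API); the point is only that local INCREF/DECREF operations stay local, and the owning interpreter is contacted once when the borrow starts and once when the local count drops to zero:

```python
class BorrowedObject:
    """Toy model (hypothetical) of a borrowed reference with a local refcount."""

    def __init__(self, release_in_owner):
        # Callback standing in for "switch to the owning interpreter and
        # drop the shared reference there".
        self._release_in_owner = release_in_owner
        self.local_refs = 0
        # Taking the shared reference is itself one synchronisation point.
        self.owner_syncs = 1

    def incref(self):
        # Purely local bookkeeping: no contact with the owning interpreter.
        self.local_refs += 1

    def decref(self):
        self.local_refs -= 1
        if self.local_refs == 0:
            # Only the final local DECREF needs to reach the owner.
            self.owner_syncs += 1
            self._release_in_owner()


owner_releases = []
borrowed = BorrowedObject(lambda: owner_releases.append("released"))
for _ in range(3):
    borrowed.incref()
for _ in range(3):
    borrowed.decref()
```

With three local references taken and dropped, the model records two synchronisation points in total rather than six.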
Handling an exception
---------------------

It would also be reasonable to simply not return any value/exception from run() at all, or maybe just a bool for whether there was an unhandled exception. Any high level API is going to be injecting code on both sides of the interpreter boundary anyway, so it can do whatever exception and traceback translation it wants to.
So any more detailed response would *have* to come back as a channel message? That sounds like a reasonable option to me, too, especially since module level code doesn't have a return value as such - you can really only say "it raised an exception (and this was the exception it raised)" or "it reached the end of the code without raising an exception".

Given that, I think subprocess.run() (with check=False) is the right API precedent here: https://docs.python.org/3/library/subprocess.html#subprocess.run

That always returns subprocess.CompletedProcess, and then you can call "cp.check_returncode()" to get it to raise subprocess.CalledProcessError for non-zero return codes.

For interpreter.run(), we could keep the initial RunResult *really* simple and only report back:

* source: the source code passed to run()
* shared: the keyword args passed to run() (name chosen to match functools.partial)
* completed: completed execution without raising an exception? (True if yes, False otherwise)

Whether or not to report more details for a raised exception, and provide some mechanism to reraise it in the calling interpreter could then be deferred until later.

The subprocess.run() comparison does make me wonder whether this might be a more future-proof signature for Interpreter.run() though:

    def run(source_str, /, *, channels=None):
        ...

That way channels can be a namespace *specifically* for passing in channels, and can be reported as such on RunResult. If we decide to allow arbitrary shared objects in the future, or add flag options like "reraise=True" to reraise exceptions from the subinterpreter in the current interpreter, we'd have that ability, rather than having the entire potential keyword namespace taken up for passing shared objects.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
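To make the minimal RunResult idea above a bit more concrete, here is a rough single-interpreter sketch. It is not the proposed implementation: run() below just uses exec() in a fresh namespace as a stand-in for a subinterpreter, it follows the keyword-args variant rather than the channels=None one, and RunFailedError and check_completed() are hypothetical names chosen by analogy with subprocess.CalledProcessError and CompletedProcess.check_returncode().

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


class RunFailedError(RuntimeError):
    """Hypothetical counterpart to subprocess.CalledProcessError."""


@dataclass(frozen=True)
class RunResult:
    source: str                                             # code passed to run()
    shared: Mapping[str, Any] = field(default_factory=dict) # keyword args passed
    completed: bool = True                                  # finished without raising?

    def check_completed(self):
        # Hypothetical analogue of CompletedProcess.check_returncode().
        if not self.completed:
            raise RunFailedError("source raised an unhandled exception")


def run(source_str, /, **shared):
    """Stand-in for interpreter.run(): executes in a fresh namespace."""
    namespace = dict(shared)   # copy, so `shared` itself is not mutated
    try:
        exec(source_str, namespace)
    except Exception:
        # Only report that something was raised, with no details.
        return RunResult(source_str, shared, completed=False)
    return RunResult(source_str, shared, completed=True)


ok = run("x = value + 1", value=41)
bad = run("1 / 0")

try:
    bad.check_completed()
    raised = False
except RunFailedError:
    raised = True
```

As with subprocess.run(check=False), the caller decides whether a failed run should turn into an exception on its own side.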