
On 14 September 2017 at 11:44, Eric Snow <ericsnowcurrently@gmail.com> wrote:
I've updated PEP 554 in response to feedback. (thanks all!) There are a few unresolved points (some of them added to the Open Questions section), but the current PEP has changed enough that I wanted to get it out there first.
Notably changed:
* the API relative to object passing has changed somewhat drastically (hopefully simpler and easier to understand), replacing "FIFO" with "channel"
* added an examples section
* added an open questions section
* added a rejected ideas section
* added more items to the deferred functionality section
* the rationale section has moved down below the examples
Please let me know what you think. I'm especially interested in feedback about the channels. Thanks!
I like the new pipe-like channels API more than the previous named FIFO approach :)
send(obj):
Send the object to the receiving end of the channel. Wait until the object is received. If the channel does not support the object then TypeError is raised. Currently only bytes are supported. If the channel has been closed then EOFError is raised.
I still expect any form of object sharing to hinder your per-interpreter GIL efforts, so restricting the initial implementation to memoryview-only seems more future-proof to me.
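The quoted ``send()`` semantics can be sketched with a thread-based stand-in. This is only an illustration of the described behaviour (block until received, ``TypeError`` for non-bytes, ``EOFError`` once closed), assuming a single sender. The ``Channel`` class and its ``recv()`` and ``close()`` methods are hypothetical names for this sketch, not the PEP's implementation, and nothing here involves real subinterpreters.

```python
import queue
import threading


class Channel:
    """Thread-based stand-in for the proposed channel (hypothetical)."""

    def __init__(self):
        self._items = queue.Queue()
        self._closed = False

    def send(self, obj):
        if self._closed:
            raise EOFError("channel is closed")
        if not isinstance(obj, bytes):
            raise TypeError("only bytes are supported")
        self._items.put(obj)
        # Block until recv() has taken the object and called task_done().
        self._items.join()

    def recv(self):
        obj = self._items.get()
        self._items.task_done()
        return obj

    def close(self):
        self._closed = True


channel = Channel()
received = []

# The receiver runs in another thread so that send() has someone to wait for.
receiver = threading.Thread(target=lambda: received.append(channel.recv()))
receiver.start()
channel.send(b"ping")
receiver.join()
```

A memoryview-only variant would change just the ``isinstance`` check; the blocking hand-off stays the same.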
Pre-populate an interpreter
---------------------------

::

   interp = interpreters.create()
   interp.run("""if True:
       import some_lib
       import an_expensive_module
       some_lib.set_up()
       """)
   wait_for_request()
   interp.run("""if True:
       some_lib.handle_request()
       """)
I find the "if True:"'s sprinkled through the examples distracting, so I'd prefer either:

1. Using textwrap.dedent; or
2. Assigning the code to a module level attribute

::

   interp = interpreters.create()
   setup_code = """\
   import some_lib
   import an_expensive_module
   some_lib.set_up()
   """
   interp.run(setup_code)
   wait_for_request()
   handler_code = """\
   some_lib.handle_request()
   """
   interp.run(handler_code)
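For comparison, the ``textwrap.dedent`` option might look roughly like this. It is a sketch only: ``some_lib`` and ``an_expensive_module`` are the placeholder names from the PEP example, and the string is merely compiled here (syntax check, no imports) rather than passed to the proposed ``interp.run()``.

```python
import textwrap

# The code can stay indented alongside the surrounding source;
# dedent() strips the common leading whitespace.
setup_code = textwrap.dedent("""\
    import some_lib
    import an_expensive_module
    some_lib.set_up()
    """)

# compile() only checks the syntax; nothing is imported or executed.
compile(setup_code, "<setup_code>", "exec")
```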
Handling an exception
---------------------

::

   interp = interpreters.create()
   try:
       interp.run("""if True:
           raise KeyError
           """)
   except KeyError:
       print("got the error from the subinterpreter")
As with the message passing through channels, I think you'll really want to minimise any kind of implicit object sharing that may interfere with future efforts to make the GIL truly an *interpreter* lock, rather than the global process lock that it is currently. One possible way to approach that would be to make the low level run() API a more Go-style API rather than a Python-style one, and have it return a (result, err) 2-tuple. "err.raise()" would then translate the foreign interpreter's exception into a local interpreter exception, but the *traceback* for that exception would be entirely within the current interpreter.
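A rough sketch of that shape, using plain ``exec()`` in a fresh namespace as a stand-in for a subinterpreter. Everything here is an assumption made for illustration: ``RunError``, ``ForeignInterpreterError``, the convention of reading a ``result`` name from the namespace, and the method name ``raise_()`` (``raise`` itself is a reserved word, so ``err.raise()`` would not parse). The error record holds only strings, so no exception or frame objects cross the boundary.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Optional, Tuple


class ForeignInterpreterError(RuntimeError):
    """Local exception standing in for an error raised elsewhere (hypothetical)."""


@dataclass(frozen=True)
class RunError:
    """String-only snapshot of an exception (hypothetical)."""
    type_name: str
    message: str
    formatted_traceback: str

    def raise_(self) -> None:
        # Raised here, so the traceback belongs to the calling side only.
        raise ForeignInterpreterError(f"{self.type_name}: {self.message}")


def run(source: str) -> Tuple[Any, Optional[RunError]]:
    """Stand-in for a low-level run(): report errors instead of raising."""
    namespace = {"__name__": "__main__"}
    try:
        exec(source, namespace)
    except Exception as exc:
        snapshot = RunError(type(exc).__name__, str(exc), traceback.format_exc())
        return None, snapshot
    # Assumed convention for this sketch: the code binds `result` if it has one.
    return namespace.get("result"), None


result, err = run("raise KeyError('missing')")
if err is not None:
    print("run failed with", err.type_name)
```

A higher-level wrapper could call ``err.raise_()`` unconditionally to recover the Python-style behaviour.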
About Subinterpreters
=====================

Shared data
-----------
Subinterpreters are inherently isolated (with caveats explained below), in contrast to threads. This enables `a different concurrency model <Concurrency_>`_ than is currently readily available in Python. `Communicating Sequential Processes`_ (CSP) is the prime example.
A key component of this approach to concurrency is message passing. So providing a message/object passing mechanism alongside ``Interpreter`` is a fundamental requirement. This proposal includes a basic mechanism upon which more complex machinery may be built. That basic mechanism draws inspiration from pipes, queues, and CSP's channels. [fifo]_
The key challenge here is that sharing objects between interpreters faces complexity due in part to CPython's current memory model. Furthermore, in this class of concurrency, the ideal is that objects only exist in one interpreter at a time. However, this is not practical for Python so we initially constrain supported objects to ``bytes``. There are a number of strategies we may pursue in the future to expand supported objects and object sharing strategies.
Note that the complexity of object sharing increases as subinterpreters become more isolated, e.g. after GIL removal. So the mechanism for message passing needs to be carefully considered. Keeping the API minimal and initially restricting the supported types helps us avoid further exposing any underlying complexity to Python users.
To make this work, the mutable shared state will be managed by the Python runtime, not by any of the interpreters. Initially we will support only one type of objects for shared state: the channels provided by ``create_channel()``. Channels, in turn, will carefully manage passing objects between interpreters.
Interpreters themselves will also need to be shared objects, as:

- they all have access to "interpreters.list_all()"
- when we do "interpreters.create_interpreter()", the calling interpreter gets a reference to itself via "interpreters.get_current()"

(These shared objects are what I suspect you may end up needing a process global read/write lock to manage, by the way - I think it would be great if you can figure out a way to avoid that, it's just not entirely clear to me what that might look like. I do think you're on the right track by prohibiting the destruction of an interpreter that's currently running, and the destruction of channels that are currently still associated with an interpreter)
Interpreter Isolation
---------------------

This section is a really nice addition :)
Existing Usage
--------------
Subinterpreters are not a widely used feature. In fact, the only documented case of wide-spread usage is `mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_. On the one hand, this case provides confidence that existing subinterpreter support is relatively stable. On the other hand, there isn't much of a sample size from which to judge the utility of the feature.
Nathaniel pointed out that JEP embeds CPython subinterpreters inside the JVM similar to the way that mod_wsgi embeds them inside Apache httpd: https://github.com/ninia/jep/wiki/How-Jep-Works
Open Questions
==============

Leaking exceptions across interpreters
--------------------------------------
As currently proposed, uncaught exceptions from ``run()`` propagate to the frame that called it. However, this means that exception objects are leaking across the inter-interpreter boundary. Likewise, the frames in the traceback potentially leak.
While that might not be a problem currently, it would be a problem once interpreters get better isolation relative to memory management (which is necessary to stop sharing the GIL between interpreters). So the semantics of how the exceptions propagate needs to be resolved.
As noted above, I think you *really* want to avoid leaking exceptions in the initial implementation. A non-exception-based error signaling mechanism would be one way to do that, similar to how the low-level subprocess APIs actually report the return code, which higher level APIs then turn into an exception. resp.raise_for_status() does something similar for HTTP responses in the requests API.
Initial support for buffers in channels
---------------------------------------
An alternative to support for bytes in channels is support for read-only buffers (the PEP 3118 kind). Then ``recv()`` would return a memoryview to expose the buffer in a zero-copy way. This is similar to what ``multiprocessing.Connection`` supports. [mp-conn]_
Switching to such an approach would help resolve questions of how passing bytes through channels will work once we isolate memory management in interpreters.
Exactly :)
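To illustrate what "zero-copy" and "read-only" mean for such a buffer, here is a small sketch using only the built-in ``memoryview``. It says nothing about how channels would actually hand buffers across interpreters, and ``toreadonly()`` needs Python 3.8 or later.

```python
payload = bytearray(b"hello channels")

# A read-only view over the same memory; no bytes are copied.
view = memoryview(payload).toreadonly()

# Changing the underlying buffer is visible through the view,
# which shows that the view shares memory with it.
payload[0] = ord("H")
first_word = bytes(view[:5])

# Writing through the read-only view is rejected.
try:
    view[0] = 0
    write_rejected = False
except TypeError:
    write_rejected = True
```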
Resetting __main__
------------------

As proposed, every call to ``Interpreter.run()`` will execute in the namespace of the interpreter's existing ``__main__`` module. This means that data persists there between ``run()`` calls. Sometimes this isn't desirable and you want to execute in a fresh ``__main__``. Also, you don't necessarily want to leak objects there that you aren't using any more.
Solutions include:
* a ``create()`` arg to indicate resetting ``__main__`` after each ``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
This isn't a critical feature initially. It can wait until later if desirable.
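As a rough illustration of the behaviour in question, the sketch below uses a plain dict and ``exec()`` as a stand-in for an interpreter's ``__main__`` namespace. The ``reset_namespace()`` helper is hypothetical. It drops ordinary names but keeps the dunder entries that a code execution environment needs.

```python
def reset_namespace(namespace):
    """Remove every non-dunder name from a module-like namespace (hypothetical)."""
    for name in list(namespace):
        if not (name.startswith("__") and name.endswith("__")):
            del namespace[name]


# Stand-in for __main__: data persists between successive exec() calls.
main_namespace = {"__name__": "__main__"}
exec("counter = 1", main_namespace)
exec("counter += 1", main_namespace)
persisted_value = main_namespace["counter"]

# After the reset, only the dunder entries remain.
reset_namespace(main_namespace)
remaining_names = sorted(main_namespace)
```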
I was going to note that you can already do this:

    interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers all the __dunder__ attributes that the interpreter needs in a code execution environment. Either way, if you added this, I think it would make more sense as an "importlib.util.reset_globals()" operation, rather than have it be something specific to subinterpreters.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia