[Python-Dev] PEP 554 v3 (new interpreters module)
Nick Coghlan
ncoghlan at gmail.com
Wed Sep 13 23:56:55 EDT 2017
On 14 September 2017 at 11:44, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> I've updated PEP 554 in response to feedback. (thanks all!) There
> are a few unresolved points (some of them added to the Open Questions
> section), but the current PEP has changed enough that I wanted to get
> it out there first.
>
> Notably changed:
>
> * the API relative to object passing has changed somewhat drastically
> (hopefully simpler and easier to understand), replacing "FIFO" with
> "channel"
> * added an examples section
> * added an open questions section
> * added a rejected ideas section
> * added more items to the deferred functionality section
> * the rationale section has moved down below the examples
>
> Please let me know what you think. I'm especially interested in
> feedback about the channels. Thanks!
I like the new pipe-like channels API more than the previous named
FIFO approach :)
> send(obj):
>
> Send the object to the receiving end of the channel. Wait until
> the object is received. If the channel does not support the
> object then TypeError is raised. Currently only bytes are
> supported. If the channel has been closed then EOFError is
> raised.
I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.
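As a rough illustration of what a memoryview-only restriction could look like, here is a single-process sketch. `ToyChannel` is a hypothetical stand-in, not the PEP's API. It reproduces only the TypeError and EOFError behaviour quoted above and does not block until the object is received.

```python
from collections import deque


class ToyChannel:
    """Hypothetical single-process stand-in for a channel (not the PEP 554 API)."""

    def __init__(self):
        self._items = deque()
        self._closed = False

    def send(self, obj):
        # Mirror the quoted error semantics: EOFError once closed,
        # TypeError for unsupported objects.
        if self._closed:
            raise EOFError("channel is closed")
        if not isinstance(obj, memoryview):
            raise TypeError(
                f"only memoryview is supported, got {type(obj).__name__}"
            )
        # Hand over a read-only view so the receiver cannot mutate the buffer.
        self._items.append(obj.toreadonly())

    def recv(self):
        # Unlike a real channel this does not wait; it raises IndexError if empty.
        return self._items.popleft()

    def close(self):
        self._closed = True


channel = ToyChannel()
channel.send(memoryview(b"ping"))
received = channel.recv()
print(received.tobytes())
```

Rejecting everything except memoryview at the send boundary keeps the set of objects that cross between interpreters as small as possible.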
> Pre-populate an interpreter
> ---------------------------
>
> ::
>
>     interp = interpreters.create()
>     interp.run("""if True:
>         import some_lib
>         import an_expensive_module
>         some_lib.set_up()
>         """)
>     wait_for_request()
>     interp.run("""if True:
>         some_lib.handle_request()
>         """)
I find the "if True:"'s sprinkled through the examples distracting, so
I'd prefer either:
1. Using textwrap.dedent; or
2. Assigning the code to a module level attribute
::
    interp = interpreters.create()
    setup_code = """\
    import some_lib
    import an_expensive_module
    some_lib.set_up()
    """
    interp.run(setup_code)
    wait_for_request()
    handler_code = """\
    some_lib.handle_request()
    """
    interp.run(handler_code)
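Option 1 can be sketched in the same spirit. Since the proposed interpreters module is not assumed to be importable here, the hypothetical `run()` helper below uses `exec()` in a plain dict as a stand-in for `interp.run()`. Only `textwrap.dedent` is the real point of the example.

```python
import textwrap


def run(source, namespace):
    """Hypothetical stand-in for interp.run(): dedent the source, then exec it."""
    exec(textwrap.dedent(source), namespace)


main_ns = {"__name__": "__main__"}

# The code can be indented to match its surroundings; dedent() strips the
# common leading whitespace, so no "if True:" wrapper is needed.
run("""
    values = [1, 2, 3]
    total = sum(values)
    """, main_ns)

print(main_ns["total"])
```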
> Handling an exception
> ---------------------
>
> ::
>
>     interp = interpreters.create()
>     try:
>         interp.run("""if True:
>             raise KeyError
>             """)
>     except KeyError:
>         print("got the error from the subinterpreter")
As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.
One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.
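A minimal sketch of that shape, under stated assumptions: `exec()` in a fresh namespace stands in for the foreign interpreter, `RunError` is a hypothetical name, and the method is spelled `reraise()` because `raise` is a reserved word and cannot be used as a method name. Reading the result from a variable named `result` is likewise just a convention for this example.

```python
import traceback


class RunError:
    """Hypothetical plain-data snapshot of an exception from the executed code.

    Only strings are kept, so no exception, frame or traceback object from
    the executed code stays referenced by the caller.
    """

    def __init__(self, exc):
        self.type_name = type(exc).__name__
        self.message = str(exc)
        self.formatted = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )

    def reraise(self):
        # The new exception is created here, so its traceback starts in the
        # calling code rather than in the code that originally failed.
        raise RuntimeError(f"{self.type_name}: {self.message}")


def run(source):
    """Go-style stand-in for run(): return a (result, err) 2-tuple."""
    namespace = {"__name__": "__main__"}
    try:
        exec(source, namespace)
    except Exception as exc:
        return None, RunError(exc)
    return namespace.get("result"), None


result, err = run("raise KeyError('missing')")
if err is not None:
    print("failed with", err.type_name)
```

A higher-level wrapper could call `err.reraise()` unconditionally to recover exception-style error handling on top of the tuple-returning primitive.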
> About Subinterpreters
> =====================
>
> Shared data
> -----------
>
> Subinterpreters are inherently isolated (with caveats explained below),
> in contrast to threads. This enables `a different concurrency model
> <Concurrency_>`_ than is currently readily available in Python.
> `Communicating Sequential Processes`_ (CSP) is the prime example.
>
> A key component of this approach to concurrency is message passing. So
> providing a message/object passing mechanism alongside ``Interpreter``
> is a fundamental requirement. This proposal includes a basic mechanism
> upon which more complex machinery may be built. That basic mechanism
> draws inspiration from pipes, queues, and CSP's channels. [fifo]_
>
> The key challenge here is that sharing objects between interpreters
> faces complexity due in part to CPython's current memory model.
> Furthermore, in this class of concurrency, the ideal is that objects
> only exist in one interpreter at a time. However, this is not practical
> for Python so we initially constrain supported objects to ``bytes``.
> There are a number of strategies we may pursue in the future to expand
> supported objects and object sharing strategies.
>
> Note that the complexity of object sharing increases as subinterpreters
> become more isolated, e.g. after GIL removal. So the mechanism for
> message passing needs to be carefully considered. Keeping the API
> minimal and initially restricting the supported types helps us avoid
> further exposing any underlying complexity to Python users.
>
> To make this work, the mutable shared state will be managed by the
> Python runtime, not by any of the interpreters. Initially we will
> support only one type of objects for shared state: the channels provided
> by ``create_channel()``. Channels, in turn, will carefully manage
> passing objects between interpreters.
Interpreters themselves will also need to be shared objects, as:
- they all have access to "interpreters.list_all()"
- when we do "interpreters.create_interpreter()", the calling
interpreter gets a reference to itself via
"interpreters.get_current()"
(These shared objects are what I suspect you may end up needing a
process global read/write lock to manage, by the way - I think it
would be great if you can figure out a way to avoid that, it's just
not entirely clear to me what that might look like. I do think you're
on the right track by prohibiting the destruction of an interpreter
that's currently running, and the destruction of channels that are
currently still associated with an interpreter)
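To make the concern concrete, here is a toy sketch of runtime-owned shared state. Everything in it is hypothetical: `InterpreterRegistry` is not part of the PEP, interpreters are represented by bare integer IDs, and a plain `threading.Lock` stands in for the process-global read/write lock.

```python
import threading


class InterpreterRegistry:
    """Hypothetical runtime-level table of interpreters, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0
        self._running = {}  # interpreter ID -> is it currently running?

    def create(self):
        with self._lock:
            interp_id = self._next_id
            self._next_id += 1
            self._running[interp_id] = False
            return interp_id

    def list_all(self):
        with self._lock:
            return sorted(self._running)

    def set_running(self, interp_id, running):
        with self._lock:
            self._running[interp_id] = running

    def destroy(self, interp_id):
        with self._lock:
            # Refuse to destroy an interpreter that is currently running.
            if self._running[interp_id]:
                raise RuntimeError("interpreter is currently running")
            del self._running[interp_id]


registry = InterpreterRegistry()
first = registry.create()
second = registry.create()
registry.set_running(second, True)
registry.destroy(first)
print(registry.list_all())
```

Every operation here takes the same lock, which is exactly the kind of process-wide serialisation point that would be nice to avoid.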
> Interpreter Isolation
> ---------------------
>
This section is a really nice addition :)
> Existing Usage
> --------------
>
> Subinterpreters are not a widely used feature. In fact, the only
> documented case of wide-spread usage is
> `mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_. On the one
> hand, this case provides confidence that existing subinterpreter support
> is relatively stable. On the other hand, there isn't much of a sample
> size from which to judge the utility of the feature.
Nathaniel pointed out that JEP embeds CPython subinterpreters inside
the JVM similar to the way that mod_wsgi embeds them inside Apache
httpd: https://github.com/ninia/jep/wiki/How-Jep-Works
> Open Questions
> ==============
>
> Leaking exceptions across interpreters
> --------------------------------------
>
> As currently proposed, uncaught exceptions from ``run()`` propagate
> to the frame that called it. However, this means that exception
> objects are leaking across the inter-interpreter boundary. Likewise,
> the frames in the traceback potentially leak.
>
> While that might not be a problem currently, it would be a problem once
> interpreters get better isolation relative to memory management (which
> is necessary to stop sharing the GIL between interpreters). So the
> semantics of how the exceptions propagate needs to be resolved.
As noted above, I think you *really* want to avoid leaking exceptions
in the initial implementation. A non-exception-based error signaling
mechanism would be one way to do that, similar to how the low-level
subprocess APIs actually report the return code, which higher level
APIs then turn into an exception.
resp.raise_for_status() does something similar for HTTP responses in
the requests API.
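The subprocess pattern referred to above looks like this with the standard library. The `run_child()` helper is just a name chosen for this example. `subprocess.run()` reports failure through `returncode`, and `check_returncode()` turns it into an exception only when the caller asks for that.

```python
import subprocess
import sys


def run_child(code):
    """Run a snippet in a child Python process and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )


completed = run_child("import sys; sys.exit(3)")

# Low level: the failure is just data on the result object.
print("return code:", completed.returncode)

# Higher level: opt in to exception-based signalling.
try:
    completed.check_returncode()
except subprocess.CalledProcessError as exc:
    print("child failed with status", exc.returncode)
```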
> Initial support for buffers in channels
> ---------------------------------------
>
> An alternative to support for bytes in channels is support for
> read-only buffers (the PEP 3119 kind). Then ``recv()`` would return
> a memoryview to expose the buffer in a zero-copy way. This is similar
> to what ``multiprocessing.Connection`` supports. [mp-conn]
>
> Switching to such an approach would help resolve questions of how
> passing bytes through channels will work once we isolate memory
> management in interpreters.
Exactly :)
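For reference, the zero-copy behaviour in question can be seen with plain memoryview objects. This sketch involves no channels or interpreters, and the variable names are illustrative only.

```python
payload = bytearray(b"hello world")

# A read-only view over the buffer; no bytes are copied.
view = memoryview(payload).toreadonly()

# Slicing a memoryview also shares the underlying buffer.
head = view[:5]

# A change made through the owner is visible through the view,
# which shows that the view is not a copy.
payload[0] = ord("J")
print(head.tobytes())

# The receiver cannot write through the read-only view.
try:
    head[0] = 0
except TypeError as exc:
    print("write rejected:", exc)
```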
> Resetting __main__
> ------------------
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module. This means
> that data persists there between ``run()`` calls. Sometimes this isn't
> desirable and you want to execute in a fresh ``__main__``. Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
> ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
> after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially. It can wait until later
> if desirable.
I was going to note that you can already do this:
interp.run("globals().clear()")
However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.
Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.
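A rough sketch of what such a helper might do, assuming the goal is to drop user-defined names while keeping the dunder entries. `reset_globals()` is written here as a standalone hypothetical function (no such function exists in importlib.util today), and a plain dict run through `exec()` stands in for an interpreter's `__main__` namespace.

```python
def reset_globals(namespace):
    """Hypothetical helper: remove every name that is not a __dunder__ entry."""
    for name in list(namespace):
        if not (name.startswith("__") and name.endswith("__")):
            del namespace[name]


main_ns = {"__name__": "__main__"}
exec("import math\ncounter = 41", main_ns)

reset_globals(main_ns)

# User state is gone, but __name__ and __builtins__ survive, so the
# namespace is still usable for further code.
exec("length = len('ab')", main_ns)
print(sorted(main_ns))
```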
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia