[Python-Dev] PEP 554 v3 (new interpreters module)

Wed Sep 13 23:56:55 EDT 2017

On 14 September 2017 at 11:44, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> I've updated PEP 554 in response to feedback.  (thanks all!)  There
> are a few unresolved points (some of them added to the Open Questions
> section), but the current PEP has changed enough that I wanted to get
> it out there first.
>
> Notably changed:
>
> * the API relative to object passing has changed somewhat drastically
> (hopefully simpler and easier to understand), replacing "FIFO" with
> "channel"
> * added an examples section
> * added an open questions section
> * added a rejected ideas section
> * added more items to the deferred functionality section
> * the rationale section has moved down below the examples
>
> Please let me know what you think.  I'm especially interested in
> feedback about the channels.  Thanks!

I like the new pipe-like channels API more than the previous named
FIFO approach :)

>    send(obj):
>
>        Send the object to the receiving end of the channel.  Wait until
>        the object is received.  If the channel does not support the
>        object then TypeError is raised.  Currently only bytes are
>        supported.  If the channel has been closed then EOFError is
>        raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.
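
To make the quoted send() semantics concrete, here is a minimal sketch of a blocking, bytes-only channel under heavy assumptions. Threads stand in for interpreters, everything runs in a single interpreter, and the Channel class with its recv() and close() methods is a hypothetical illustration rather than the PEP's API.

```python
import queue
import threading


class Channel:
    """Hypothetical bytes-only rendezvous channel (threads stand in for interpreters)."""

    def __init__(self):
        self._pending = queue.Queue()   # (object, "received" event) pairs
        self._closed = threading.Event()

    def send(self, obj):
        # Only bytes are accepted, mirroring the quoted send() description.
        if not isinstance(obj, bytes):
            raise TypeError(f"unsupported object type: {type(obj).__name__}")
        if self._closed.is_set():
            raise EOFError("channel is closed")
        received = threading.Event()
        self._pending.put((obj, received))
        # Block until a receiver has taken this object.
        # Simplification: closing the channel does not wake a blocked sender.
        received.wait()

    def recv(self):
        obj, received = self._pending.get()
        received.set()   # release the sender waiting on this object
        return obj

    def close(self):
        self._closed.set()
```

Restricting the channel to memoryview objects instead, as suggested above, would only change the isinstance() check; the blocking hand-off stays the same.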

> Pre-populate an interpreter
> ---------------------------
>
> ::
>
>    interp = interpreters.create()
>    interp.run("""if True:
>        import some_lib
>        import an_expensive_module
>        some_lib.set_up()
>        """)
>    wait_for_request()
>    interp.run("""if True:
>        some_lib.handle_request()
>        """)

I find the "if True:" prefixes sprinkled through the examples distracting,
so I'd prefer either:

1. Using textwrap.dedent; or
2. Assigning the code to a module-level attribute

::

   interp = interpreters.create()
   setup_code = """\
   import some_lib
   import an_expensive_module
   some_lib.set_up()
   """
   interp.run(setup_code)
   wait_for_request()

   handler_code = """\
   some_lib.handle_request()
   """
   interp.run(handler_code)
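
Option 1 could look something like the following sketch. Since some_lib and the interpreters module are not available here, it uses stdlib modules in the snippet and runs the result with exec() in a fresh namespace purely as a stand-in for interp.run(); build_setup_code() is a hypothetical helper.

```python
import textwrap


def build_setup_code():
    # The literal is indented to match the surrounding code; dedent() strips
    # the common leading whitespace so the result is valid top-level source.
    return textwrap.dedent("""\
        import json
        settings = json.loads('{"workers": 4}')
        """)


setup_code = build_setup_code()
# With PEP 554 this would be interp.run(setup_code); exec() into a fresh
# namespace is only a stand-in for the subinterpreter's __main__.
namespace = {}
exec(setup_code, namespace)
print(namespace["settings"])
```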

> Handling an exception
> ---------------------
>
> ::
>
>    interp = interpreters.create()
>    try:
>        interp.run("""if True:
>            raise KeyError
>            """)
>    except KeyError:
>        print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.
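
A rough sketch of that Go-style shape, assuming a hypothetical low-level run() that never raises and instead returns (result, err). Here exec() in the current interpreter stands in for a subinterpreter, and the convention of reading a "result" name from the namespace is invented for illustration. Note that "raise" is a keyword, so a method literally called err.raise() would be a syntax error; the sketch uses raise_() instead.

```python
import builtins


class RunError:
    """Hypothetical error record: holds only plain strings, never the
    foreign exception object or its frames."""

    def __init__(self, type_name, message):
        self.type_name = type_name
        self.message = message

    def raise_(self):
        # Map the type name back to a builtin exception, else RuntimeError.
        exc_type = getattr(builtins, self.type_name, None)
        if not (isinstance(exc_type, type) and issubclass(exc_type, BaseException)):
            exc_type = RuntimeError
        # A brand-new exception: its traceback starts here, on the caller's
        # side, rather than inside the code that failed.
        raise exc_type(self.message)


def run(source, namespace):
    """Stand-in for a low-level run(): returns (result, err) and never raises."""
    try:
        exec(source, namespace)
    except Exception as exc:
        return None, RunError(type(exc).__name__, str(exc))
    return namespace.get("result"), None
```

A caller that prefers exceptions would then write `result, err = run(code, ns)` followed by `if err: err.raise_()`.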

> About Subinterpreters
> =====================
>
> Shared data
> -----------
>
> Subinterpreters are inherently isolated (with caveats explained below),
> in contrast to threads.  This enables `a different concurrency model
> <Concurrency_>`_ than is currently readily available in Python.
> `Communicating Sequential Processes`_ (CSP) is the prime example.
>
> A key component of this approach to concurrency is message passing.  So
> providing a message/object passing mechanism alongside ``Interpreter``
> is a fundamental requirement.  This proposal includes a basic mechanism
> upon which more complex machinery may be built.  That basic mechanism
> draws inspiration from pipes, queues, and CSP's channels. [fifo]_
>
> The key challenge here is that sharing objects between interpreters
> faces complexity due in part to CPython's current memory model.
> Furthermore, in this class of concurrency, the ideal is that objects
> only exist in one interpreter at a time.  However, this is not practical
> for Python, so we initially constrain supported objects to ``bytes``.
> There are a number of strategies we may pursue in the future to expand
> supported objects and object sharing strategies.
>
> Note that the complexity of object sharing increases as subinterpreters
> become more isolated, e.g. after GIL removal.  So the mechanism for
> message passing needs to be carefully considered.  Keeping the API
> minimal and initially restricting the supported types helps us avoid
> further exposing any underlying complexity to Python users.
>
> To make this work, the mutable shared state will be managed by the
> Python runtime, not by any of the interpreters.  Initially we will
> support only one type of object for shared state: the channels provided
> by ``create_channel()``.  Channels, in turn, will carefully manage
> passing objects between interpreters.

Interpreters themselves will also need to be shared objects, as:

- they all have access to "interpreters.list_all()"
- when we call "interpreters.create()", the calling interpreter gets a
  reference to the new interpreter, which can in turn get a reference
  to itself via "interpreters.get_current()"

(These shared objects are what I suspect you may end up needing a
process global read/write lock to manage, by the way - I think it
would be great if you can figure out a way to avoid that, it's just
not entirely clear to me what that might look like. I do think you're
on the right track by prohibiting the destruction of an interpreter
that's currently running, and the destruction of channels that are
currently still associated with an interpreter)
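
As an illustration of the kind of runtime-owned state described here, this is a minimal sketch assuming a simple registry that belongs to the runtime rather than to any one interpreter. InterpreterRegistry, set_running() and destroy() are hypothetical names, and a plain threading.Lock stands in for the process-global read/write lock; it only demonstrates the "no destroying a running interpreter" rule.

```python
import itertools
import threading


class InterpreterRegistry:
    """Hypothetical runtime-owned registry shared by all interpreters."""

    def __init__(self):
        self._lock = threading.Lock()     # guards every access to _running
        self._ids = itertools.count(1)
        self._running = {}                # interpreter id -> currently running?

    def create(self):
        with self._lock:
            interp_id = next(self._ids)
            self._running[interp_id] = False
            return interp_id

    def list_all(self):
        with self._lock:
            return sorted(self._running)

    def set_running(self, interp_id, running):
        with self._lock:
            self._running[interp_id] = running

    def destroy(self, interp_id):
        with self._lock:
            # Refuse to tear down an interpreter that is executing code.
            if self._running[interp_id]:
                raise RuntimeError(f"interpreter {interp_id} is still running")
            del self._running[interp_id]
```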

> Interpreter Isolation
> ---------------------
>

This section is a really nice addition :)

> Existing Usage
> --------------
>
> Subinterpreters are not a widely used feature.  In fact, the only
> documented case of widespread usage is
> `mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_.  On the one
> hand, this case provides confidence that existing subinterpreter support
> is relatively stable.  On the other hand, there isn't much of a sample
> size from which to judge the utility of the feature.

Nathaniel pointed out that JEP embeds CPython subinterpreters inside
the JVM similar to the way that mod_wsgi embeds them inside Apache
httpd: https://github.com/ninia/jep/wiki/How-Jep-Works

> Open Questions
> ==============
>
> Leaking exceptions across interpreters
> --------------------------------------
>
> As currently proposed, uncaught exceptions from ``run()`` propagate
> to the frame that called it.  However, this means that exception
> objects are leaking across the inter-interpreter boundary.  Likewise,
> the frames in the traceback potentially leak.
>
> While that might not be a problem currently, it would be a problem once
> interpreters get better isolation relative to memory management (which
> is necessary to stop sharing the GIL between interpreters).  So the
> semantics of how the exceptions propagate needs to be resolved.

As noted above, I think you *really* want to avoid leaking exceptions
in the initial implementation. A non-exception-based error signaling
mechanism would be one way to do that, similar to how the low-level
subprocess APIs actually report the return code, which higher level
APIs then turn into an exception.

resp.raise_for_status() does something similar for HTTP responses in
the requests API.
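
For comparison, the standard library's subprocess module already offers both layers: subprocess.run() returns a CompletedProcess whose returncode is plain data, and check_returncode() turns a nonzero status into CalledProcessError only when the caller asks for it. The child command below is just an arbitrary way to produce exit status 3.

```python
import subprocess
import sys

# Low-level style: the failure comes back as data, not as an exception.
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(proc.returncode)  # 3

# Higher-level style: opt in to converting a failure status into an exception.
try:
    proc.check_returncode()
except subprocess.CalledProcessError as exc:
    print("child failed with status", exc.returncode)
```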

> Initial support for buffers in channels
> ---------------------------------------
>
> An alternative to support for bytes in channels is support for
> read-only buffers (the PEP 3118 kind).  Then ``recv()`` would return
> a memoryview to expose the buffer in a zero-copy way.  This is similar
> to what ``multiprocessing.Connection`` supports. [mp-conn]_
>
> Switching to such an approach would help resolve questions of how
> passing bytes through channels will work once we isolate memory
> management in interpreters.

Exactly :)
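
A small sketch of the buffer-based style, using only real stdlib calls: a memoryview over bytes is read-only and can be sliced without copying, and multiprocessing.Connection's send_bytes()/recv_bytes_into() accept bytes-like objects and a caller-supplied buffer. Running both pipe ends in one process is just a convenience for the example.

```python
from multiprocessing import Pipe

payload = b"hello, interpreter"
view = memoryview(payload)

# A memoryview over bytes is read-only, and slicing it shares the original
# buffer instead of copying the data.
assert view.readonly
header = view[:5]
assert header.obj is payload

# Connection accepts any bytes-like object and can receive directly into a
# caller-supplied writable buffer.
recv_end, send_end = Pipe(duplex=False)
send_end.send_bytes(view)
buffer = bytearray(len(payload))
received = recv_end.recv_bytes_into(buffer)
send_end.close()
recv_end.close()
print(received, bytes(buffer))
```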

> Resetting __main__
> ------------------
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module.  This means
> that data persists there between ``run()`` calls.  Sometimes this isn't
> desirable and you want to execute in a fresh ``__main__``.  Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially.  It can wait until later
> if desirable.

I was going to note that you can already do this:

    interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.

Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.
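
No reset_globals() exists in importlib.util today, so the following is only a sketch of what such a helper might do, assuming it keeps every __dunder__ entry (including __name__ and __builtins__) and drops everything else. The standalone reset_globals() function is hypothetical, and a plain dict stands in for __main__.__dict__. A real version would probably need to be more selective, since this one also keeps user-defined dunder names.

```python
def reset_globals(namespace):
    """Hypothetical helper: drop user-defined names but keep __dunder__
    entries such as __name__ and __builtins__ that execution relies on."""
    for name in list(namespace):
        if not (name.startswith("__") and name.endswith("__")):
            del namespace[name]


main_ns = {"__name__": "__main__"}
exec("counter = 1\ndef helper(): return counter", main_ns)
reset_globals(main_ns)
# Unlike globals().clear(), __name__ is still there afterwards.
exec("print(__name__, 'counter' in globals())", main_ns)
```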

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia