Re: [Python-Dev] PEP 554 v3 (new interpreters module)

Sept. 14, 2017

On 14 September 2017 at 11:44, Eric Snow <ericsnowcurrently@gmail.com> wrote:
...
I've updated PEP 554 in response to feedback.  (thanks all!)  There
are a few unresolved points (some of them added to the Open Questions
section), but the current PEP has changed enough that I wanted to get
it out there first.
Notably changed:
* the API relative to object passing has changed somewhat drastically
(hopefully simpler and easier to understand), replacing "FIFO" with
"channel"
* added an examples section
* added an open questions section
* added a rejected ideas section
* added more items to the deferred functionality section
* the rationale section has moved down below the examples
Please let me know what you think.  I'm especially interested in
feedback about the channels.  Thanks!
I like the new pipe-like channels API more than the previous named
FIFO approach :)
...
send(obj):
   Send the object to the receiving end of the channel.  Wait until
   the object is received.  If the channel does not support the
   object then TypeError is raised.  Currently only bytes are
   supported.  If the channel has been closed then EOFError is
   raised.
I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.
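
To make the quoted ``send()`` semantics concrete, here is a minimal sketch of a bytes-only, blocking channel. It is a thread-based toy, not the PEP's implementation: ``ToyChannel`` and its methods are hypothetical names, and it only handles one sender at a time.

```python
import queue
import threading


class ToyChannel:
    """Hypothetical stand-in for a PEP 554 channel; threads play the interpreters."""

    def __init__(self):
        self._queue = queue.Queue()
        self._closed = False

    def send(self, obj):
        if self._closed:
            raise EOFError("channel is closed")
        if not isinstance(obj, bytes):
            raise TypeError("only bytes objects are supported")
        self._queue.put(obj)
        # Block until the receiver has taken the object ("wait until received").
        self._queue.join()

    def recv(self):
        obj = self._queue.get()
        self._queue.task_done()  # releases the sender blocked in join()
        return obj

    def close(self):
        self._closed = True


channel = ToyChannel()
received = []
receiver = threading.Thread(target=lambda: received.append(channel.recv()))
receiver.start()
channel.send(b"ping")  # returns only after the receiver has picked it up
receiver.join()
print(received)
```

Restricting ``send()`` to a single immutable type keeps the type check trivial, which is the property the bytes-only (or memoryview-only) restriction is meant to preserve.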
...
Pre-populate an interpreter
---------------------------
::

   interp = interpreters.create()
   interp.run("""if True:
       import some_lib
       import an_expensive_module
       some_lib.set_up()
       """)
   wait_for_request()
   interp.run("""if True:
       some_lib.handle_request()
       """)

I find the "if True:"'s sprinkled through the examples distracting, so
I'd prefer either:

1. Using textwrap.dedent; or
2. Assigning the code to a module level attribute

::

   interp = interpreters.create()
   setup_code = """\
   import some_lib
   import an_expensive_module
   some_lib.set_up()
   """
   interp.run(setup_code)
   wait_for_request()

   handler_code = """\
   some_lib.handle_request()
   """
   interp.run(handler_code)
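
For comparison, option 1 might look like the sketch below. ``some_lib`` and ``an_expensive_module`` are the PEP example's placeholder names and appear only inside the source strings, so nothing here needs them to exist.

```python
import textwrap

# Keep the source indented alongside the calling code, then strip the common
# leading whitespace before handing it to the interpreter.
setup_code = textwrap.dedent("""\
    import some_lib
    import an_expensive_module
    some_lib.set_up()
    """)

handler_code = textwrap.dedent("""\
    some_lib.handle_request()
    """)

# With the API in the PEP draft, these strings would then be passed along as:
#     interp.run(setup_code)
#     interp.run(handler_code)
print(setup_code, end="")
```

Either way the source reaches ``run()`` with no leading indentation, so the ``if True:`` wrapper is no longer needed.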
...
Handling an exception
---------------------
::

   interp = interpreters.create()
   try:
       interp.run("""if True:
           raise KeyError
           """)
   except KeyError:
       print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.
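
A rough sketch of that shape, simulated with ``exec()`` in the current interpreter since no ``interpreters`` module is assumed here. ``ErrorSnapshot``, ``ForeignError``, ``reraise()`` and the ``result`` variable convention are all hypothetical; the method cannot literally be called ``raise`` because that is a keyword.

```python
import traceback


class ForeignError(Exception):
    """Local stand-in for an exception raised in another interpreter."""


class ErrorSnapshot:
    """String-only summary of an exception; keeps no exception or frame objects."""

    def __init__(self, exc):
        self.type_name = type(exc).__name__
        self.message = str(exc)
        # The foreign traceback is kept as text only, for logging.
        self.formatted = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )

    def reraise(self):
        # The new exception's traceback starts here, in the calling interpreter.
        raise ForeignError(f"{self.type_name}: {self.message}")


def run(source):
    """Execute source and return (result, err); err is None on success."""
    namespace = {"__name__": "__main__"}
    try:
        exec(source, namespace)
    except Exception as exc:
        return None, ErrorSnapshot(exc)
    # Hypothetical convention: the code publishes its result as "result".
    return namespace.get("result"), None


result, err = run("raise KeyError('spam')")
if err is not None:
    try:
        err.reraise()
    except ForeignError as local_exc:
        print("got the error from the subinterpreter:", local_exc)
```

Only strings cross the boundary in this sketch, so neither the exception object nor its frames would need to be shared between interpreters.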
...
About Subinterpreters
=====================
Shared data
-----------
Subinterpreters are inherently isolated (with caveats explained below),
in contrast to threads.  This enables `a different concurrency model
<Concurrency_>`_ than is currently readily available in Python.
`Communicating Sequential Processes`_ (CSP) is the prime example.
A key component of this approach to concurrency is message passing.  So
providing a message/object passing mechanism alongside ``Interpreter``
is a fundamental requirement.  This proposal includes a basic mechanism
upon which more complex machinery may be built.  That basic mechanism
draws inspiration from pipes, queues, and CSP's channels. [fifo]_
The key challenge here is that sharing objects between interpreters
faces complexity due in part to CPython's current memory model.
Furthermore, in this class of concurrency, the ideal is that objects
only exist in one interpreter at a time.  However, this is not practical
for Python so we initially constrain supported objects to ``bytes``.
There are a number of strategies we may pursue in the future to expand
supported objects and object sharing strategies.
Note that the complexity of object sharing increases as subinterpreters
become more isolated, e.g. after GIL removal.  So the mechanism for
message passing needs to be carefully considered.  Keeping the API
minimal and initially restricting the supported types helps us avoid
further exposing any underlying complexity to Python users.
To make this work, the mutable shared state will be managed by the
Python runtime, not by any of the interpreters.  Initially we will
support only one type of objects for shared state: the channels provided
by ``create_channel()``.  Channels, in turn, will carefully manage
passing objects between interpreters.
Interpreters themselves will also need to be shared objects, as:

- they all have access to "interpreters.list_all()"
- when we do "interpreters.create_interpreter()", the calling
  interpreter gets a reference to itself via
  "interpreters.get_current()"

(These shared objects are what I suspect you may end up needing a
process global read/write lock to manage, by the way - I think it
would be great if you can figure out a way to avoid that, it's just
not entirely clear to me what that might look like. I do think you're
on the right track by prohibiting the destruction of an interpreter
that's currently running, and the destruction of channels that are
currently still associated with an interpreter)
...
Interpreter Isolation
---------------------
This section is a really nice addition :)
...
Existing Usage
--------------
Subinterpreters are not a widely used feature.  In fact, the only
documented case of widespread usage is
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_.  On the one
hand, this case provides confidence that existing subinterpreter support
is relatively stable.  On the other hand, there isn't much of a sample
size from which to judge the utility of the feature.
Nathaniel pointed out that JEP embeds CPython subinterpreters inside
the JVM similar to the way that mod_wsgi embeds them inside Apache
httpd: https://github.com/ninia/jep/wiki/How-Jep-Works
...
Open Questions
==============
Leaking exceptions across interpreters
--------------------------------------
As currently proposed, uncaught exceptions from ``run()`` propagate
to the frame that called it.  However, this means that exception
objects are leaking across the inter-interpreter boundary.  Likewise,
the frames in the traceback potentially leak.
While that might not be a problem currently, it would be a problem once
interpreters get better isolation relative to memory management (which
is necessary to stop sharing the GIL between interpreters).  So the
semantics of how the exceptions propagate needs to be resolved.
As noted above, I think you *really* want to avoid leaking exceptions
in the initial implementation. A non-exception-based error signaling
mechanism would be one way to do that, similar to how the low-level
subprocess APIs actually report the return code, which higher level
APIs then turn into an exception.

resp.raise_for_status() does something similar for HTTP responses in
the requests API.
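
The subprocess pattern referred to above can be seen with the standard library alone. This sketch runs a child Python process that exits with status 3; the status value is arbitrary.

```python
import subprocess
import sys

# Low level: the failure is reported as plain data (an integer exit status).
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(proc.returncode)

# Higher level: the caller opts in to turning that status into an exception.
try:
    proc.check_returncode()
except subprocess.CalledProcessError as exc:
    print("child failed with status", exc.returncode)
```

The exception is created in the parent process from the integer status, so no exception object ever crosses the process boundary.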
...
Initial support for buffers in channels
---------------------------------------
An alternative to support for bytes in channels is support for
read-only buffers (the PEP 3118 kind).  Then ``recv()`` would return
a memoryview to expose the buffer in a zero-copy way.  This is similar
to what ``multiprocessing.Connection`` supports. [mp-conn]_
Switching to such an approach would help resolve questions of how
passing bytes through channels will work once we isolate memory
management in interpreters.
Exactly :)
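
As a small illustration of what "zero-copy" and "read-only" mean for a memoryview, using only builtins (the channel side is not modelled here):

```python
data = b"hello world"      # the sender's bytes object
view = memoryview(data)    # exposes the same memory; nothing is copied

head = view[:5]            # slicing a memoryview does not copy either
print(head.obj is data)    # the slice still refers to the original object
print(bytes(head))         # an explicit copy is made only on request

# A view over an immutable object is read-only, so writes are rejected.
try:
    view[0] = 0
except TypeError as exc:
    print("write rejected:", exc)
```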
...
Resetting __main__
------------------
As proposed, every call to ``Interpreter.run()`` will execute in the
namespace of the interpreter's existing ``__main__`` module.  This means
that data persists there between ``run()`` calls.  Sometimes this isn't
desirable and you want to execute in a fresh ``__main__``.  Also,
you don't necessarily want to leak objects there that you aren't using
any more.
Solutions include:
* a ``create()`` arg to indicate resetting ``__main__`` after each
  ``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out
  after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
This isn't a critical feature initially.  It can wait until later
if desirable.
I was going to note that you can already do this:

    interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.

Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.
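
A minimal sketch of what such a reset could do, assuming the goal is simply "drop everything except the __dunder__ names". The ``reset_globals()`` helper below is hypothetical; no such function exists in ``importlib.util`` today.

```python
def reset_globals(namespace):
    """Remove every name that is not a __dunder__ from namespace, in place."""
    for name in list(namespace):
        if not (name.startswith("__") and name.endswith("__")):
            del namespace[name]


# Stand-in for a __main__ namespace that has accumulated state between runs.
main_ns = {
    "__name__": "__main__",
    "__builtins__": __builtins__,
    "cache": [1, 2, 3],
    "counter": 7,
}
reset_globals(main_ns)
print(sorted(main_ns))
```

Note that this also keeps any user-defined dunder names, so a real implementation might prefer an explicit list of the attributes to preserve.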

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia