(PEP 555 subtopic) Propagation of context in async code
This is a continuation of the PEP 555 discussion in
https://mail.python.org/pipermail/python-ideas/2017-September/046916.html
and this month in
https://mail.python.org/pipermail/python-ideas/2017-October/047279.html

If you are new to the discussion, the best point to start reading this might be at my second full paragraph below ("The status quo...").

On Fri, Oct 13, 2017 at 10:25 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 October 2017 at 10:56, Guido van Rossum <guido@python.org> wrote:
I'm out of energy to debate every point (Steve said it well -- that decimal/generator example is too contrived), but I found one nit in Nick's email that I wish to correct.
On Wed, Oct 11, 2017 at 1:28 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As a less-contrived example, consider context managers implemented as generators.
We want those to run with the execution context that's active when they're used in a with statement, not the one that's active when they're created (the fact that generator-based context managers can only be used once mitigates the risk of creation time context capture causing problems, but the implications would still be weird enough to be worth avoiding).
Here I think we're in agreement about the desired semantics, but IMO all this requires is some special casing for @contextlib.contextmanager. To me this is the exception, not the rule -- in most *other* places I would want the yield to switch away from the caller's context.
For native coroutines, we want them to run with the execution context that's active when they're awaited or when they're prepared for submission to an event loop, not the one that's active when they're created.
This caught my eye as wrong. Considering that asyncio's tasks (as well as curio's and trio's) *are* native coroutines, we want complete isolation between the context active when `await` is called and the context active inside the `async def` function.
The rationale for this behaviour *does* arise from a refactoring argument:
    async def original_async_function():
        with some_context():
            do_some_setup()
            raw_data = await some_operation()
            data = do_some_postprocessing(raw_data)
Refactored:
    async def async_helper_function():
        do_some_setup()
        raw_data = await some_operation()
        return do_some_postprocessing(raw_data)
    async def refactored_async_function():
        with some_context():
            data = await async_helper_function()
*This* type of refactoring argument I *do* subscribe to.
However, considering that coroutines are almost always instantiated at the point where they're awaited, I do concede that creation time context capture would likely also work out OK for the coroutine case, which would leave contextlib.contextmanager as the only special case (and it would turn off both creation-time context capture *and* context isolation).
The difference between context propagation through coroutine function calls and awaits comes up when you need help from "the" event loop, which means things like creating new tasks from coroutines. However, we cannot even assume that the loop is the only one. So far, it makes no difference where you call the coroutine function. It is only when you await it or schedule it for execution in a loop that something can actually happen.

The status quo is that there's nothing that prevents you from calling a coroutine function from within one event loop and then awaiting it in another. So if we want an event loop to be able to pass information down the call chain in such a way that the information is available *throughout the whole task that it is driving*, then the context needs to at least propagate through `await`s.

This was my starting point 2.5 years ago, when Yury was drafting this status quo (PEP 492). It looked a lot like PEP 492 was inevitable, but that there would be a problem, where each API that uses "blocking IO" somewhere under the hood would need a duplicate version for asyncio (and one for each third-party async framework!). I felt it was necessary to think about a solution before PEP 492 was accepted, and this became a fairly short-lived thread here on python-ideas:

https://mail.python.org/pipermail/python-ideas/2015-May/033267.html

This year, the discussion on Yury's PEP 550 somehow ended up with a very similar need before I got involved, apparently for independent reasons.

A design for solving this need (and others) is also found in my first draft of PEP 555, found at

https://mail.python.org/pipermail/python-ideas/2017-September/046916.html

Essentially, it's a way of *passing information down the call chain* when it's inconvenient or impossible to pass the information as normal function arguments. I now call the concept "context arguments".

More recently, I put some focus on the direct needs of normal users (as opposed to the direct needs of async framework authors).

Those thoughts are most "easily" discussed in terms of generator functions, which are very similar to coroutine functions: a generator function is often thought of as a function that returns an iterable of lazily evaluated values. In this type of usage, the relevant "function call" happens when calling the generator function. The subsequent calls to next() (or a yield from) are thought of as merely getting the items in the iterable, even if they do actually run code in the generator's frame. The discussion on this is found starting from this email:

https://mail.python.org/pipermail/python-ideas/2017-October/047279.html

However, coroutines are also evaluated lazily. The question is, when should we consider the "subroutine call" to happen: when the coroutine function is called, or when the resulting object is awaited. Often these two are indeed on the same line of code, so it does not matter. But as I discuss above, there are definitely cases where it matters. This has mostly to do with the interactions of different tasks within one event loop, or code where multiple event loops interact.

As mentioned above, there are cases where propagating the context through next() and await is desirable. However, there are also cases where the coroutine call is important. This comes up in the case of multiple interacting tasks or multiple event loops.
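[Editor's note: a minimal, hedged sketch of what "information available throughout the whole task" means in practice, written with the contextvars module that later shipped in Python 3.7 (PEP 567) as a stand-in for the "context arguments" discussed here. The request_id variable and the handler/leaf functions are made-up names for illustration only.]

    import asyncio
    import contextvars

    request_id = contextvars.ContextVar("request_id", default=None)

    async def leaf():
        # No function argument carries request_id here; it is looked up
        # from the context that propagates down the await chain.
        return request_id.get()

    async def handler(rid):
        request_id.set(rid)
        return await leaf()

    async def main():
        # Each task runs in its own copy of the context, so the two
        # handlers do not leak their ids into each other.
        print(await asyncio.gather(handler("a"), handler("b")))  # ['a', 'b']

    asyncio.run(main())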
To start with, probably a more example-friendly case, however, is running an event loop and a coroutine from synchronous code:

    import asyncio

    async def do_something_context_aware():
        do_something_that_depends_on(current_context())

    loop = asyncio.get_event_loop()

    with some_context():
        coro = do_something_context_aware()

    loop.run_until_complete(coro)

Now, if the coroutine function call `do_something_context_aware()` does not save the current context on `coro`, then there is no way some_context() can affect the code that will run inside the coroutine, even if that is what we are explicitly trying to do here.

The easy solution is to delegate the context transfer to the scheduling function (run_until_complete), and require that the context is passed to that function:

    with some_context():
        coro = do_something_context_aware()
        loop.run_until_complete(coro)

This gives the async framework (here asyncio) a chance to make sure the context propagates as expected. In general, I'm in favor of giving async frameworks some freedom in how this is implemented. However, to give the framework even more freedom, the coroutine call, do_something_context_aware(), could save the current context branch on `coro`, which run_until_complete can attach to the Task that gets created.

The bigger question is, what should happen when a coroutine awaits on another coroutine directly, without giving the framework a chance to interfere:

    async def inner():
        do_context_aware_stuff()

    async def outer():
        with first_context():
            coro = inner()

        with second_context():
            await coro

The big question is: In the above, which context should the coroutine be run in? "The" event loop does not have a chance to interfere, so we cannot delegate the decision.

We need both versions: the one that propagates first_context() into the coroutine, and the one that propagates second_context() into it. Or, using my metaphor from the other thread, we need "both the forest and the trees".

A solution to this would be to have two types of context arguments:

1. (calling) context arguments

and

2. execution context arguments

Both of these would have their own stack of (argument, value) assignment pairs, explained in the implementation part of the first PEP 555 draft. While this is a complication, the performance overhead of these is so small that doubling the overhead should not be a performance concern. The question is, do we want these two types of stacks, or do we want to work around it somehow, for instance using context-local storage, implemented on top of the first kind, to implement something like the second kind. However, that again raises some issues of how to propagate the context-local storage down the ambiguous call chain.

––Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
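[Editor's note: to make the "delegate the context transfer to the scheduling call" idea above concrete, here is a hedged sketch using contextvars.copy_context() (from the later PEP 567) as a stand-in for whatever capture primitive PEP 555 would provide. run_until_complete_in_context and the flavor variable are made-up names.]

    import asyncio
    import contextvars

    flavor = contextvars.ContextVar("flavor", default="plain")

    async def do_something_context_aware():
        print("flavor inside the coroutine:", flavor.get())

    def run_until_complete_in_context(loop, coro, ctx):
        # Create the task while `ctx` is active, so the whole task runs
        # in (a copy of) that context.
        task = ctx.run(loop.create_task, coro)
        return loop.run_until_complete(task)

    loop = asyncio.new_event_loop()
    try:
        flavor.set("chocolate")             # plays the role of some_context()
        ctx = contextvars.copy_context()    # explicit capture by the caller
        run_until_complete_in_context(loop, do_something_context_aware(), ctx)
    finally:
        loop.close()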
On Fri, Oct 13, 2017 at 11:49 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote: [..]
This was my starting point 2.5 years ago, when Yury was drafting this status quo (PEP 492). It looked a lot like PEP 492 was inevitable, but that there would be a problem, where each API that uses "blocking IO" somewhere under the hood would need a duplicate version for asyncio (and one for each third-party async framework!). I felt it was necessary to think about a solution before PEP 492 was accepted, and this became a fairly short-lived thread here on python-ideas:
Well, it's obvious why the thread was "short-lived". Don't mix non-blocking and blocking code and don't nest asyncio loops. But I believe this new subtopic is a distraction. You should start a new thread on Python-ideas if you want to discuss the acceptance of PEP 492 2.5 years ago. [..]
The bigger question is, what should happen when a coroutine awaits on another coroutine directly, without giving the framework a chance to interfere:
    async def inner():
        do_context_aware_stuff()

    async def outer():
        with first_context():
            coro = inner()

        with second_context():
            await coro
The big question is: In the above, which context should the coroutine be run in?
The real big question is how people usually write code. And the answer is that they *don't write it like that* at all. Many context managers in many frameworks (aiohttp, tornado, and even asyncio) require you to wrap your await expressions in them. Not coroutine instantiation.

A more important point is that existing context solutions for async frameworks can only support a with statement around an await expression. And people that use such solutions know that 'with ...: coro = inner()' isn't going to work at all.

Therefore wrapping coroutine instantiation in a 'with' statement is not a pattern. It can only become a pattern if whatever execution context PEP is accepted in Python 3.7 encourages people to use it.

[..]
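[Editor's note: a tiny runnable illustration of the pattern being described: the helper wraps the await expression, not the coroutine call. The log_span helper below is a made-up stand-in for framework helpers such as timeouts or tracing spans, not a real API.]

    import asyncio
    from contextlib import contextmanager

    @contextmanager
    def log_span(name):
        # Stand-in for the kind of helper frameworks ask you to put
        # around the await expression (timeouts, tracing spans, ...).
        print("enter", name)
        try:
            yield
        finally:
            print("exit", name)

    async def slow():
        await asyncio.sleep(0.01)
        return 42

    async def main():
        with log_span("fetch"):        # the await sits inside the with
            result = await slow()
        print(result)

    asyncio.run(main())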
Both of these would have their own stack of (argument, value) assignment pairs, explained in the implementation part of the first PEP 555 draft. While this is a complication, the performance overhead of these is so small, that doubling the overhead should not be a performance concern.
Please stop handwaving performance. Using big O notation:

PEP 555, worst complexity for uncached lookup: O(N), where 'N' is the total number of all context values for all context keys for the current frame stack. For a recursive function you can easily have a situation where the cache is invalidated often, and code starts to run slower and slower.

PEP 550 v1, worst complexity for uncached lookup: O(1), see [1].

PEP 550 v2+, worst complexity for uncached lookup: O(k), where 'k' is the number of nested generators for the current frame. Usually k=1.

While caching will mitigate PEP 555's bad performance characteristics in *tight loops*, the performance of the uncached path must not be ignored.

Yury

[1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt-performance-analysis
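[Editor's note: for readers following the complexity argument, here is a minimal toy sketch (not either PEP's actual implementation) of why a plain stack of (key, value) assignments makes an uncached lookup linear in the number of assignments sitting above the one being looked up.]

    class ToyContextStack:
        def __init__(self):
            self._stack = []                 # (key, value) pairs, newest last

        def assign(self, key, value):
            self._stack.append((key, value))

        def lookup(self, key):
            # Uncached lookup: scan from the newest assignment downwards.
            # Worst case walks the whole stack, i.e. O(N) in the number of
            # assignments currently in effect.
            for k, v in reversed(self._stack):
                if k == key:
                    return v
            raise LookupError(key)

    ctx = ToyContextStack()
    ctx.assign("decimal_precision", 42)
    for i in range(1000):
        ctx.assign("unrelated_key_%d" % i, i)   # pushes the wanted key deeper
    print(ctx.lookup("decimal_precision"))      # scans ~1000 entries first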
On Fri, Oct 13, 2017 at 7:38 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On Fri, Oct 13, 2017 at 11:49 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote: [..]
This was my starting point 2.5 years ago, when Yury was drafting this status quo (PEP 492). It looked a lot like PEP 492 was inevitable, but that there would be a problem, where each API that uses "blocking IO" somewhere under the hood would need a duplicate version for asyncio (and one for each third-party async framework!). I felt it was necessary to think about a solution before PEP 492 was accepted, and this became a fairly short-lived thread here on python-ideas:
Well, it's obvious why the thread was "short-lived". Don't mix non-blocking and blocking code and don't nest asyncio loops. But I believe this new subtopic is a distraction.
Nesting is not the only way to have interaction between two event loops. But whenever anyone *does* want to nest two loops, they are perhaps more likely to be loops of different frameworks. You believe that the semantics in async code is a distraction?
You should start a new thread on Python-ideas if you want to discuss the acceptance of PEP 492 2.5 years ago.
I'm definitely not interested in discussing the acceptance of PEP 492.
[..]
The bigger question is, what should happen when a coroutine awaits on another coroutine directly, without giving the framework a chance to interfere:
    async def inner():
        do_context_aware_stuff()

    async def outer():
        with first_context():
            coro = inner()

        with second_context():
            await coro
The big question is: In the above, which context should the coroutine be run in?
The real big question is how people usually write code. And the answer is that they *don't write it like that* at all. Many context managers in many frameworks (aiohttp, tornado, and even asyncio) require you to wrap your await expressions in them. Not coroutine instantiation.
You know very well that I've been talking about how people usually write code etc. But we still need to handle the corner cases too.
A more important point is that existing context solutions for async frameworks can only support a with statement around an await expression. And people that use such solutions know that 'with ...: coro = inner()' isn't going to work at all.
Therefore wrapping coroutine instantiation in a 'with' statement is not a pattern. It can only become a pattern if whatever execution context PEP is accepted in Python 3.7 encourages people to use it.
The code is to illustrate semantics, not an example of real code. The point is to highlight that the context has changed between when the coroutine function was called and when it is awaited. That's certainly a thing that can happen in real code, even if it is not the most typical case. I do mention this in my previous email.
[..]
Both of these would have their own stack of (argument, value) assignment pairs, explained in the implementation part of the first PEP 555 draft. While this is a complication, the performance overhead of these is so small, that doubling the overhead should not be a performance concern.
Please stop handwaving performance. Using big O notation:
There is discussion on performance elsewhere, now also in this other subthread:
https://mail.python.org/pipermail/python-ideas/2017-October/047327.html

PEP 555, worst complexity for uncached lookup: O(N), where 'N' is the total number of all context values for all context keys for the current frame stack.
Not true. See the above link. Lookups are fast (*and* O(1), if we want them to be). PEP 555 stacks are independent of frames, BTW.
For a recursive function you can easily have a situation where cache is invalidated often, and code starts to run slower and slower.
Not true either. The lookups are O(1) in a recursive function, with and without nested contexts. I started this thread for discussion about semantics in an async context. Stefan asked about performance in the other thread, so I posted there. ––Koos
PEP 550 v1, worst complexity for uncached lookup: O(1), see [1].
PEP 550 v2+, worst complexity for uncached lookup: O(k), where 'k' is the number of nested generators for the current frame. Usually k=1.
While caching will mitigate PEP 555's bad performance characteristics in *tight loops*, the performance of the uncached path must not be ignored.
Yury
[1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt-performance-analysis
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Fri, Oct 13, 2017 at 1:46 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Fri, Oct 13, 2017 at 7:38 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On Fri, Oct 13, 2017 at 11:49 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote: [..]
This was my starting point 2.5 years ago, when Yury was drafting this status quo (PEP 492). It looked a lot like PEP 492 was inevitable, but that there would be a problem, where each API that uses "blocking IO" somewhere under the hood would need a duplicate version for asyncio (and one for each third-party async framework!). I felt it was necessary to think about a solution before PEP 492 was accepted, and this became a fairly short-lived thread here on python-ideas:
Well, it's obvious why the thread was "short-lived". Don't mix non-blocking and blocking code and don't nest asyncio loops. But I believe this new subtopic is a distraction.
Nesting is not the only way to have interaction between two event loops. But whenever anyone *does* want to nest two loops, they are perhaps more likely to be loops of different frameworks.
You believe that the semantics in async code is a distraction?
Discussing blocking calls and/or nested event loops in async code is certainly a distraction :) [..]
The real big question is how people usually write code. And the answer is that they *don't write it like that* at all. Many context managers in many frameworks (aiohttp, tornado, and even asyncio) require you to wrap your await expressions in them. Not coroutine instantiation.
You know very well that I've been talking about how people usually write code etc. But we still need to handle the corner cases too. [..] The code is to illustrate semantics, not an example of real code. The point is to highlight that the context has changed between when the coroutine function was called and when it is awaited. That's certainly a thing that can happen in real code, even if it is not the most typical case. I do mention this in my previous email.
I understand the point and what you're trying to illustrate. I'm saying that people don't write 'with smth: c = coro()' because it's currently pointless. And unless you tell them they should, they won't.
[..]
Both of these would have their own stack of (argument, value) assignment pairs, explained in the implementation part of the first PEP 555 draft. While this is a complication, the performance overhead of these is so small, that doubling the overhead should not be a performance concern.
Please stop handwaving performance. Using big O notation:
There is discussion on performance elsewhere, now also in this other subthread:
https://mail.python.org/pipermail/python-ideas/2017-October/047327.html
PEP 555, worst complexity for uncached lookup: O(N), where 'N' is the total number of all context values for all context keys for the current frame stack.
Quoting you from that link:

"Indeed I do mention performance here and there in the PEP 555 draft. Lookups can be made fast and O(1) in most cases. Even with the simplest unoptimized implementation, the worst-case lookup complexity would be O(n), where n is the number of assignment contexts entered after the one which is being looked up from (or in other words, nested inside the one that is being looked up from). This means that for use cases where the relevant context is entered as the innermost context level, the lookups are O(1) even without any optimizations. It is perfectly reasonable to make an implementation where lookups are *always* O(1). Still, it might make more sense to implement a half-way solution with "often O(1)", because that has somewhat less overhead in case the callees end up not doing any lookups."

So where's the actual explanation of how you can make *uncached* lookups O(1) in your best implementation? I only see you claiming that you know how to do that. And since you're using a stack of values instead of hash tables, your explanation can make a big impact on the CS field :)

It's perfectly reasonable to say that "cached lookups in my optimization are O(1)". Saying that "in most cases it's O(1)" isn't how the big O notation should be used.

BTW, what's the big O for capturing the entire context in PEP 555 (get_execution_context() in PEP 550)? How will that operation be implemented? A shallow copy of the stack?

Also, if I had this:

    with c.assign(o1):
        with c.assign(o2):
            with c.assign(o3):
                ctx = capture_context()

will ctx have references to o1, o2, and o3?
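[Editor's note: to make the capture question concrete, here is a toy sketch of one possible answer, under the assumption (not stated in PEP 555 itself) that capturing is a shallow copy of the assignment stack. ToyContextArg and capture_context are made-up names; this is not PEP 555's API or implementation.]

    from contextlib import contextmanager

    class ToyContextArg:
        def __init__(self):
            self._stack = []                    # assignment stack

        @contextmanager
        def assign(self, value):
            self._stack.append(value)
            try:
                yield
            finally:
                self._stack.pop()

    def capture_context(arg):
        # Shallow copy: the list is new, the assigned objects are shared,
        # so the captured context keeps them alive.
        return list(arg._stack)

    c = ToyContextArg()
    o1, o2, o3 = object(), object(), object()
    with c.assign(o1):
        with c.assign(o2):
            with c.assign(o3):
                ctx = capture_context(c)

    print(len(ctx))                      # 3 -- ctx still references o1, o2, o3
    print(ctx[0] is o1, ctx[2] is o3)    # True True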
Not true. See the above link. Lookups are fast (*and* O(1), if we want them to be).
PEP 555 stacks are independent of frames, BTW.
For a recursive function you can easily have a situation where cache is invalidated often, and code starts to run slower and slower.
Not true either. The lookups are O(1) in a recursive function, with and without nested contexts.
See the above. I claim that you can't say that *uncached* lookups can be O(1) in PEP 555 with the current choice of data structures.

Yury
Let me respond to my own email. I'm sorry I wrote a long email, but I figured I'll have to take the time to write this down carefully (and even in a new thread with a clear title) so that people would know what the discussion was about. Probably I could have done better structuring that email, but I seriously ran out of time.

This is directly related to how "a normal user" writing async code would be affected by the semantics of this (context arguments/variables). It's also related to the semantics of contexts combined with normal generator functions, partly because the situation is somewhat similar, and partly because we might want the same basic rules to apply in both situations.

(Side note: This also has to do with more obscure cases like multiple different async frameworks in the same process (or in the same program, or perhaps the same server, or even larger – whatever the constraints are). Any of the context propagation and isolation/leakage semantics I have described (or that I recall anyone else describing) could be implemented in the PEP 555 approach without any problems. The difference is just an if statement branch or two in C code.)

So, see below for some more discussion (it would be useful if some people could reply to this email and say if and why they agree or disagree with something below -- also non-experts that roughly understand what I'm talking about):

On Fri, Oct 13, 2017 at 6:49 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:

[..]

The bigger question is, what should happen when a coroutine awaits on another coroutine directly, without giving the framework a chance to interfere:

    async def inner():
        do_context_aware_stuff()

    async def outer():
        with first_context():
            coro = inner()

        with second_context():
            await coro

The big question is: In the above, which context should the coroutine be run in? "The" event loop does not have a chance to interfere, so we cannot delegate the decision.

Note that I did not write the above as what real code is expected to look like. It's just to underline the semantic difference that the context can change between the call and the await.

Indeed, one might say that people don't write code like that. And maybe async/await is still sufficiently young that one can sort of say "this is how we showed people how to do it, so that's how they'll do it" [*].

But let's make the above code just a little bit more complicated, so that it becomes easier to believe that the semantic difference here really matters, and cannot be hand-waved away:

    async def outer():
        with some_context():
            a = inner()
        with other_context():
            b = inner()
        await gather(a, b)  # execute coroutines a and b concurrently

If the coroutine function call, inner(), does not save a pointer to the current context at that point, then the code would just ignore the with statements completely and run both coroutines in the outer context, which is clearly not what an author of such code would want the code to do.

It is certainly possible to fix the problem by requiring wrapping the coroutines with stuff, but that would lead to nobody ever knowing what the semantics will be without checking if the coroutine has been wrapped or not. On the other hand, we could make the code *just work*, and that would be completely in line with what I've been promoting also as the semantics for generator functions in this thread:

https://mail.python.org/pipermail/python-ideas/2017-October/047279.html

I am definitely *not* talking about this kind of semantics because of something *I personally* need: in fact, I arrived at these thoughts because my designs for solving "my" original problem had turned into a more general-purpose mechanism (PEP 555) that would eventually also affect how code written by completely normal users of with statements and generator functions would behave. And certainly the situation is *very* similar to the case of coroutine functions, as (only?) Guido seems to acknowledge.

But then how to address "my" original problem where the context would propagate through awaits, and next/send?
From what others have written, it seems there are also other situations where that is desired. There are several ways to solve the problem as an extension to PEP 555, but below is one:
We need both versions: the one that propagates first_context() into the coroutine, and the one that propagates second_context() into it. Or, using my metaphor from the other thread, we need "both the forest and the trees".
A solution to this would be to have two types of context arguments:
1. (calling) context arguments
and
2. execution context arguments
So yes, I'm actually serious about this possibility. Now it would be up to library and framework authors to pick the right variant of the two. And this is definitely something that could be documented very clearly.
Both of these would have their own stack of (argument, value) assignment pairs, explained in the implementation part of the first PEP 555 draft. While this is a complication, the performance overhead of these is so small, that doubling the overhead should not be a performance concern. The question is, do we want these two types of stacks, or do we want to work around it somehow, for instance using context-local storage, implemented on top of the first kind, to implement something like the second kind. However, that again raises some issues of how to propagate the context-local storage down the ambiguous call chain.
This would also reduce the need to decorate and wrap generators and decorator functions, although in some cases that would still be needed.

If something was not clear, but seems relevant to what I'm trying to discuss here, please ask :)

––Koos

[*] Maybe it would not even be too late to make minor changes in the PEP 492 semantics of coroutine functions at this point if there was a good enough reason. But in fact, I think the current semantics might be perfectly fine, and I'm definitely not suggesting any changes to existing semantics here. Only extensions to the existing semantics.

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
On Sun, Oct 15, 2017 at 9:44 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
So, see below for some more discussion between (it would be useful if some people could reply to this email and say if and why they agree or disagree with something below -- also non-experts that roughly understand what I'm talking about):
Yes, I understand what you are roughly talking about.

Also, yes, generators are co-routines [though when starting to work with generators, people don't fully realize this].

But then how to address "my" original problem where the context would propagate through awaits, and next/send? From what others have written, it seems there are also other situations where that is desired. There are several ways to solve the problem as an extension to PEP 555, but below is one:
We need both versions: the one that propagates first_context() into the coroutine, and the one that propagates second_context() into it. Or, using my metaphor from the other thread, we need "both the forest and the trees".
A solution to this would be to have two types of context arguments:
1. (calling) context arguments
and
2. execution context arguments
So yes, I'm actually serious about this possibility. Now it would be up to library and framework authors to pick the right variant of the two. And this is definitely something that could be documented very clearly.
This is an interesting idea. I would add that you also need:

3. Shared context: the generator shares the context with its caller, which means:

- If the caller changes the context, the generator sees the changed context the next time its __next__ function is called.
- If the generator changes the context, the caller sees the changed context.
- [This clearly makes changing the context using 'with' totally unusable in both the caller & the generator -- unless we add even odder semantics, that the generator restores the original context when it exits???]
- (As per a previous email by me, I claim this is the most natural way beginners are going to think it works, and needs to be supported; also, in real code this is not often useful.)
- I'm not sure if this would even work with async or not -- *IF* not, I would still have a syntax for the user to attempt this -- and throw a Syntax Error when they do, with a good explanation of why this combination doesn't work for async. I believe good explanations are a great way for people to learn which features can't be combined together & why.
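[Editor's note: a small runnable sketch of this "shared context" behaviour, using the contextvars module that later shipped in Python 3.7 (PEP 567) as a stand-in for the context machinery under discussion; a plain generator there runs in its caller's context, so changes are visible in both directions. The variable and function names are made up.]

    import contextvars

    var = contextvars.ContextVar("var", default=0)

    def gen():
        # The generator sees the caller's latest change...
        print("generator sees:", var.get())
        # ...and its own change leaks back out to the caller.
        var.set(2)
        yield

    var.set(1)
    g = gen()
    next(g)                              # prints "generator sees: 1"
    print("caller sees:", var.get())     # prints "caller sees: 2"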
If something was not clear, but seems relevant to what I'm trying to discuss here, please ask :)
I looked for your quote "we need both the forest & the trees", but didn't find it here. I quite strongly agree we need both; in fact we also need the third case I highlighted above.

As for what Guido wrote, that we might be trying to solve too many problems -- probably. However, these are real issues with contexts, not edge cases.

Thus, as for Guido writing that we don't want to allow yield within a 'with' clause (as it leaks context), I would argue two things:

- There are use cases where we *DO* want this -- rare, true, but they exist (i.e. my #3 above).
- IF, for simplicity's sake, it is decided not to handle this case now, then make it a syntax error in the language; i.e.:

    def f():
        with context() as x:
            yield 1

    Syntax error: 'yield' may not be used inside a 'with' clause.

This would really help new users not to make a mistake that takes hours to debug, & help correct their [initial mistaken] thinking on how contexts & generators interact.
On Sun, Oct 15, 2017 at 5:34 PM, Amit Green <amit.mixie@gmail.com> wrote:
On Sun, Oct 15, 2017 at 9:44 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
So, see below for some more discussion between (it would be useful if some people could reply to this email and say if and why they agree or disagree with something below -- also non-experts that roughly understand what I'm talking about):
Yes, I understand what you are roughly talking about.
Also, yes, generators are co-routines [though when starting to work with generators, people don't fully realize this].
But then how to address "my" original problem where the context would
propagate through awaits, and next/send? From what others have written, it seems there are also other situations where that is desired. There are several ways to solve the problem as an extension to PEP 555, but below is one:
We need both versions: the one that propagates first_context() into the coroutine, and the one that propagates second_context() into it. Or, using my metaphor from the other thread, we need "both the forest and the trees".
A solution to this would be to have two types of context arguments:
1. (calling) context arguments
and
2. execution context arguments
So yes, I'm actually serious about this possibility. Now it would be up to library and framework authors to pick the right variant of the two. And this is definitely something that could be documented very clearly.
This is an interesting idea. I would add that you also need:

3. Shared context: the generator shares the context with its caller, which means:

- If the caller changes the context, the generator sees the changed context the next time its __next__ function is called.
- If the generator changes the context, the caller sees the changed context.
- [This clearly makes changing the context using 'with' totally unusable in both the caller & the generator -- unless we add even odder semantics, that the generator restores the original context when it exits???]
- (As per a previous email by me, I claim this is the most natural way beginners are going to think it works, and needs to be supported; also, in real code this is not often useful.)
- I'm not sure if this would even work with async or not -- *IF* not, I would still have a syntax for the user to attempt this -- and throw a Syntax Error when they do, with a good explanation of why this combination doesn't work for async. I believe good explanations are a great way for people to learn which features can't be combined together & why.
Just as a quick note, after skimming through your bullet points: All of this is indeed covered with decorators and other explicit mechanisms in the PEP 555 approach. I don't think we need syntax errors, though.
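[Editor's note: as an illustration of the kind of explicit, decorator-based opt-in being referred to, here is a hypothetical sketch (not PEP 555's actual API): a decorator that makes a generator run each of its steps in the context that was active when the generator object was created, with contextvars.copy_context() standing in for the capture primitive. The decorator name and the example variable are made up, and the driver below ignores send()/throw() for brevity.]

    import contextvars
    import functools

    def run_in_creation_context(genfunc):
        """Hypothetical opt-in: each step of the generator runs in the
        context captured when the generator object was created."""
        @functools.wraps(genfunc)
        def wrapper(*args, **kwargs):
            ctx = contextvars.copy_context()     # capture at creation time
            gen = genfunc(*args, **kwargs)
            def driver():
                while True:
                    try:
                        value = ctx.run(next, gen)   # step inside ctx
                    except StopIteration:
                        return
                    yield value
            return driver()
        return wrapper

    mode = contextvars.ContextVar("mode", default="default")

    @run_in_creation_context
    def report():
        while True:
            yield mode.get()

    mode.set("creation-time")
    g = report()
    mode.set("iteration-time")
    print(next(g))   # 'creation-time' -- the generator kept its creation context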
If something was not clear, but seems relevant to what I'm trying to discuss here, please ask :)
I looked for your quote "we need both the forest & the trees", but didn't find it here. I quite strongly agree we need both; in fact we also need the third case I highlighted above.
The ordering of the archive was indeed thoroughly destroyed. Ordering by date might help. But the quote you ask for is here: https://mail.python.org/pipermail/python-ideas/2017-October/047285.html –-Koos
As for what Guido wrote, that we might be trying to solve too many problems -- probably. However, these are real issues with contexts, not edge cases.
Thus Guido writing we don't want to allow yield within a 'with' clause (as it leaks context) .. I would argue two things:
- There are use cases where we *DO* want this -- rare -- true -- but they exist (i.e.: my #3 above)
- IF, for simplicity, sake, it is decided not to handle this case now; then make it a syntax error in the language; i.e.:
    def f():
        with context() as x:
            yield 1
Syntax error: 'yield' may not be used inside a 'with' clause.
This would really help new users not to make a mistake that takes hours to debug; & help correct their [initial mistaken] thinking on how contexts & generators interact.
-- + Koos Zevenhoven + http://twitter.com/k7hoven +