[Python-ideas] PEP draft: context variables

Paul Moore p.f.moore at gmail.com
Sat Oct 14 07:56:44 EDT 2017


On 14 October 2017 at 08:09, Nick Coghlan <ncoghlan at gmail.com> wrote:
> To try and bring this back to synchronous examples that folks may find more
> intuitive, I figure it's worth framing the question this way: do we want
> people to reason about context variables like the active context is
> implicitly linked to the synchronous call stack, or do we want to encourage
> them to learn to reason about them more like they're a new kind of closure?

I'm really struggling to keep up here. I need to go and fully read the
PEP as Yury suggested, and focus on what's in there. But I'll try to
answer this comment. I will ask one question, though, based on Yury's
point "the PEP is where you should look for the actual semantics" -
can you say which part of the PEP is affected by the answer to this
question? I want to make sure that when I read the PEP, I don't miss
the place that this whole discussion thread is about...

I don't think of contexts in terms of *either* the "synchronous call
stack" (which, by the way, is much too technical a term to make sense
to the "non-expert" people around here like me - I know what the term
means, but only in a way that's far too low-level to give me an
intuitive sense of what contexts are) or closures.

At the risk of using another analogy that's unfamiliar to a lot of
people, I think of them in terms of Lisp's dynamic variables. Code
that needs a context variable gets the value that's current *at that
time*. I don't want to have to think lower level than that - if I have
to, then in my view there's a problem with a *different* abstraction
(specifically async ;-))

To give an example:

    async def get_webpage(id):
        url = f"https://{server}/{app}/items?id={id}"
        # 1
        encoding, content = await url_get(url)
        # 2
        return content.decode(encoding)

I would expect that, if I set a context variable at #1, and read it at #2, then:

    1. code run as part of url_get would see the value set at 1
    2. code run as part of url_get could set the value, and I'd see
       the new value at 2

It doesn't matter what form the lines in the function take (loops,
with statements, conditionals, ...) as long as they are run
immediately (class and function definitions should be ignored -
there's no lexical capture of context variables). That probably means
"synchronous call stack" in your terms, but please don't assume that
any implications of that term which aren't covered by the above
example are obvious to me.
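For what it's worth, this is exactly the behaviour the contextvars API
that eventually shipped in Python 3.7 (PEP 567) gives you within a
single task. A runnable sketch (the variable and the stubbed-out
url_get are mine, purely for illustration):

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default=None)

async def url_get(url):
    # Code run as part of url_get sees the value set by its caller...
    assert request_id.get() == 42
    # ...and a value it sets is visible back in the caller afterwards,
    # because awaiting within one task doesn't switch contexts.
    request_id.set(43)
    return "utf-8", b"body"

async def get_webpage(id):
    url = f"https://example.com/items?id={id}"
    request_id.set(42)                     # 1
    encoding, content = await url_get(url)
    assert request_id.get() == 43          # 2
    return content.decode(encoding)

asyncio.run(get_webpage(7))
```

Note that separate tasks each get their own copy of the context, so
this sharing only holds along a single await chain.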

To use the decimal context example:

>     with decimal.localcontext() as ctx:
>         ctx.prec = 30
>         for i in gen():
>            pass

There's only one setting of a context here, so it's obvious - values
returned from gen have precision 30.

>     g = gen()
>     with decimal.localcontext() as ctx:
>         ctx.prec = 30
>         for i in g:
>           pass

"for i in g" is getting values from the generator, at a time when the
precision is 30, so those values should have precision 30.

There's no confusion here to me. If that's not what decimal currently
does, I'd happily report that as a bug.
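For the synchronous case this expectation does match what decimal
actually does: the generator's body runs during the for loop, with
whatever context is current at that moment. A runnable check (the
three-value loop is just illustrative):

```python
import decimal

def gen():
    # Each value is computed with whatever decimal context is current
    # when next() runs, not when gen() was called.
    while True:
        yield decimal.Decimal(1) / decimal.Decimal(7)

g = gen()  # the generator is created *outside* the context block
with decimal.localcontext() as ctx:
    ctx.prec = 30
    values = [next(g) for _ in range(3)]

# Every value was computed at precision 30, even though g was
# created before the localcontext block was entered.
```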

The refactoring case is similarly obvious to me:

>    async def original_async_function():
>         with some_context():
>             do_some_setup()
>             raw_data = await some_operation()
>             data = do_some_postprocessing(raw_data)
>
> Refactored:
>
>    async def async_helper_function():
>         do_some_setup()
>         raw_data = await some_operation()
>         return do_some_postprocessing(raw_data)
>
>    async def refactored_async_function():
>         with some_context():
>             data = await async_helper_function()
>

All we've done here is take some code out of the with block and write
it as a helper. There should be no change of semantics when doing so.
That's a fundamental principle to me, and honestly I don't see it as
credible for anyone to say otherwise. (Anyone who suggests that is
basically saying "if you use async, common sense goes out of the
window" as far as I'm concerned).

> The reason I ask that is because there are three "interesting" times in the
> life of a coroutine or generator:
>
> - definition time (when the def statement runs - this determines the lexical
> closure)
> - instance creation time (when the generator-iterator or coroutine is
> instantiated)
> - execution time (when the frame actually starts running - this determines
> the runtime call stack)

OK. They aren't *really* interesting to me (they are a low-level
detail, but they should work to support intuitive semantics, not to
define what my intuition should be) but I'd say that my expectation is
that the *execution time* value of the context variable is what I'd
expect to get and set.

> For synchronous functions, instance creation time and execution time are
> intrinsically linked, since the execution frame is allocated and executed
> directly as part of calling the function.
>
> For asynchronous operations, there's more of a question, since actual
> execution is deferred until you call await or next() - the original
> synchronous call to the factory function instantiates an object, it doesn't
> actually *do* anything.

This isn't particularly a question for me: g = gen() creates an
object. next(g) - or more likely "for o in g" - runs it, and that's
when the context matters. I struggle to understand why anyone would
think otherwise.
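That intuition can be spelled out with the contextvars API (PEP 567):
the variable is read when the frame executes, not when the
generator-iterator is created:

```python
import contextvars

var = contextvars.ContextVar("var", default="old")

def gen():
    yield var.get()

g = gen()          # instance creation: no frame has run yet
var.set("new")     # change the context *after* creating g
values = list(g)   # execution: var is read now, and sees "new"
```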

> The current position of PEP 550 (which I agree with) is that context
> variables should default to being closely associated with the active call
> stack (regardless of whether those calls are regular synchronous ones, or
> asynchronous ones with await), as this keeps the synchronous and
> asynchronous semantics of context variables as close to each other as we can
> feasibly make them.

At the high level we're talking here, I agree with this.

> When implicit isolation takes place, it's either to keep concurrently active
> logical call stacks isolated from each other (the event loop case), or else
> to keep context changes from implicitly leaking *up* a stack (the generator
> case), not to keep context changes from propagating *down* a call stack.

I don't understand this. If it matters, in terms of explaining corner
cases of the semantics, then it needs to be explained in more
intuitive terms. If it's an implementation detail of *how* the PEP
ensures it acts intuitively, then I'm fine with not needing to care.

> When we do want to prevent downward propagation for some reason, then that's
> what "run_in_execution_context" is for: deliberate creation of a new
> concurrently active call stack (similar to running something in another
> thread to isolate the synchronous call stack).

I read that as "run_in_execution_context is a specialised thing that
you'll never need to use, because you don't understand its purpose -
so just hope that in your code, everything will just work as you
expect without it". The obvious omission here is an explanation of
precisely who my interpretation *doesn't* apply for. Who are the
audience for run_in_execution_context? If it's "people who write
context managers that use context variables" then I'd say that's a
problem, because I'd hope a lot of people would find use for this, and
I wouldn't want them to have to understand the internals to this
level. If it's something like "people who write async context managers
using raw __aenter__ and __aexit__ functions, as opposed to the async
version of @contextmanager", then that's probably fine.
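For reference, the primitive being described is a "run this code in
its own context" call; the spelling run_in_execution_context from PEP
550 didn't survive as-is, but a minimal sketch using the
contextvars.Context.run API that shipped in Python 3.7 looks like
this (the variable and function are mine):

```python
import contextvars

var = contextvars.ContextVar("var", default="outer")

def isolated():
    # This change is confined to the copied context we run in.
    var.set("inner")
    return var.get()

ctx = contextvars.copy_context()
result = ctx.run(isolated)  # runs isolated() inside ctx

# result is "inner", but the current context still sees "outer":
# the set() inside isolated() did not leak out.
```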

> Don't get me wrong, I'm not opposed to the idea of making it trivial to
> define "micro tasks" (iterables that perform a context switch to a specified
> execution context every time they retrieve a new value) that can provide
> easy execution context isolation without an event loop to manage it, I just
> think that would be more appropriate as a wrapper API that can be placed
> around any iterable, rather than being baked in as an intrinsic property of
> generators.

I don't think it matters whether it's trivial to write "micro tasks"
if non-experts don't know what they are ;-) I *do* think it matters if
"micro tasks" are something non-experts might need to write, but not
realise they are straying into deep waters. But I've no way of knowing
how likely that is.

One final point, this is all pretty deeply intertwined with the
comprehensibility of async as a whole. At the moment, as I said
before, async is a specialised area that's largely only used in
projects that centre around it. In the same way that Twisted is its
own realm - people write network applications without Twisted, or they
write them using Twisted. Nobody uses Twisted in the middle of some
normal non-async application like pip to handle grabbing a webpage.
I'm uncertain whether the intent is for the core async features to
follow this model, or whether we'd expect in the longer term for
"utility adoption" of async to happen (tactical use of async for
something like web crawling or collecting subprocess output in a
largely non-async app). If that *does* happen, then async needs to be
much more widely understandable - maintenance programmers who have
never used async will start encountering it in corners of their
non-async applications, or find it used under the hood in libraries
that they use. This discussion is a good example of the implications
of that - async quirks leaking out into the "normal" world (decimal
contexts) and as a result the async experts needing to be able to
communicate their concerns and issues to non-experts.

Hopefully some of this helps,
Paul

