[Python-ideas] PEP draft: context variables

Sat Oct 14 12:50:27 EDT 2017

On 14 October 2017 at 21:56, Paul Moore <p.f.moore at gmail.com> wrote:

TL;DR of below: PEP 550 currently gives you what you're after, so your
perspective counts as a preference for "please don't eagerly capture the
creation time context in generators or coroutines".

> To give an example:
>
>     async def get_webpage(id):
>         url = f"https://{server}/{app}/items?id={id}"
>         # 1
>         encoding, content = await url_get(url)
>         #2
>         return content.decode(encoding)
>
> I would expect that, if I set a context variable at #1, and read it at #2,
> then:
>
>     1. code run as part of url_get would see the value set at 1
>     2. code run as part of url_get could set the value, and I'd see
> the new value at 2
>

This is consistent with what PEP 550 currently proposes, because you're
creating the coroutine and calling it in the same expression: "await
url_get(url)".

That's the same as what happens for synchronous function calls, which is
why we think it's also the right way for coroutines to behave.

The slightly more open-to-challenge case is this one:

    # Point 1 (pre-create)
    cr = url_get(url)
    # Point 2 (post-create, pre-call)
    encoding, content = await cr
    # Point 3 (post-call)

PEP 550 currently says that it doesn't matter whether you change the
context at point 1 or point 2, as "url_get" will see the context as it is
at the await call (i.e. when it actually gets executed), *not* as it is
when the coroutine is created.
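
As a rough sketch of that split case (not taken from the PEP): the snippet
below uses the standard library contextvars module from Python 3.7+ as a
stand-in for the PEP 550 API, which is an assumption on my part. The
variable name "var" and the body of url_get are made up for illustration.

```python
import asyncio
import contextvars

# Hypothetical variable; contextvars stands in for the PEP 550 API here.
var = contextvars.ContextVar("var", default="unset")

async def url_get():
    # Illustrative body only: report what we see, then change the value.
    seen = var.get()
    var.set("set by url_get")
    return seen

async def main():
    var.set("point 1")      # Point 1 (pre-create)
    cr = url_get()
    var.set("point 2")      # Point 2 (post-create, pre-call)
    seen = await cr
    return seen, var.get()  # Point 3 (post-call)

result = asyncio.run(main())
print(result)  # ('point 2', 'set by url_get')
```

The coroutine sees the value that was current when it was awaited, and the
change it made is visible to the caller afterwards, just as it would be for
a plain function call.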

The suggestion has been made that we should instead be capturing the active
context when "url_get(url)" is called, and implicitly switching back to
that at the point where await is called. It doesn't seem like a good idea
to me, as it breaks the "top to bottom" mental model of code execution
(since the "await cr" expression would briefly switch the context back to
the one that was in effect on the "cr = url_get(url)" line without even a
nested suite to indicate that we may be adjusting the order of code
execution).

It would also cause problems with allowing context changes to propagate out
of the "await cr" call, since capturing a context implies forking it, and
hence any changes would somehow need to be transplanted back to a
potentially divergent context history (if a context change *did* happen at
point 2 in the split example).

> It doesn't matter what form the lines in the function take (loops,
> with statements, conditionals, ...) as long as they are run
> immediately (class and function definitions should be ignored -
> there's no lexical capture of context variables). That probably means
> "synchronous call stack" in your terms, but please don't assume that
> any implications of that term which aren't covered by the above
> example are obvious to me.
>

I think you got everything, as I really do just mean the stack of frames in
the current thread that will show up in a traceback. We normally just call
it "the call stack", but that's ambiguous whenever we're also talking about
coroutines (since each await chain creates its own distinct asynchronous
call stack).

> >     g = gen()
> >     with decimal.localcontext() as ctx:
> >         ctx.prec = 30
> >         for i in g:
> >             pass
>
> "for i in g" is getting values from the generator, at a time when the
> precision is 30, so those values should have precision 30.
>
> There's no confusion here to me. If that's not what decimal currently
> does, I'd happily report that as a bug.
>

This is the existing behaviour that PEP 550 is recommending we preserve as
the default generator semantics, even if decimal (or a comparable context
manager) switches to using context vars instead of thread locals.
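
A minimal way to check that behaviour with decimal as it stands. The body
of gen() isn't shown in the quoted example, so the one here is purely an
assumption for illustration:

```python
import decimal

def gen():
    # Hypothetical generator body: the division happens when the
    # generator is iterated, not when gen() is called.
    for _ in range(2):
        yield decimal.Decimal(1) / decimal.Decimal(3)

g = gen()  # created under the default precision
with decimal.localcontext() as ctx:
    ctx.prec = 30
    values = list(g)  # executed under precision 30

digit_counts = [len(v.as_tuple().digits) for v in values]
print(digit_counts)  # [30, 30]
```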

As with coroutines, the question has been asked whether or not the "g =
gen()" line should be implicitly capturing the active execution context at
that point, and then switching back to it for each iteration of "for i in
g:".

> > The reason I ask that is because there are three "interesting" times
> > in the life of a coroutine or generator:
> >
> > - definition time (when the def statement runs - this determines the
> >   lexical closure)
> > - instance creation time (when the generator-iterator or coroutine is
> >   instantiated)
> > - execution time (when the frame actually starts running - this
> >   determines the runtime call stack)
>
> OK. They aren't *really* interesting to me (they are a low-level
> detail, but they should work to support intuitive semantics, not to
> define what my intuition should be) but I'd say that my expectation is
> that the *execution time* value of the context variable is what I'd
> expect to get and set.
>

That's the view PEP 550 currently takes as well.

> > For asynchronous operations, there's more of a question, since actual
> > execution is deferred until you call await or next() - the original
> > synchronous call to the factory function instantiates an object, it
> > doesn't actually *do* anything.
>
> This isn't particularly a question for me: g = gen() creates an
> object. next(g) - or more likely "for o in g" - runs it, and that's
> when the context matters. I struggle to understand why anyone would
> think otherwise.
>

If you capture the context eagerly, then there are fewer opportunities to
get materially different values from "data = list(iterable)" and "data =
iter(context_capturing_iterable)".

While that's a valid intent for folks to want to be able to express, I
personally think it would be more clearly requested via an expression like
"data = iter_in_context(iterable)" rather than having it be implicit in the
way generators work (especially since having eager context capture be
generator-only behaviour would create an odd discrepancy between generators
and other iterators like those in itertools).

> > When implicit isolation takes place, it's either to keep concurrently
> > active logical call stacks isolated from each other (the event loop
> > case), or else to keep context changes from implicitly leaking *up* a
> > stack (the generator case), not to keep context changes from
> > propagating *down* a call stack.
>
> I don't understand this. If it matters, in terms of explaining corner
> cases of the semantics, then it needs to be explained in more
> intuitive terms. If it's an implementation detail of *how* the PEP
> ensures it acts intuitively, then I'm fine with not needing to care.
>

Cases where we expect context changes to be able to propagate into or out
of a frame:

- when you call something, it can see your context
- when something you called returns, you can see changes it made to your
context
- when a generator-based context manager is suspended

"Call" in the above deliberately covers both synchronous calls (with regular
call syntax) and asynchronous calls (with await or yield from).

Cases where we *don't* expect context changes to propagate out of a frame:

- when you spin up a separate logical thread of execution (either an actual
OS thread, or an event loop task)
- when a generator-based iterator is suspended
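
Most of those cases can be sketched with the contextvars module from Python
3.7+, used here as an assumed stand-in for the PEP 550 API (all of the names
below are hypothetical). The generator-iterator case is left out, since
contextvars does not isolate generators the way PEP 550 proposes:

```python
import asyncio
import contextvars
import threading

# Hypothetical variable; contextvars stands in for the PEP 550 API.
var = contextvars.ContextVar("var", default="initial")

def callee():
    seen = var.get()              # a call can see the caller's context
    var.set("changed by callee")  # and its changes propagate back out
    return seen

var.set("set by caller")
seen_by_callee = callee()
after_call = var.get()

def worker():
    var.set("changed in thread")  # stays inside the OS thread

thread = threading.Thread(target=worker)
thread.start()
thread.join()
after_thread = var.get()

async def child():
    var.set("changed in task")    # stays inside the event loop task

async def main():
    var.set("set in main")
    await asyncio.create_task(child())
    return var.get()

after_task = asyncio.run(main())

print(seen_by_callee)  # set by caller
print(after_call)      # changed by callee
print(after_thread)    # changed by callee
print(after_task)      # set in main
```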

> > When we do want to prevent downward propagation for some reason, then
> > that's what "run_in_execution_context" is for: deliberate creation of
> > a new concurrently active call stack (similar to running something in
> > another thread to isolate the synchronous call stack).
>
> I read that as "run_in_execution_context is a specialised thing that
> you'll never need to use, because you don't understand its purpose -
> so just hope that in your code, everything will just work as you
> expect without it". The obvious omission here is an explanation of
> precisely who my interpretation *doesn't* apply for. Who are the
> audience for run_in_execution_context? If it's "people who write
> context managers that use context variables" then I'd say that's a
> problem, because I'd hope a lot of people would find use for this, and
> I wouldn't want them to have to understand the internals to this
> level. If it's something like "people who write async context managers
> using raw __aenter__ and __aexit__ functions, as opposed to the async
> version of @contextmanager", then that's probably fine.
>

Context managers would be fine (the defaults are deliberately set up to
make those "just work", either globally, or in the relevant decorators).

However, people who write event loops will need to care about it, as would
anyone writing an "iter_in_context" helper function.

Folks trying to strictly emulate generator semantics in their own iterators
would also need to worry about it, but "revert any context changes before
returning from __next__" is a simpler alternative to actually doing that.
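
A sketch of that simpler alternative, assuming the set/reset token API of
the contextvars module from Python 3.7+ rather than anything PEP 550
specific. The "precision" variable and the Countdown class are made up for
the example:

```python
import contextvars

# Hypothetical context variable for illustration.
precision = contextvars.ContextVar("precision", default=10)

class Countdown:
    """Iterator that changes the context only while computing an item."""

    def __init__(self, start):
        self._remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        token = precision.set(30)  # temporary change for this step
        try:
            item = (self._remaining, precision.get())
            self._remaining -= 1
            return item
        finally:
            precision.reset(token)  # revert before returning to the caller

items = list(Countdown(2))
after = precision.get()
print(items)  # [(2, 30), (1, 30)]
print(after)  # 10
```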

> > Don't get me wrong, I'm not opposed to the idea of making it trivial to
> > define "micro tasks" (iterables that perform a context switch to a
> > specified execution context every time they retrieve a new value) that
> > can provide easy execution context isolation without an event loop to
> > manage it, I just think that would be more appropriate as a wrapper API
> > that can be placed around any iterable, rather than being baked in as
> > an intrinsic property of generators.
>
> I don't think it matters whether it's trivial to write "micro tasks"
> if non-experts don't know what they are ;-) I *do* think it matters if
> "micro tasks" are something non-experts might need to write, but not
> realise they are straying into deep waters. But I've no way of knowing
> how likely that is.
>

A micro-task is just a fancier name for the "iter_in_context" idea above
(save the current context when the iterator is created, switch back to that
context every time you're asked for a new value).
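
One possible shape for such a helper, as a sketch only: "iter_in_context"
is not an existing API, and this version is built on copy_context() and
Context.run() from the contextvars module in Python 3.7+ rather than on
PEP 550's run_in_execution_context.

```python
import contextvars

def iter_in_context(iterable):
    """Hypothetical helper: iterate in the context active at call time."""
    ctx = contextvars.copy_context()  # captured eagerly, at creation
    iterator = ctx.run(iter, iterable)

    def wrapper():
        while True:
            try:
                # Switch to the saved context for each new value.
                item = ctx.run(next, iterator)
            except StopIteration:
                return
            yield item

    return wrapper()

var = contextvars.ContextVar("var", default="default")

def gen():
    while True:
        yield var.get()

var.set("creation")
plain = gen()
captured = iter_in_context(gen())
var.set("iteration")

plain_value = next(plain)
captured_value = next(captured)
print(plain_value)     # iteration
print(captured_value)  # creation
```

Because the capture happens in a wrapper, it works for any iterable, and
plain generators keep seeing the context of whoever is iterating them.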

> One final point, this is all pretty deeply intertwined with the
> comprehensibility of async as a whole. At the moment, as I said
> before, async is a specialised area that's largely only used in
> projects that centre around it. In the same way that Twisted is its
> own realm - people write network applications without Twisted, or they
> write them using Twisted. Nobody uses Twisted in the middle of some
> normal non-async application like pip to handle grabbing a webpage.
> I'm uncertain whether the intent is for the core async features to
> follow this model, or whether we'd expect in the longer term for
> "utility adoption" of async to happen (tactical use of async for
> something like web crawling or collecting subprocess output in a
> largely non-async app). If that *does* happen, then async needs to be
> much more widely understandable - maintenance programmers who have
> never used async will start encountering it in corners of their
> non-async applications, or find it used under the hood in libraries
> that they use. This discussion is a good example of the implications
> of that - async quirks leaking out into the "normal" world (decimal
> contexts) and as a result the async experts needing to be able to
> communicate their concerns and issues to non-experts.
>

Aye, this is why I'd like the semantics of context variables to be almost
indistinguishable from those of thread local variables for synchronous code
(aside from avoiding context changes leaking out of generator-iterators
when they yield from inside a with statement).

PEP 550 currently does a good job of ensuring that, but we'd break that
near equivalence if generators were to implicitly capture their creation
context.
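
For reference, the leak mentioned above can be seen with decimal as it
currently behaves (this sketch shows the existing behaviour, not the
PEP 550 semantics, and the generator is a made-up example):

```python
import decimal

def gen():
    # Hypothetical generator that yields from inside a with statement.
    with decimal.localcontext() as ctx:
        ctx.prec = 5
        yield decimal.Decimal(1) / decimal.Decimal(3)

before = decimal.getcontext().prec
g = gen()
first = next(g)  # generator is now suspended inside the with block
leaked = decimal.getcontext().prec    # the caller sees the generator's change
g.close()        # exits the with block, restoring the saved context
restored = decimal.getcontext().prec

print(leaked)              # 5
print(restored == before)  # True
```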

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia