
On 14 October 2017 at 08:09, Nick Coghlan <ncoghlan@gmail.com> wrote:
> To try and bring this back to synchronous examples that folks may find more intuitive, I figure it's worth framing the question this way: do we want people to reason about context variables like the active context is implicitly linked to the synchronous call stack, or do we want to encourage them to learn to reason about them more like they're a new kind of closure?
I'm really struggling to keep up here. I need to go and fully read the PEP as Yury suggested, and focus on what's in there. But I'll try to answer this comment. I will ask one question, though, based on Yury's point "the PEP is where you should look for the actual semantics": can you state which part of the PEP is affected by the answer to this question? I want to make sure that when I read the PEP, I don't miss the place that this whole discussion thread is about...

I don't think of contexts in terms of *either* the "synchronous call stack" (which, by the way, is much too technical a term to make sense to the "non-expert" people around here like me - I know what the term means, but only in a way that's far too low-level to give me an intuitive sense of what contexts are) or closures. At the risk of using another analogy that's unfamiliar to a lot of people, I think of them in terms of Lisp's dynamic variables: code that needs a context variable gets the value that's current *at that time*. I don't want to have to think lower-level than that - if I have to, then in my view there's a problem with a *different* abstraction (specifically async ;-))

To give an example:

    async def get_webpage(id):
        url = f"https://{server}/{app}/items?id={id}"  # 1
        encoding, content = await url_get(url)         # 2
        return content.decode(encoding)

I would expect that, if I set a context variable at #1, and read it at #2, then:

1. code run as part of url_get would see the value set at #1
2. code run as part of url_get could set the value, and I'd see the new value at #2

It doesn't matter what form the lines in the function take (loops, with statements, conditionals, ...) as long as they are run immediately (class and function definitions should be ignored - there's no lexical capture of context variables). That probably means "synchronous call stack" in your terms, but please don't assume that any implications of that term which aren't covered by the above example are obvious to me.
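For concreteness, this is roughly what that expectation looks like as runnable code. It's only a sketch: it uses the contextvars module that eventually shipped in Python 3.7 via PEP 567 (PEP 550 spells the API differently), and the variable name and values are purely illustrative.

    import asyncio
    import contextvars

    request_id = contextvars.ContextVar("request_id")

    async def url_get(url):
        assert request_id.get() == 42    # expectation 1: sees the caller's value
        request_id.set(43)               # ...and can replace it
        return "utf-8", b"some payload"

    async def get_webpage(id):
        request_id.set(42)                         # 1
        encoding, content = await url_get("...")   # 2
        assert request_id.get() == 43    # expectation 2: the caller sees the update
        return content.decode(encoding)

    asyncio.run(get_webpage(7))

Both assertions hold because awaiting a coroutine directly runs it in the caller's context, rather than in an isolated copy.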
To use the decimal context example:

    with decimal.localcontext() as ctx:
        ctx.prec = 30
        for i in gen():
            pass
There's only one setting of a context here, so it's obvious - values returned from gen have precision 30.
    g = gen()
    with decimal.localcontext() as ctx:
        ctx.prec = 30
        for i in g:
            pass
"for i in g" is getting values from the generator, at a time when the precision is 30, so those values should have precision 30. There's no confusion here to me. If that's not what decimal currently does, I'd happily report that as a bug. The refactoring case is similarly obvious to me:
The refactoring case is similarly obvious to me:

    async def original_async_function():
        with some_context():
            do_some_setup()
            raw_data = await some_operation()
            data = do_some_postprocessing(raw_data)
Refactored:
    async def async_helper_function():
        do_some_setup()
        raw_data = await some_operation()
        return do_some_postprocessing(raw_data)
    async def refactored_async_function():
        with some_context():
            data = await async_helper_function()
All we've done here is take some code out of the with block and write it as a helper. There should be no change of semantics when doing so. That's a fundamental principle to me, and honestly I don't see it as credible for anyone to say otherwise. (Anyone who suggests that is basically saying "if you use async, common sense goes out of the window" as far as I'm concerned).
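To make the principle concrete, here is a minimal runnable analogue, using decimal as the context (the names mirror the example above, the arithmetic is illustrative). For the principle to hold, both variants must produce the same 30-digit value - and in CPython today they do:

    import asyncio
    import decimal

    async def async_helper_function():
        return decimal.Decimal(1) / decimal.Decimal(7)

    async def original_async_function():
        with decimal.localcontext() as ctx:
            ctx.prec = 30
            return decimal.Decimal(1) / decimal.Decimal(7)

    async def refactored_async_function():
        with decimal.localcontext() as ctx:
            ctx.prec = 30
            return await async_helper_function()

    assert (asyncio.run(original_async_function())
            == asyncio.run(refactored_async_function()))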
> The reason I ask that is because there are three "interesting" times in the life of a coroutine or generator:
>
> - definition time (when the def statement runs - this determines the lexical closure)
> - instance creation time (when the generator-iterator or coroutine is instantiated)
> - execution time (when the frame actually starts running - this determines the runtime call stack)
OK. They aren't *really* interesting to me (they are a low-level detail; they should work to support intuitive semantics, not to define what my intuition should be), but I'd say that the *execution-time* value of the context variable is what I'd expect to get and set.
> For synchronous functions, instance creation time and execution time are intrinsically linked, since the execution frame is allocated and executed directly as part of calling the function.
>
> For asynchronous operations, there's more of a question, since actual execution is deferred until you call await or next() - the original synchronous call to the factory function instantiates an object, it doesn't actually *do* anything.
This isn't particularly a question for me: g = gen() creates an object. next(g) - or more likely "for o in g" - runs it, and that's when the context matters. I struggle to understand why anyone would think otherwise.
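The distinction between the three times is easy to demonstrate. A sketch using the contextvars API that later shipped, with illustrative names:

    import contextvars

    var = contextvars.ContextVar("var")

    def gen():            # definition time: nothing is captured
        yield var.get()   # execution time: the lookup happens here

    var.set("early")
    g = gen()             # instance creation time: still nothing captured
    var.set("late")
    print(next(g))        # -> "late": the execution-time value is what you get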
> The current position of PEP 550 (which I agree with) is that context variables should default to being closely associated with the active call stack (regardless of whether those calls are regular synchronous ones, or asynchronous ones with await), as this keeps the synchronous and asynchronous semantics of context variables as close to each other as we can feasibly make them.
At the high level we're talking here, I agree with this.
> When implicit isolation takes place, it's either to keep concurrently active logical call stacks isolated from each other (the event loop case), or else to keep context changes from implicitly leaking *up* a stack (the generator case), not to keep context changes from propagating *down* a call stack.
I don't understand this. If it matters, in terms of explaining corner cases of the semantics, then it needs to be explained in more intuitive terms. If it's an implementation detail of *how* the PEP ensures it acts intuitively, then I'm fine with not needing to care.
> When we do want to prevent downward propagation for some reason, then that's what "run_in_execution_context" is for: deliberate creation of a new concurrently active call stack (similar to running something in another thread to isolate the synchronous call stack).
I read that as "run_in_execution_context is a specialised thing that you'll never need to use, because you don't understand its purpose - so just hope that in your code, everything will just work as you expect without it". The obvious omission here is an explanation of precisely who my interpretation *doesn't* apply to. Who is the audience for run_in_execution_context?

If it's "people who write context managers that use context variables", then I'd say that's a problem, because I'd hope a lot of people would find a use for those, and I wouldn't want them to have to understand the internals at this level. If it's something like "people who write async context managers using raw __aenter__ and __aexit__ methods, as opposed to the async version of @contextmanager", then that's probably fine.
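For reference, the facility described here corresponds to what eventually shipped as contextvars.copy_context() and Context.run() in Python 3.7; PEP 550 spells it run_in_execution_context, so the sketch below is an analogue rather than the PEP's own API:

    import contextvars

    var = contextvars.ContextVar("var", default="outer")

    def isolated():
        var.set("inner")          # visible only inside the copied context
        return var.get()

    ctx = contextvars.copy_context()
    print(ctx.run(isolated))      # -> "inner"
    print(var.get())              # -> "outer": nothing leaked back out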
> Don't get me wrong, I'm not opposed to the idea of making it trivial to define "micro tasks" (iterables that perform a context switch to a specified execution context every time they retrieve a new value) that can provide easy execution context isolation without an event loop to manage it, I just think that would be more appropriate as a wrapper API that can be placed around any iterable, rather than being baked in as an intrinsic property of generators.
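Such a wrapper could be only a few lines. A sketch against the contextvars API that later shipped; the name isolated_iter and the implementation are purely illustrative, not anything proposed in the PEP:

    import contextvars

    def isolated_iter(iterable):
        """Yield from iterable, advancing it inside its own private context."""
        ctx = contextvars.copy_context()
        it = iter(iterable)
        while True:
            try:
                # Each step runs in the captured context, so context
                # changes made by the iterable never leak to the caller.
                yield ctx.run(next, it)
            except StopIteration:
                return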
I don't think it matters whether it's trivial to write "micro tasks" if non-experts don't know what they are ;-) I *do* think it matters if "micro tasks" are something non-experts might need to write without realising that they are straying into deep waters. But I've no way of knowing how likely that is.

One final point: this is all pretty deeply intertwined with the comprehensibility of async as a whole. At the moment, as I said before, async is a specialised area that's largely only used in projects that centre around it. In the same way, Twisted is its own realm - people write network applications without Twisted, or they write them using Twisted. Nobody uses Twisted in the middle of some normal non-async application like pip to handle grabbing a webpage.

I'm uncertain whether the intent is for the core async features to follow this model, or whether we'd expect in the longer term to see "utility adoption" of async (tactical use of async for something like web crawling or collecting subprocess output in a largely non-async app). If that *does* happen, then async needs to be much more widely understandable - maintenance programmers who have never used async will start encountering it in corners of their non-async applications, or find it used under the hood in libraries that they use. This discussion is a good example of the implications of that: async quirks leaking out into the "normal" world (decimal contexts), and as a result the async experts needing to be able to communicate their concerns and issues to non-experts.

Hopefully some of this helps,
Paul