Hi,
This is the 4th iteration of the PEP that Elvis and I have
rewritten from scratch.
The specification section has been separated from the implementation
section, which makes them easier to follow.
During the rewrite, we realized that generators and coroutines should
work with the EC in exactly the same way (coroutines used to be
created with no LC in prior versions of the PEP).
We also renamed Context Keys to Context Variables which seems
to be a more appropriate name.
Hopefully this update will resolve the remaining questions
about the specification and the proposed implementation, and
will allow us to focus on refining the API.
Yury
PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov
All in all, I like it. Nice job.

On 08/25/2017 03:32 PM, Yury Selivanov wrote:
A *context variable* is an object representing a value in the execution context. A new context variable is created by calling the ``new_context_var()`` function. A context variable object has two methods:
* ``lookup()``: returns the value of the variable in the current execution context;
* ``set()``: sets the value of the variable in the current execution context.
Why "lookup" and not "get"? Many APIs use "get" and its functionality is well understood.
Conceptually, an *execution context* (EC) is a stack of logical contexts. There is one EC per Python thread.
A *logical context* (LC) is a mapping of context variables to their values in that particular LC.
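The stack-of-mappings model described above can be sketched in plain Python. This is an illustration only: the class names and list-based stack here are invented for clarity, while the PEP's actual implementation uses immutable, optimized data structures.

```python
# Illustration only: the PEP's real EC/LC are immutable and optimized;
# plain dicts and a list are used here just to show the semantics.

class ExecutionContext:
    """A stack of logical contexts (mappings of variable -> value)."""

    def __init__(self):
        self.stack = [{}]       # start with a single logical context

    def set(self, var, value):
        # writes always go to the topmost logical context
        self.stack[-1][var] = value

    def lookup(self, var):
        # reads traverse the stack, topmost first
        for lc in reversed(self.stack):
            if var in lc:
                return lc[var]
        return None

ec = ExecutionContext()
ec.set('precision', 6)
ec.stack.append({})             # e.g. a generator starts iterating
ec.set('precision', 2)          # local to the new logical context
assert ec.lookup('precision') == 2
ec.stack.pop()                  # the generator is done
assert ec.lookup('precision') == 6
```

This also makes the later `set()`/`lookup()` naming discussion concrete: writes touch only the top of the stack, while reads may traverse all of it.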
Still don't like the name of "logical context", but we can bike-shed on that later. -- ~Ethan~
This is now looking really good and I can understand it.
One question though. Sometimes creation of a context variable is done with
a name argument, other times not. E.g.
var1 = new_context_var('var1')
var = new_context_var()
The signature is given as:
sys.new_context_var(name: str)
But it seems like it should be:
sys.new_context_var(name: Optional[str]=None)
On Aug 25, 2017 3:35 PM, "Yury Selivanov"
On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov
Coroutines and Asynchronous Tasks
---------------------------------
In coroutines, like in generators, context variable changes are local and are not visible to the caller::
    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
I think this change is a bad idea. I think that generally, an async call like 'await async_sub()' should have the equivalent semantics to a synchronous call like 'sync_sub()', except for the part where the former is able to contain yields. Giving every coroutine an LC breaks that equivalence. It also makes it so in async code, you can't necessarily refactor by moving code in and out of subroutines. Like, if we inline 'sub' into 'main', that shouldn't change the semantics, but...

    async def main():
        var.set('main')
        # inlined copy of sub()
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'
        # end of inlined copy
        assert var.lookup() == 'main'   # fails

It also adds non-trivial overhead, because now lookup() is O(depth of async callstack), instead of O(depth of (async) generator nesting), which is generally much smaller.

I think I see the motivation: you want to make

    await sub()

and

    await ensure_future(sub())

have the same semantics, right? And the latter has to create a Task and split it off into a new execution context, so you want the former to do so as well? But to me this is like saying that we want

    sync_sub()

and

    thread_pool_executor.submit(sync_sub).result()

to have the same semantics: they mostly do, but if sync_sub() accesses thread-locals then they won't. Oh well. That's perhaps a bit unfortunate, but it doesn't mean we should give every synchronous frame its own thread-locals.

(And fwiw I'm still not convinced we should give up on 'yield from' as a mechanism for refactoring generators.)
To establish the full semantics of execution context in coroutines, we must also consider *tasks*. A task is the abstraction used by *asyncio*, and other similar libraries, to manage the concurrent execution of coroutines. In the example above, a task is created implicitly by the ``run_until_complete()`` function. ``asyncio.wait_for()`` is another example of implicit task creation::
    async def sub():
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')

        # waiting for sub() directly
        await sub()

        # waiting for sub() with a timeout
        await asyncio.wait_for(sub(), timeout=2)

        var.set('main changed')
Intuitively, we expect the assertion in ``sub()`` to hold true in both invocations, even though the ``wait_for()`` implementation actually spawns a task, which runs ``sub()`` concurrently with ``main()``.
I found this example confusing -- you talk about sub() and main() running concurrently, but ``wait_for`` blocks main() until sub() has finished running, right? Is this just supposed to show that there should be some sort of inheritance across tasks, and then the next example is to show that it has to be a copy rather than sharing the actual object? (This is just an issue of phrasing/readability.)
The ``sys.run_with_logical_context()`` function performs the following steps:
1. Push *lc* onto the current execution context stack.
2. Run ``func(*args, **kwargs)``.
3. Pop *lc* from the execution context stack.
4. Return or raise the ``func()`` result.
It occurs to me that both this and the way generators/coroutines expose their logical context means that logical context objects are semantically mutable. This could create weird effects if someone attaches the same LC to two different generators, or tries to use it simultaneously in two different threads, etc. We should have a little interlock like a generator's ag_running, where an LC keeps track of whether it's currently in use and if you try to push the same LC onto two ECs simultaneously then it errors out.
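The interlock being suggested could look roughly like this. The `LogicalContext` class and its `_in_use` flag are hypothetical, invented purely to illustrate the idea, not part of the PEP's specified API:

```python
# Hypothetical sketch of the suggested interlock; an LC refuses to be
# active on two execution contexts at once, analogous to a
# generator's ag_running flag.

class LogicalContext:
    def __init__(self):
        self._vars = {}
        self._in_use = False

    def push(self):
        if self._in_use:
            raise RuntimeError('logical context is already in use')
        self._in_use = True

    def pop(self):
        self._in_use = False

lc = LogicalContext()
lc.push()
try:
    lc.push()                   # a second, simultaneous push
except RuntimeError as exc:
    print(exc)                  # rejected
lc.pop()
lc.push()                       # fine again after the pop
lc.pop()
```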
For efficient access in performance-sensitive code paths, such as in ``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``, making it an O(1) operation when the cache is hit. The cache key is composed from the following:
* The new ``uint64_t PyThreadState->unique_id``, which is a globally unique thread state identifier. It is computed from the new ``uint64_t PyInterpreterState->ts_counter``, which is incremented whenever a new thread state is created.
* The ``uint64_t ContextVar->version`` counter, which is incremented whenever the context variable value is changed in any logical context in any thread.
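A rough sketch of the cache-key check described above. All names and signatures here are invented for illustration (the real implementation lives in C inside the interpreter), and this simplified version shows only the (thread id, version) fast path:

```python
# Illustrative sketch: get() returns the cached value only if both
# the thread state id and the variable's version are unchanged.

class ContextVar:
    def __init__(self, name):
        self.name = name
        self.version = 0
        self._cached = None
        self._cached_tsid = None
        self._cached_version = None

    def get(self, ts_unique_id, ec_lookup):
        # fast path: same thread state, no set() since we cached
        if (self._cached_tsid == ts_unique_id
                and self._cached_version == self.version):
            return self._cached
        # slow path: walk the logical context stack
        value = ec_lookup(self)
        self._cached = value
        self._cached_tsid = ts_unique_id
        self._cached_version = self.version
        return value

    def set_in(self, lc, value):
        lc[self] = value
        self.version += 1       # invalidates every cached read

var = ContextVar('precision')
lc = {}
var.set_in(lc, 42)
assert var.get(1, lc.get) == 42     # slow path, fills the cache
assert var.get(1, lc.get) == 42     # fast path, cache hit
```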
I'm pretty sure you need to also invalidate on context push/pop. Consider:

    def gen():
        var.set("gen")
        var.lookup()   # cache now holds "gen"
        yield
        print(var.lookup())

    def main():
        var.set("main")
        g = gen()
        next(g)
        # This should print "main", but it's the same thread and the
        # last call to set() was the one inside gen(), so we get the
        # cached "gen" instead
        print(var.lookup())
        var.set("no really main")
        var.lookup()   # cache now holds "no really main"
        next(g)        # should print "gen" but instead prints "no really main"
The cache is then implemented as follows::
    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def get(self):
I think you missed a s/get/lookup/ here :-) -n -- Nathaniel J. Smith -- https://vorpus.org
Hi, I'm aware that the current implementation is not final, but I already adapted the coroutine changes for Cython to allow for some initial integration testing with real external (i.e. non-Python coroutine) targets. I haven't adapted the tests yet, so the changes are currently unused and mostly untested. https://github.com/scoder/cython/tree/pep550_exec_context I also left some comments in the github commits along the way. Stefan
Hi, thanks, on the whole this is *much* easier to understand. I'll add some comments on the decimal examples. The thing is, decimal is already quite tricky and people do read PEPs long after they have been accepted, so they should probably reflect best practices. On Fri, Aug 25, 2017 at 06:32:22PM -0400, Yury Selivanov wrote:
Unfortunately, TLS does not work well for programs which execute concurrently in a single thread. A Python generator is the simplest example of a concurrent program. Consider the following::
    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))
The expected value of ``items`` is::
"Many people (wrongly) expect the values of ``items`` to be::" ;)
    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]
Some languages that support coroutines or generators recommend passing the context manually as an argument to every function, see [1]_ for an example. This approach, however, has limited use for Python, where there is a large ecosystem that was built to work with a TLS-like context. Furthermore, libraries like ``decimal`` or ``numpy`` rely on context implicitly in overloaded operator implementations.
I'm not sure why this approach has limited use for decimal:

    from decimal import *

    def fractions(precision, x, y):
        ctx = Context(prec=precision)
        yield ctx.divide(Decimal(x), Decimal(y))
        yield ctx.divide(Decimal(x), Decimal(y**2))

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)
    print(list(zip(g1, g2)))

This is the first thing I'd do when writing async-safe code. Again, people do read PEPs. So if an asyncio programmer without any special knowledge of decimal reads the PEP, he probably assumes that localcontext() is currently the only option, while the safer and easy-to-reason-about context methods exist.
Now, let's revisit the decimal precision example from the `Rationale`_ section, and see how the execution context can improve the situation::
    import decimal

    decimal_prec = new_context_var()  # create a new context variable

    # Pre-PEP 550 Decimal relies on TLS for its context.
    # This subclass switches the decimal context storage
    # to the execution context for illustration purposes.
    #
    class MyDecimal(decimal.Decimal):
        def __init__(self, value="0"):
            prec = decimal_prec.lookup()
            if prec is None:
                raise ValueError('could not find decimal precision')
            context = decimal.Context(prec=prec)
            super().__init__(value, context=context)
As I understand it, the example creates a context with a custom precision and attempts to use that context to create a Decimal. This doesn't switch the actual decimal context. Secondly, the precision in the context argument to the Decimal() constructor has no effect --- the context there is only used for error handling. Lastly, if the constructor *did* use the precision, one would have to be careful about double rounding when using MyDecimal(). I get that this is supposed to be for illustration only, but please let's be careful about what people might take away from that code.
This generic caching approach is similar to what the current C implementation of ``decimal`` does to cache the current decimal context, and has similar performance characteristics.
I think it'll work, but can we agree on hard numbers like max 2% slowdown for the non-threaded case and 4% for applications that only use threads? I'm a bit cautious because other C-extension state-managing PEPs didn't come close to these figures. Stefan Krah
On 26.08.2017 04:19, Ethan Furman wrote:
On 08/25/2017 03:32 PM, Yury Selivanov wrote:
A *context variable* is an object representing a value in the execution context. A new context variable is created by calling the ``new_context_var()`` function. A context variable object has two methods:
* ``lookup()``: returns the value of the variable in the current execution context;
* ``set()``: sets the value of the variable in the current execution context.
Why "lookup" and not "get"? Many APIs use "get" and its functionality is well understood.
Why not the same interface as thread-local storage? This is the question that has bothered me from the beginning of PEP 550. I don't understand what inventing a new way of access buys us here. Python has featured regular attribute access for years. It's even simpler than method-based access. Best, Sven
On Saturday, August 26, 2017 2:34:29 AM EDT Nathaniel Smith wrote:
On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov
wrote:

Coroutines and Asynchronous Tasks
---------------------------------
In coroutines, like in generators, context variable changes are local and are not visible to the caller::

    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
I think this change is a bad idea. I think that generally, an async call like 'await async_sub()' should have the equivalent semantics to a synchronous call like 'sync_sub()', except for the part where the former is able to contain yields. Giving every coroutine an LC breaks that equivalence. It also makes it so in async code, you can't necessarily refactor by moving code in and out of subroutines. Like, if we inline 'sub' into 'main', that shouldn't change the semantics, but...
If we could do it easily, we'd have given each _normal function_ its own logical context as well. What we are talking about here is variable scope leaking up the call stack. I think this is a bad pattern. For decimal context-like uses of the EC you should always use a context manager. For uses like Web request locals, you always have a top function that sets the context vars.
I think I see the motivation: you want to make
await sub()
and
await ensure_future(sub())
have the same semantics, right? And the latter has to create a Task
What we want is for `await sub()` to be equivalent to `await asyncio.wait_for(sub())` and to `await asyncio.gather(sub())`.

Imagine we allow context var changes to leak out of `async def`. It's easy to write code that relies on this:

    async def init():
        var.set('foo')

    async def main():
        await init()
        assert var.lookup() == 'foo'

If we change `await init()` to `await asyncio.wait_for(init())`, the code will break (and in the real world, possibly very subtly).
It also adds non-trivial overhead, because now lookup() is O(depth of async callstack), instead of O(depth of (async) generator nesting), which is generally much smaller.
You would hit cache in lookup() most of the time. Elvis
On Sat, Aug 26, 2017 at 7:45 AM, Stefan Krah
Hi,
thanks, on the whole this is *much* easier to understand.
Thanks!
I'll add some comments on the decimal examples. The thing is, decimal is already quite tricky and people do read PEPs long after they have been accepted, so they should probably reflect best practices.
Agree. [..]
Some languages that support coroutines or generators recommend passing the context manually as an argument to every function, see [1]_ for an example. This approach, however, has limited use for Python, where there is a large ecosystem that was built to work with a TLS-like context. Furthermore, libraries like ``decimal`` or ``numpy`` rely on context implicitly in overloaded operator implementations.
I'm not sure why this approach has limited use for decimal:
    from decimal import *

    def fractions(precision, x, y):
        ctx = Context(prec=precision)
        yield ctx.divide(Decimal(x), Decimal(y))
        yield ctx.divide(Decimal(x), Decimal(y**2))

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)
    print(list(zip(g1, g2)))
Because you have to know the limitations of implicit decimal context to make this choice. Most people don't (at least from my experience).
This is the first thing I'd do when writing async-safe code.
Because you know the decimal module very well :)
Again, people do read PEPs. So if an asyncio programmer without any special knowledge of decimal reads the PEP, he probably assumes that localcontext() is currently the only option, while the safer and easy-to-reason-about context methods exist.
I agree.
Now, let's revisit the decimal precision example from the `Rationale`_ section, and see how the execution context can improve the situation::
    import decimal

    decimal_prec = new_context_var()  # create a new context variable

    # Pre-PEP 550 Decimal relies on TLS for its context.
    # This subclass switches the decimal context storage
    # to the execution context for illustration purposes.
    #
    class MyDecimal(decimal.Decimal):
        def __init__(self, value="0"):
            prec = decimal_prec.lookup()
            if prec is None:
                raise ValueError('could not find decimal precision')
            context = decimal.Context(prec=prec)
            super().__init__(value, context=context)
As I understand it, the example creates a context with a custom precision and attempts to use that context to create a Decimal.
This doesn't switch the actual decimal context. Secondly, the precision in the context argument to the Decimal() constructor has no effect --- the context there is only used for error handling.
Lastly, if the constructor *did* use the precision, one would have to be careful about double rounding when using MyDecimal().
I get that this is supposed to be for illustration only, but please let's be careful about what people might take away from that code.
In the next iteration of the PEP we'll remove decimal examples and replace them with something with simpler semantics. This is clearly the best choice now.
This generic caching approach is similar to what the current C implementation of ``decimal`` does to cache the current decimal context, and has similar performance characteristics.
I think it'll work, but can we agree on hard numbers like max 2% slowdown for the non-threaded case and 4% for applications that only use threads?
I'd be *very* surprised if we see any noticeable slowdown at all. The way ContextVars will implement caching is very similar to the trick you use now. Yury
On Fri, Aug 25, 2017 at 10:19 PM, Ethan Furman
All in all, I like it. Nice job.
Thanks!
On 08/25/2017 03:32 PM, Yury Selivanov wrote:
A *context variable* is an object representing a value in the execution context. A new context variable is created by calling the ``new_context_var()`` function. A context variable object has two methods:
* ``lookup()``: returns the value of the variable in the current execution context;
* ``set()``: sets the value of the variable in the current execution context.
Why "lookup" and not "get"? Many APIs use "get" and its functionality is well understood.
ContextVar.set(value) method writes the `value` to the *topmost LC*. ContextVar.lookup() method *traverses the stack* until it finds the LC that has a value. "get()" does not reflect this subtle semantic difference. Yury
On Sat, Aug 26, 2017 at 6:22 AM, Stefan Behnel
Hi,
I'm aware that the current implementation is not final, but I already adapted the coroutine changes for Cython to allow for some initial integration testing with real external (i.e. non-Python coroutine) targets. I haven't adapted the tests yet, so the changes are currently unused and mostly untested.
https://github.com/scoder/cython/tree/pep550_exec_context
I also left some comments in the github commits along the way.
Huge thanks for thinking about how this proposal will work for Cython and trying it out. Although I must warn you that the last reference implementation is very outdated, and the implementation we will end up with will be very different (think a total rewrite from scratch). Yury
I agree with David; this PEP has really gotten to a great place and the new organization makes it much easier to understand.
On Aug 25, 2017, at 22:19, Ethan Furman
wrote: Why "lookup" and not "get"? Many APIs use "get" and its functionality is well understood.
I have the same question as Sven as to why we can’t have attribute access semantics. I probably asked that before, and you probably answered, so maybe if there’s a specific reason why this can’t be supported, the PEP should include a “rejected ideas” section explaining the choice. That said, if we have to use method lookup, then I agree that `.get()` is a better choice than `.lookup()`. But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value? I worry that None might be a valid value in some cases, but that currently can’t be distinguished from “missing”. I’d also like a debugging interface, such that I can ask “context_var.get()” and get some easy diagnostics about the resolution order. Cheers, -Barry
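The default-argument idea Barry raises could be sketched like this. The `ContextVar` here is a dict-backed toy stand-in (not the PEP's API) and the `_MISSING` sentinel is invented, purely to show how a caller-supplied marker distinguishes a stored None from "no value at all":

```python
# Toy sketch of a get(default=...) signature with a sentinel.

_MISSING = object()             # private "no value" marker

class ContextVar:
    def __init__(self, name):
        self.name = name
        self._values = {}       # stand-in for the real EC lookup

    def get(self, default=None):
        value = self._values.get(self.name, _MISSING)
        if value is _MISSING:
            return default
        return value

var = ContextVar('timeout')
missing = object()              # caller's own sentinel
assert var.get(default=missing) is missing   # truly unset
var._values['timeout'] = None
assert var.get(default=missing) is None      # None is a real value
```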
On Sat, Aug 26, 2017 at 9:33 AM, Sven R. Kunze
Why not the same interface as thread-local storage? This has been the question which bothered me from the beginning of PEP550. I don't understand what inventing a new way of access buys us here.
This was covered at length in these threads: https://mail.python.org/pipermail/python-ideas/2017-August/046888.html https://mail.python.org/pipermail/python-ideas/2017-August/046889.html I forgot to add a subsection to "Design Consideration" with a summary of that thread. Will be fixed in the next revision. Yury
On Sat, Aug 26, 2017 at 12:56 AM, David Mertz
This is now looking really good and I can understand it.
Great!
One question though. Sometimes creation of a context variable is done with a name argument, other times not. E.g.
    var1 = new_context_var('var1')
    var = new_context_var()
We were very focused on making the High-level Specification as succinct as possible, omitting some API details that are not important for understanding the semantics. "name" argument is not optional and will be required. If it's optional, people will not provide it, making it very hard to introspect the context when we want it. I guess we'll just update the High-level Specification section to use the correct signature of "new_context_var". Yury
Would it be possible/desirable to make the default a unique string value like a UUID or a stringified counter?
On Aug 26, 2017 9:35 AM, "Yury Selivanov"
This is now looking really good and I can understand it.
Great!
One question though. Sometimes creation of a context variable is done with a name argument, other times not. E.g.

    var1 = new_context_var('var1')
    var = new_context_var()
We were very focused on making the High-level Specification as succinct as possible, omitting some API details that are not important for understanding the semantics. "name" argument is not optional and will be required. If it's optional, people will not provide it, making it very hard to introspect the context when we want it. I guess we'll just update the High-level Specification section to use the correct signature of "new_context_var". Yury
On Sat, Aug 26, 2017 at 2:34 AM, Nathaniel Smith
On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov
wrote:

Coroutines and Asynchronous Tasks
---------------------------------
In coroutines, like in generators, context variable changes are local and are not visible to the caller::
    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
I think this change is a bad idea. I think that generally, an async call like 'await async_sub()' should have the equivalent semantics to a synchronous call like 'sync_sub()', except for the part where the former is able to contain yields.
That exception is why the semantics cannot be equivalent.
Giving every coroutine an LC breaks that equivalence. It also makes it so in async code, you can't necessarily refactor by moving code in and out of subroutines.
I'll cover the refactoring argument later in this email. [..]
It also adds non-trivial overhead, because now lookup() is O(depth of async callstack), instead of O(depth of (async) generator nesting), which is generally much smaller.
I don't think it's non-trivial though:

First, we have a cache in ContextVar which makes lookup O(1) for any tight code that uses libraries like decimal and numpy.

Second, most of the LCs in the chain will be empty, so even the uncached lookup will still be fast.

Third, you will usually have your "with my_context()" block right around your code (or within a few awaits distance), otherwise it will be hard to reason what's the context. And if, occasionally, you have a single "var.lookup()" call that won't be cached, the cost of it will still be measured in microseconds.

Finally, the easy-to-follow semantics is the main argument for the change (even at the cost of making "get()" a bit slower in corner cases).
I think I see the motivation: you want to make
await sub()
and
await ensure_future(sub())
have the same semantics, right?
Yes.
And the latter has to create a Task and split it off into a new execution context, so you want the former to do so as well? But to me this is like saying that we want
sync_sub()
and
thread_pool_executor.submit(sync_sub).result()
This example is very similar to:

    await sub()

and

    await create_task(sub())

So it's really about making the semantics of coroutines predictable.
(And fwiw I'm still not convinced we should give up on 'yield from' as a mechanism for refactoring generators.)
I don't get this "refactoring generators" and "refactoring coroutines" argument. Suppose you have this code:

    def gen():
        i = 0
        for _ in range(3):
            i += 1
            yield i
        for _ in range(5):
            i += 1
            yield i

You can't refactor gen() by simply copying/pasting parts of its body into a separate generator:

    def count3():
        for _ in range(3):
            i += 1
            yield

    def gen():
        i = 0
        yield from count3()
        for _ in range(5):
            i += 1
            yield i

The above won't work for obvious reasons: 'i' is a nonlocal variable for the 'count3' block of code.

Almost exactly the same thing will happen with the current PEP 550 specification, which is a *good* thing. 'yield from' and 'await' are not about refactoring. They can be used for splitting large generators/coroutines into a set of smaller ones, sure. But there's *no* magical, always-working refactoring mechanism that allows one to do that blindly.
To establish the full semantics of execution context in coroutines, we must also consider *tasks*. A task is the abstraction used by *asyncio*, and other similar libraries, to manage the concurrent execution of coroutines. In the example above, a task is created implicitly by the ``run_until_complete()`` function. ``asyncio.wait_for()`` is another example of implicit task creation::
    async def sub():
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')

        # waiting for sub() directly
        await sub()

        # waiting for sub() with a timeout
        await asyncio.wait_for(sub(), timeout=2)

        var.set('main changed')
Intuitively, we expect the assertion in ``sub()`` to hold true in both invocations, even though the ``wait_for()`` implementation actually spawns a task, which runs ``sub()`` concurrently with ``main()``.
I found this example confusing -- you talk about sub() and main() running concurrently, but ``wait_for`` blocks main() until sub() has finished running, right?
Right. Before we continue, let me make sure we are on the same page here:

    await asyncio.wait_for(sub(), timeout=2)

can be refactored into:

    task = asyncio.wait_for(sub(), timeout=2)
    # sub() is scheduled now, and a "loop.call_soon" call has been
    # made to advance it soon.
    await task

Now, if we look at the following example (1):

    async def foo():
        await bar()

The "bar()" coroutine will execute within "foo()". If we add timeout logic (2):

    async def foo():
        await wait_for(bar(), 1)

The "bar()" coroutine will execute outside of "foo()", and "foo()" will only wait for the result of that execution.

Now, Async Tasks capture the context when they are created -- that's the only sane option they have. If coroutines don't have their own LC, "bar()" in examples (1) and (2) would interact with the execution context differently! And this is something that we can't let happen, as it would force asyncio users to think about the EC every time they want to wrap a coroutine into a task.

[..]
The ``sys.run_with_logical_context()`` function performs the following steps:
1. Push *lc* onto the current execution context stack.
2. Run ``func(*args, **kwargs)``.
3. Pop *lc* from the execution context stack.
4. Return or raise the ``func()`` result.
It occurs to me that both this and the way generators/coroutines expose their logical context means that logical context objects are semantically mutable. This could create weird effects if someone attaches the same LC to two different generators, or tries to use it simultaneously in two different threads, etc. We should have a little interlock like a generator's ag_running, where an LC keeps track of whether it's currently in use and if you try to push the same LC onto two ECs simultaneously then it errors out.
Correct. Both LC and EC objects will be wrapped into "shell" objects before being exposed to the end user. run_with_logical_context() will mutate the user-visible LC object (keeping the underlying LC immutable, of course).

Ideally, we would want run_with_logical_context() to have the following signature:

    result, updated_lc = run_with_logical_context(lc, callable)

But because "callable" can raise an exception, this would not work.
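The shell idea Yury describes might look roughly like this. `LCShell` and the `'ran'` key are invented for illustration (the real design would of course differ); the point is that the user-visible object is mutable while each underlying mapping is immutable and replaced wholesale, so updates survive even when the callable raises:

```python
# Invented sketch of the "shell" wrapper around an immutable LC.
from types import MappingProxyType

class LCShell:
    def __init__(self, mapping=None):
        self._lc = MappingProxyType(dict(mapping or {}))

    def _replace(self, new_mapping):
        # "mutation" swaps in a fresh immutable mapping
        self._lc = MappingProxyType(dict(new_mapping))

def run_with_logical_context(shell, func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    finally:
        # the updated LC is captured even when func() raises, which
        # is why a (result, updated_lc) return value would not work
        shell._replace({**shell._lc, 'ran': True})

shell = LCShell({'x': 1})
run_with_logical_context(shell, lambda: None)
assert shell._lc['x'] == 1
assert shell._lc['ran'] is True
```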
For efficient access in performance-sensitive code paths, such as in ``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``, making it an O(1) operation when the cache is hit. The cache key is composed from the following:
* The new ``uint64_t PyThreadState->unique_id``, which is a globally unique thread state identifier. It is computed from the new ``uint64_t PyInterpreterState->ts_counter``, which is incremented whenever a new thread state is created.
* The ``uint64_t ContextVar->version`` counter, which is incremented whenever the context variable value is changed in any logical context in any thread.
I'm pretty sure you need to also invalidate on context push/pop. Consider:
    def gen():
        var.set("gen")
        var.lookup()   # cache now holds "gen"
        yield
        print(var.lookup())

    def main():
        var.set("main")
        g = gen()
        next(g)
        # This should print "main", but it's the same thread and the
        # last call to set() was the one inside gen(), so we get the
        # cached "gen" instead
        print(var.lookup())
        var.set("no really main")
        var.lookup()   # cache now holds "no really main"
        next(g)        # should print "gen" but instead prints "no really main"
Yeah, you're right. Thanks!
The cache is then implemented as follows::
    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def get(self):
I think you missed a s/get/lookup/ here :-)
Fixed! Yury
On Sat, Aug 26, 2017 at 1:10 PM, David Mertz
Would it be possible/desirable to make the default a unique string value like a UUID or a stringified counter?
Sure, or we could just use the id of ContextVar. In the end, when we want to introspect the EC while debugging, we would see something like this:

    {
        ContextVar(name='518CDD4F-D676-408F-B968-E144F792D055'): 42,
        ContextVar(name='decimal_context'): DecimalContext(precision=2),
        ContextVar(name='7A44D3BE-F7A1-40B7-BE51-7DFFA7E0E02F'): 'spam'
    }

That's why I think it's easier to force users to always specify the name:

    my_var = sys.new_context_var('my_var')

This is similar to namedtuples, and nobody really complains about them. Yury
On Sat, Aug 26, 2017 at 1:23 PM, Ethan Furman
On 08/26/2017 09:25 AM, Yury Selivanov wrote:
On Fri, Aug 25, 2017 at 10:19 PM, Ethan Furman wrote:
A *context variable* is an object representing a value in the execution context. A new context variable is created by calling the ``new_context_var()`` function. A context variable object has two methods:
* ``lookup()``: returns the value of the variable in the current execution context;
* ``set()``: sets the value of the variable in the current execution context.
Why "lookup" and not "get"? Many APIs use "get" and its functionality is well understood.
ContextVar.set(value) method writes the `value` to the *topmost LC*.
ContextVar.lookup() method *traverses the stack* until it finds the LC that has a value. "get()" does not reflect this subtle semantics difference.
A good point; however, ChainMap, which behaves similarly as far as lookup goes, uses "get" and does not have a "lookup" method. I think we lose more than we gain by changing that method name.
ChainMap is constrained to be a Mapping-like object, but I get your point. Let's see what others say about the "lookup()". It is kind of an experiment to try a name and see if it fits. Yury
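For what it's worth, the set/lookup asymmetry Yury describes maps directly onto ChainMap behavior: writes go only to the first mapping, while reads traverse the whole chain. A toy illustration of the semantics (not the proposed implementation, which uses immutable mappings):

```python
from collections import ChainMap

# An EC modeled as a chain of LCs: maps[0] plays the role of the topmost LC.
ec = ChainMap({}, {'var': 'outer'})

assert ec['var'] == 'outer'          # reads traverse the chain to an outer LC
ec['var'] = 'inner'                  # writes land in the topmost mapping only
assert ec['var'] == 'inner'          # now shadowed by the topmost LC
assert ec.maps[1]['var'] == 'outer'  # the outer LC is untouched
```

So the "subtle semantics difference" is exactly the one `ChainMap.__getitem__` versus `ChainMap.__setitem__` already exhibits, which is an argument both for and against reusing the name "get".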
Thanks for the update. Comments in-line below.
-eric
On Fri, Aug 25, 2017 at 4:32 PM, Yury Selivanov
[snip]
This PEP adds a new generic mechanism of ensuring consistent access to non-local state in the context of out-of-order execution, such as in Python generators and coroutines.
Thread-local storage, such as ``threading.local()``, is inadequate for programs that execute concurrently in the same OS thread. This PEP proposes a solution to this problem.
[snip]
In regular, single-threaded code that doesn't involve generators or coroutines, context variables behave like globals::
[snip]
In multithreaded code, context variables behave like thread locals::
[snip]
In generators, changes to context variables are local and are not visible to the caller, but are visible to the code called by the generator. Once set in the generator, the context variable is guaranteed not to change between iterations::
With threads we have a directed graph of execution, rooted at the root thread, branching with each new thread and merging with each .join(). Each thread gets its own copy of each threading.local, regardless of the relationship between branches (threads) in the execution graph.

With async (and generators) we also have a directed graph of execution, rooted in the calling thread, branching with each new async call. Currently there is no equivalent to threading.local for the async execution graph. This proposal involves adding such an equivalent.

However, the proposed solution isn't quite equivalent, right? It adds a concept of lookup on the chain of namespaces, traversing up the execution graph back to the root. threading.local does not do this. Furthermore, you can have more than one threading.local per thread.
From what I read in the PEP, each node in the execution graph has (at most) one Execution Context.
The PEP doesn't really say much about these differences from threadlocals, including a rationale. FWIW, I think such a COW mechanism could be useful. However, it does add complexity to the feature. So a clear explanation in the PEP of why it's worth it would be valuable.
[snip]
The following new Python APIs are introduced by this PEP:
1. The ``sys.new_context_var(name: str='...')`` function to create ``ContextVar`` objects.
2. The ``ContextVar`` object, which has:
* the read-only ``.name`` attribute,
* the ``.lookup()`` method, which returns the value of the variable in the current execution context;
* the ``.set()`` method, which sets the value of the variable in the current execution context.
3. The ``sys.get_execution_context()`` function, which returns a copy of the current execution context.
4. The ``sys.new_execution_context()`` function, which returns a new empty execution context.
5. The ``sys.new_logical_context()`` function, which returns a new empty logical context.
6. The ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs)`` function, which runs *func* with the provided execution context.
7. The ``sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs)`` function, which runs *func* with the provided logical context on top of the current execution context.
#1-4 are consistent with a single EC per Python thread. However, #5-7 imply that more than one EC per thread is supported, but only one is active in the current execution stack (notably the EC is rooted at the calling frame). threading.local provides a much simpler mechanism but does not support the chained context (COW) semantics...
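To make the #5-7 semantics concrete, here is a pure-Python toy model of the stack-of-LCs behavior (plain mutable dicts stand in for the immutable mappings the PEP proposes, and `run_with_logical_context` here takes the function directly; this mirrors the PEP's names but is not its implementation):

```python
_ec = [{}]  # the current EC: a stack of logical contexts (toy model)

class ContextVar:
    def __init__(self, name):
        self.name = name

    def set(self, value):
        _ec[-1][self] = value          # writes hit the topmost LC only

    def lookup(self):
        for lc in reversed(_ec):       # reads traverse the stack
            if self in lc:
                return lc[self]
        return None

def run_with_logical_context(func, *args, **kwargs):
    _ec.append({})                     # push a fresh, empty LC
    try:
        return func(*args, **kwargs)
    finally:
        _ec.pop()                      # changes made inside are discarded

var = ContextVar('var')
var.set('main')

def sub():
    assert var.lookup() == 'main'      # outer value visible through the stack
    var.set('sub')                     # lands in the pushed LC only

run_with_logical_context(sub)
assert var.lookup() == 'main'          # sub()'s change did not leak out
```

The COW aspect Eric mentions is not modeled here; in the real proposal the pushed LC is copied lazily rather than mutated in place.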
On Sat, Aug 26, 2017 at 11:19 AM, Yury Selivanov
This is similar to namedtuples, and nobody really complains about them.
FWIW, there are plenty of complaints on python-ideas about this (and never a satisfactory solution). :) That said, I don't think it is as big a deal here since the target audience is much smaller. -eric
Hi Eric,
On Sat, Aug 26, 2017 at 1:25 PM, Eric Snow
Thanks for the update. Comments in-line below.
-eric
On Fri, Aug 25, 2017 at 4:32 PM, Yury Selivanov
wrote: [snip]
This PEP adds a new generic mechanism of ensuring consistent access to non-local state in the context of out-of-order execution, such as in Python generators and coroutines.
Thread-local storage, such as ``threading.local()``, is inadequate for programs that execute concurrently in the same OS thread. This PEP proposes a solution to this problem.
[snip]
In regular, single-threaded code that doesn't involve generators or coroutines, context variables behave like globals::
[snip]
In multithreaded code, context variables behave like thread locals::
[snip]
In generators, changes to context variables are local and are not visible to the caller, but are visible to the code called by the generator. Once set in the generator, the context variable is guaranteed not to change between iterations::
With threads we have a directed graph of execution, rooted at the root thread, branching with each new thread and merging with each .join(). Each thread gets its own copy of each threading.local, regardless of the relationship between branches (threads) in the execution graph.
With async (and generators) we also have a directed graph of execution, rooted in the calling thread, branching with each new async call. Currently there is no equivalent to threading.local for the async execution graph. This proposal involves adding such an equivalent.
Correct.
However, the proposed solution isn't quite equivalent, right? It adds a concept of lookup on the chain of namespaces, traversing up the execution graph back to the root. threading.local does not do this. Furthermore, you can have more than one threading.local per thread. From what I read in the PEP, each node in the execution graph has (at most) one Execution Context.
The PEP doesn't really say much about these differences from threadlocals, including a rationale. FWIW, I think such a COW mechanism could be useful. However, it does add complexity to the feature. So a clear explanation in the PEP of why it's worth it would be valuable.
Currently, the PEP covers the proposed mechanism in-depth, explaining why every detail of the spec is the way it is. But I think it'd be valuable to highlight differences from theading.local() in a separate section. We'll think about adding one. Yury
On Sat, Aug 26, 2017 at 12:30 PM, Barry Warsaw
I agree with David; this PEP has really gotten to a great place and the new organization makes it much easier to understand.
On Aug 25, 2017, at 22:19, Ethan Furman
wrote: Why "lookup" and not "get" ? Many APIs use "get" and its functionality is well understood.
I have the same question as Sven as to why we can’t have attribute access semantics. I probably asked that before, and you probably answered, so maybe if there’s a specific reason why this can’t be supported, the PEP should include a “rejected ideas” section explaining the choice.
Elvis just added it: https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-int...
That said, if we have to use method lookup, then I agree that `.get()` is a better choice than `.lookup()`. But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value? I worry that None might be a valid value in some cases, but that currently can’t be distinguished from “missing”.
Nathaniel has a use case where he needs to know if the value is in the topmost LC or not. One way to address that need is to have the following signature for lookup(): lookup(*, default=None, traverse=True) IMO "lookup" is a slightly better name in this particular context. Yury
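A sketch of how that signature could behave against a toy stack of LCs; the `traverse=False` form answers Nathaniel's "is it set in the topmost LC?" question. All names here are illustrative, not the PEP's API:

```python
# Toy EC: the outer LC has a value, the topmost LC is empty.
_ec = [{'precision': 28}, {}]

def lookup(name, *, default=None, traverse=True):
    # traverse=True: walk the whole stack; traverse=False: topmost LC only
    stack = _ec if traverse else _ec[-1:]
    for lc in reversed(stack):
        if name in lc:
            return lc[name]
    return default

assert lookup('precision') == 28                    # found in an outer LC
assert lookup('precision', traverse=False) is None  # not in the topmost LC
assert lookup('missing', default='fallback') == 'fallback'
```

Note that with `default=None` a stored None is still indistinguishable from "missing", which is Barry's concern; a module-private sentinel default would resolve that.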
On Aug 26, 2017, at 14:15, Yury Selivanov
Elvis just added it: https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-int...
Thanks, that’s exactly what I was looking for. Great summary of the issue.
That said, if we have to use method lookup, then I agree that `.get()` is a better choice than `.lookup()`. But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value? I worry that None might be a valid value in some cases, but that currently can’t be distinguished from “missing”.
Nathaniel has a use case where he needs to know if the value is in the topmost LC or not.
One way to address that need is to have the following signature for lookup():
lookup(*, default=None, traverse=True)
IMO "lookup" is a slightly better name in this particular context.
Given that signature (which +1), I agree. You could add keywords for debugging lookup fairly easily too. Cheers, -Barry
On 26.08.2017 19:23, Yury Selivanov wrote:
ChainMap is constrained to be a Mapping-like object, but I get your point. Let's see what others say about the "lookup()". It is kind of an experiment to try a name and see if it fits.
I like "get" more. ;-) Best, Sven PS: This might be a result of still leaning towards attribute access despite the discussion you referenced. I still don't think complicating and reinventing terminology (which basically results in API names) buys us much. And I am still with Ethan, a context stack is just a ChainMap. Renaming basic methods won't hide that fact. That's my only criticism of the PEP. The rest is fine and useful.
I'm convinced by the new section explaining why a single value is better
than a namespace. Nonetheless, it would feel more "Pythonic" to me to
create a property `ContextVariable.val` whose getter and setter was
`.lookup()` and `.set()` (or maybe `._lookup()` and `._set()`).
Lookup might require a more complex call signature in rare cases, but the
large majority of the time it would simply be `var.val`, and that should be
the preferred API IMO. That provides a nice parallel between `var.name`
and `var.val` also.
On Sat, Aug 26, 2017 at 11:22 AM, Barry Warsaw
On Aug 26, 2017, at 14:15, Yury Selivanov
wrote: Elvis just added it: https://www.python.org/dev/peps/pep-0550/#replication-of-
threading-local-interface
Thanks, that’s exactly what I was looking for. Great summary of the issue.
That said, if we have to use method lookup, then I agree that `.get()`
is a better choice than `.lookup()`. But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value? I worry that None might be a valid value in some cases, but that currently can’t be distinguished from “missing”.
Nathaniel has a use case where he needs to know if the value is in the topmost LC or not.
One way to address that need is to have the following signature for
lookup():
lookup(*, default=None, traverse=True)
IMO "lookup" is a slightly better name in this particular context.
Given that signature (which +1), I agree. You could add keywords for debugging lookup fairly easily too.
Cheers, -Barry
Hi,
For the purpose of this section, we define *execution context* as an opaque container of non-local state that allows consistent access to its contents in the concurrent execution environment.
Maybe nonsense/non-practical :-), but: how does this concept extrapolate to the whole interpreter (container of ...non-local state --> interpreter state; concurrent execution environment --> Python interpreter, ...)? Thanks in advance! -- francis
On Sat, 26 Aug 2017 13:23:00 -0400
Yury Selivanov
A good point; however, ChainMap, which behaves similarly as far as lookup goes, uses "get" and does not have a "lookup" method. I think we lose more than we gain by changing that method name.
ChainMap is constrained to be a Mapping-like object, but I get your point. Let's see what others say about the "lookup()". It is kind of an experiment to try a name and see if it fits.
I like "get()" better than "lookup()". Regards Antoine.
Hi,
Multithreaded Code
------------------
In multithreaded code, context variables behave like thread locals::
var = new_context_var()
def sub():
    assert var.lookup() is None  # The execution context is empty
                                 # for each new thread.
    var.set('sub')

def main():
    var.set('main')

    thread = threading.Thread(target=sub)
    thread.start()
    thread.join()

    assert var.lookup() == 'main'
Is it by design that the execution context for new threads is empty, or should it be possible to set it to some initial value? Like e.g.:

var = new_context_var('init')

def sub():
    assert var.lookup() == 'init'
    var.set('sub')

def main():
    var.set('main')

    thread = threading.Thread(target=sub)
    thread.start()
    thread.join()

    assert var.lookup() == 'main'

Thanks, --francis
On Sat, Aug 26, 2017 at 10:25 AM, Eric Snow
With threads we have a directed graph of execution, rooted at the root thread, branching with each new thread and merging with each .join(). Each thread gets its own copy of each threading.local, regardless of the relationship between branches (threads) in the execution graph.
With async (and generators) we also have a directed graph of execution, rooted in the calling thread, branching with each new async call. Currently there is no equivalent to threading.local for the async execution graph. This proposal involves adding such an equivalent.
However, the proposed solution isn''t quite equivalent, right? It adds a concept of lookup on the chain of namespaces, traversing up the execution graph back to the root. threading.local does not do this. Furthermore, you can have more than one threading.local per thread. From what I read in the PEP, each node in the execution graph has (at most) one Execution Context.
The PEP doesn't really say much about these differences from threadlocals, including a rationale. FWIW, I think such a COW mechanism could be useful. However, it does add complexity to the feature. So a clear explanation in the PEP of why it's worth it would be valuable.
You might be interested in these notes I wrote to motivate why we need a chain of namespaces, and why simple "async task locals" aren't sufficient: https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb They might be a bit verbose to include directly in the PEP, but Yury/Elvis, feel free to steal whatever if you think it'd be useful. -n -- Nathaniel J. Smith -- https://vorpus.org
On 08/26/2017 12:12 PM, David Mertz wrote:
I'm convinced by the new section explaining why a single value is better than a namespace. Nonetheless, it would feel more "Pythonic" to me to create a property `ContextVariable.val` whose getter and setter was `.lookup()` and `.set()` (or maybe `._lookup()` and `._set()`).
Lookup might require a more complex call signature in rare cases, but the large majority of the time it would simply be `var.val`, and that should be the preferred API IMO. That provides a nice parallel between `var.name http://var.name` and `var.val` also.
+1 to the property solution. -- ~Ethan~
On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus
On Saturday, August 26, 2017 2:34:29 AM EDT Nathaniel Smith wrote:
On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov
wrote: Coroutines and Asynchronous Tasks ---------------------------------
In coroutines, like in generators, context variable changes are local and are not visible to the caller::

import asyncio
var = new_context_var()
async def sub():
    assert var.lookup() == 'main'
    var.set('sub')
    assert var.lookup() == 'sub'

async def main():
    var.set('main')
    await sub()
    assert var.lookup() == 'main'

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I think this change is a bad idea. I think that generally, an async call like 'await async_sub()' should have the equivalent semantics to a synchronous call like 'sync_sub()', except for the part where the former is able to contain yields. Giving every coroutine an LC breaks that equivalence. It also makes it so in async code, you can't necessarily refactor by moving code in and out of subroutines. Like, if we inline 'sub' into 'main', that shouldn't change the semantics, but...
If we could easily, we'd given each _normal function_ its own logical context as well.
I mean... you could do that. It'd be easy to do technically, right? But it would make the PEP useless, because then projects like decimal and numpy couldn't adopt it without breaking backcompat, meaning they couldn't adopt it at all. The backcompat argument isn't there in the same way for async code, because it's new and these functions have generally been broken there anyway. But it's still kinda there in spirit: there's a huge amount of collective knowledge about how (synchronous) Python code works, and IMO async code should match that whenever possible.
What we are talking about here is variable scope leaking up the call stack. I think this is a bad pattern. For decimal context-like uses of the EC you should always use a context manager. For uses like Web request locals, you always have a top function that sets the context vars.
It's perfectly reasonable to have a script where you call decimal.setcontext or np.seterr somewhere at the top to set the defaults for the rest of the script. Yeah, maybe it'd be a bit cleaner to use a 'with' block wrapped around main(), and certainly in a complex app you want to stick to that, but Python isn't just used for complex apps :-). I foresee confused users trying to figure out why np.seterr suddenly stopped working when they ported their app to use async.

This also seems like it makes some cases much trickier. Like, say you have an async context manager that wants to manipulate a context local. If you write 'async def __aenter__', you just lost -- it'll be isolated. I think you have to write some awkward thing like:

    def __aenter__(self):
        coro = self._real_aenter()
        coro.__logical_context__ = None
        return coro

It would be really nice if libraries like urllib3/requests supported async as an option, but it's difficult because they can't drop support for synchronous operation and python 2, and we want to keep a single codebase. One option I've been exploring is to write them in "synchronous style" but with async/await keywords added, and then generating a py2-compatible version with a script that strips out async/await etc. (Like a really simple 3to2 that just works at the token level.) One transformation you'd want to apply is replacing __aenter__ -> __enter__, but this gets much more difficult if we have to worry about elaborate transformations like the above...

If I have an async generator, and I set its __logical_context__ to None, then do I also have to set this attribute on every coroutine returned from calling __anext__/asend/athrow/aclose?
I think I see the motivation: you want to make
await sub()
and
await ensure_future(sub())
have the same semantics, right? And the latter has to create a Task
What we want is for `await sub()` to be equivalent to `await asyncio.wait_for(sub())` and to `await asyncio.gather(sub())`.
I don't feel like there's any need to make gather() have exactly the same semantics as a regular call -- it's pretty clearly a task-spawning primitive that runs all of the given coroutines concurrently, so it makes sense that it would have task-spawning semantics rather than call semantics. wait_for is a more unfortunate case; there's really no reason for it to create a Task at all, except that asyncio made the decision to couple cancellation and Tasks, so if you want one then you're stuck with the other. Yury's made some comments about stealing Trio's cancellation system and adding it to asyncio -- I don't know how serious he was. If he did then it would let you use timeouts without creating a new Task, and this problem would go away. OTOH if you stick with pushing a new LC on every coroutine call, then that makes Trio's cancellation system way slower, because it has to walk the whole stack of LCs on every yield to register/unregister each cancel scope. PEP 550v4 makes that stack much deeper, plus breaks the optimization I was planning to use to let us mostly skip this entirely. (To be clear, this isn't the main reason I think these semantics are a bad idea -- the main reason is that I think async and sync code should have the same semantics. But it definitely doesn't help that it creates obstacles to improving asyncio/improving on asyncio.)
Imagine we allow context var changes to leak out of `async def`. It's easy to write code that relies on this:
async def init():
    var.set('foo')

async def main():
    await init()
    assert var.lookup() == 'foo'
If we change `await init()` to `await asyncio.wait_for(init())`, the code will break (and in real world, possibly very subtly).
But instead you're making it so that it will break if the user adds/removes async/await keywords:

def init():
    var.set('foo')

def main():
    init()
It also adds non-trivial overhead, because now lookup() is O(depth of async callstack), instead of O(depth of (async) generator nesting), which is generally much smaller.
You would hit cache in lookup() most of the time.
You've just reduced the cache hit rate too, because the cache gets invalidated on every push/pop. Presumably you'd optimize this to skip invalidating if the LC that gets pushed/popped is empty, so this isn't as catastrophic as it might initially look, but you still have to invalidate all the cached variables every time any variable gets touched and then you return from a function. Which might happen quite a bit if, for example, using timeouts involves touching the LC :-). -n -- Nathaniel J. Smith -- https://vorpus.org
On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote:
On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus
wrote: What we are talking about here is variable scope leaking up the call stack. I think this is a bad pattern. For decimal context-like uses of the EC you should always use a context manager. For uses like Web request locals, you always have a top function that sets the context vars.
It's perfectly reasonable to have a script where you call decimal.setcontext or np.seterr somewhere at the top to set the defaults for the rest of the script.
+100. The only thing that makes sense for decimal is to change localcontext() to be automatically async-safe while preserving the rest of the semantics. Stefan Krah
On Sat, Aug 26, 2017 at 12:21:44PM -0400, Yury Selivanov wrote:
On Sat, Aug 26, 2017 at 7:45 AM, Stefan Krah
wrote: This generic caching approach is similar to what the current C implementation of ``decimal`` does to cache the current decimal context, and has similar performance characteristics.
I think it'll work, but can we agree on hard numbers like max 2% slowdown for the non-threaded case and 4% for applications that only use threads?
I'd be *very* surprised if we see any noticeable slowdown at all. The way ContextVars will implement caching is very similar to the trick you use now.
I'd also be surprised, but what do we do if the PEP is accepted and for some yet unknown reason the implementation turns out to be 12-15% slower? The slowdown related to the module-state/heap-type PEPs wasn't immediately obvious either; it would be nice to have actual figures before the PEP is accepted. Stefan Krah
On Sun, Aug 27, 2017 at 6:08 AM, Stefan Krah
On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote:
On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus
wrote: What we are talking about here is variable scope leaking up the call stack. I think this is a bad pattern. For decimal context-like uses of the EC you should always use a context manager. For uses like Web request locals, you always have a top function that sets the context vars.
It's perfectly reasonable to have a script where you call decimal.setcontext or np.seterr somewhere at the top to set the defaults for the rest of the script.
+100. The only thing that makes sense for decimal is to change localcontext() to be automatically async-safe while preserving the rest of the semantics.
TBH Nathaniel's argument isn't entirely correct.

With the semantics defined in PEP 550 v4, you still can set decimal context on top of your file, in your async functions etc.

This will work:

    decimal.setcontext(ctx)

    def foo():
        # use decimal with context=ctx

and this:

    def foo():
        decimal.setcontext(ctx)
        # use decimal with context=ctx

and this:

    def bar():
        # use decimal with context=ctx

    def foo():
        decimal.setcontext(ctx)
        bar()

and this:

    def bar():
        decimal.setcontext(ctx)

    def foo():
        bar()
        # use decimal with context=ctx

and this:

    decimal.setcontext(ctx)

    async def foo():
        # use decimal with context=ctx

and this:

    async def bar():
        # use decimal with context=ctx

    async def foo():
        decimal.setcontext(ctx)
        await bar()

The only thing that will not work, is this (ex1):

    async def bar():
        decimal.setcontext(ctx)

    async def foo():
        await bar()
        # use decimal with context=ctx

The reason why this one example worked in PEP 550 v3 and doesn't work in v4 is that we want to avoid random code breakage if you wrap your coroutine in a task, like here (ex2):

    async def bar():
        decimal.setcontext(ctx)

    async def foo():
        await wait_for(bar(), 1)
        # use decimal with context=ctx

We want (ex1) and (ex2) to work the same way always. That's the only difference in semantics between v3 and v4, and it's the only sane one, because implicit task creation is an extremely subtle detail that most users aren't aware of. We can't have a semantics that lets you easily break your code by adding a timeout in one await.

Speaking of (ex1), there's an example that didn't work in any PEP 550 version:

    def bar():
        decimal.setcontext(ctx)
        yield

    async def foo():
        list(bar())
        # use decimal with context=ctx

In the above code, bar() generator sets some decimal context, and it will not leak outside of it. This semantics is one of PEP 550 goals. The last change just unifies this semantics for coroutines, generators, and asynchronous generators, which is a good thing. Yury
On Sun, Aug 27, 2017 at 11:19:20AM -0400, Yury Selivanov wrote:
On Sun, Aug 27, 2017 at 6:08 AM, Stefan Krah
wrote: On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote:
It's perfectly reasonable to have a script where you call decimal.setcontext or np.seterr somewhere at the top to set the defaults for the rest of the script.
+100. The only thing that makes sense for decimal is to change localcontext() to be automatically async-safe while preserving the rest of the semantics.
TBH Nathaniel's argument isn't entirely correct.
With the semantics defined in PEP 550 v4, you still can set decimal context on top of your file, in your async functions etc.
and this:
def bar():
    decimal.setcontext(ctx)

def foo():
    bar()
    # use decimal with context=ctx
Okay, so if I understand this correctly we actually will not have dynamic scoping for regular functions: bar() has returned, so the new context would not be found on the stack with proper dynamic scoping.
and this:
async def bar():
    # use decimal with context=ctx

async def foo():
    decimal.setcontext(ctx)
    await bar()
The only thing that will not work, is this (ex1):
async def bar():
    decimal.setcontext(ctx)

async def foo():
    await bar()
    # use decimal with context=ctx
Here we do have dynamic scoping.
Speaking of (ex1), there's an example that didn't work in any PEP 550 version:
def bar():
    decimal.setcontext(ctx)
    yield

async def foo():
    list(bar())
    # use decimal with context=ctx
What about this?

async def bar():
    setcontext(Context(prec=1))
    for i in range(10):
        await asyncio.sleep(1)
        yield i

async def foo():
    async for i in bar():
        # ctx.prec=1?
        print(Decimal(100) / 3)

I'm searching for some abstract model to reason about the scopes. Stefan Krah
On Mon, Aug 28, 2017 at 7:19 AM, Stefan Krah
On Sun, Aug 27, 2017 at 11:19:20AM -0400, Yury Selivanov wrote:
On Sun, Aug 27, 2017 at 6:08 AM, Stefan Krah
wrote: On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote:
It's perfectly reasonable to have a script where you call decimal.setcontext or np.seterr somewhere at the top to set the defaults for the rest of the script.
+100. The only thing that makes sense for decimal is to change localcontext() to be automatically async-safe while preserving the rest of the semantics.
TBH Nathaniel's argument isn't entirely correct.
With the semantics defined in PEP 550 v4, you still can set decimal context on top of your file, in your async functions etc.
and this:
def bar():
    decimal.setcontext(ctx)

def foo():
    bar()
    # use decimal with context=ctx
Okay, so if I understand this correctly we actually will not have dynamic scoping for regular functions: bar() has returned, so the new context would not be found on the stack with proper dynamic scoping.
Correct. Although I would avoid associating PEP 550 with dynamic scoping entirely, as we never intended to implement it. [..]
What about this?
async def bar():
    setcontext(Context(prec=1))
    for i in range(10):
        await asyncio.sleep(1)
        yield i

async def foo():
    async for i in bar():
        # ctx.prec=1?
        print(Decimal(100) / 3)
I'm searching for some abstract model to reason about the scopes.
Whatever is set in coroutines, generators, and async generators does not leak out. In the above example, "prec=1" will only be set inside "bar()", and "foo()" will not see that. (Same will happen for a regular function and a generator). Yury
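That "does not leak out" rule can be modeled with a toy driver that pushes a private LC around every resume of a generator. Plain dicts stand in for the PEP's immutable mappings, and `with_own_lc` is an illustrative helper, not a proposed API:

```python
_ec = [{}]  # execution context: a stack of logical contexts (toy model)

def set_var(name, value):
    _ec[-1][name] = value          # writes go to the topmost LC

def lookup(name, default=None):
    for lc in reversed(_ec):       # reads traverse the stack
        if name in lc:
            return lc[name]
    return default

def with_own_lc(gen):
    """Drive `gen`, pushing its private LC around every resume."""
    lc = {}
    while True:
        _ec.append(lc)
        try:
            value = next(gen)
        except StopIteration:
            return
        finally:
            _ec.pop()
        yield value

def bar():
    set_var('ctx', 'prec=1')       # visible only while bar() is running
    yield lookup('ctx')

results = list(with_own_lc(bar()))
assert results == ['prec=1']       # bar() saw its own value...
assert lookup('ctx') is None       # ...but it never leaked to the caller
```

Since the same LC is re-pushed on each resume, the value also persists across bar()'s yields, matching the "guaranteed not to change between iterations" wording in the PEP.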
On Sat, Aug 26, 2017 at 4:45 PM, francismb
it's by design that the execution context for new threads to be empty or should it be possible to set it to some initial value? Like e.g:
var = new_context_var('init')
def sub():
    assert var.lookup() == 'init'
    var.set('sub')

def main():
    var.set('main')

    thread = threading.Thread(target=sub)
    thread.start()
    thread.join()

    assert var.lookup() == 'main'
Yes, it's by design. With PEP 550 APIs it's easy to subclass threading.Thread or concurrent.futures.ThreadPool to make them capture the EC. Yury
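As a sketch of what such a subclass could look like, written here against the `contextvars` module that later shipped in Python 3.7 (PEP 567) as a stand-in for the sys APIs proposed in this thread:

```python
import contextvars
import threading

class ContextThread(threading.Thread):
    """A Thread that captures the creator's context and runs inside it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Snapshot the context at creation time, in the creating thread.
        self._ctx = contextvars.copy_context()

    def run(self):
        # Execute the target inside the captured context.
        self._ctx.run(super().run)

var = contextvars.ContextVar('var', default=None)
var.set('init')

seen = []
t = ContextThread(target=lambda: seen.append(var.get()))
t.start()
t.join()
assert seen == ['init']   # the new thread saw the captured value
```

With a plain threading.Thread the lambda would have seen the default (None) instead, which is the by-design behavior francis asked about.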
On 08/28/2017 04:19 AM, Stefan Krah wrote:
What about this?
async def bar():
    setcontext(Context(prec=1))
    for i in range(10):
        await asyncio.sleep(1)
        yield i

async def foo():
    async for i in bar():
        # ctx.prec=1?
        print(Decimal(100) / 3)
If I understand correctly, ctx.prec is whatever the default is, because foo comes before bar on the stack, and after the current value for i is grabbed bar is no longer executing, and therefore no longer on the stack. I hope I'm right. ;) -- ~Ethan~
On Mon, Aug 28, 2017 at 11:26 AM, Ethan Furman
On 08/28/2017 04:19 AM, Stefan Krah wrote:
What about this?
async def bar():
    setcontext(Context(prec=1))
    for i in range(10):
        await asyncio.sleep(1)
        yield i

async def foo():
    async for i in bar():
        # ctx.prec=1?
        print(Decimal(100) / 3)
If I understand correctly, ctx.prec is whatever the default is, because foo comes before bar on the stack, and after the current value for i is grabbed bar is no longer executing, and therefore no longer on the stack. I hope I'm right. ;)
You're right! Yury
On Mon, Aug 28, 2017 at 11:23:12AM -0400, Yury Selivanov wrote:
On Mon, Aug 28, 2017 at 7:19 AM, Stefan Krah
wrote: Okay, so if I understand this correctly we actually will not have dynamic scoping for regular functions: bar() has returned, so the new context would not be found on the stack with proper dynamic scoping.
Correct. Although I would avoid associating PEP 550 with dynamic scoping entirely, as we never intended to implement it.
Good, I agree it does not make sense.
[..]
What about this?
async def bar():
    setcontext(Context(prec=1))
    for i in range(10):
        await asyncio.sleep(1)
        yield i

async def foo():
    async for i in bar():
        # ctx.prec=1?
        print(Decimal(100) / 3)
I'm searching for some abstract model to reason about the scopes.
Whatever is set in coroutines, generators, and async generators does not leak out. In the above example, "prec=1" will only be set inside "bar()", and "foo()" will not see that. (Same will happen for a regular function and a generator).
But the state "leaks in" as per your previous example:

async def bar():
    # use decimal with context=ctx

async def foo():
    decimal.setcontext(ctx)
    await bar()

IMHO it shouldn't with coroutine-local-storage (let's call it CLS). So, as I see it, there's still some mixture between dynamic scoping and CLS, because in this example bar() is allowed to search the stack. Stefan Krah
A question appeared here about a simple mental model for PEP 550.
It looks much clearer now, than in the first version, but I still would
like to clarify: can one say that PEP 550 just provides a more fine-grained
version of threading.local(), that works not only per thread, but even per
coroutine within the same thread?
--
Ivan
On 28 August 2017 at 17:29, Yury Selivanov
On Mon, Aug 28, 2017 at 11:26 AM, Ethan Furman
wrote: On 08/28/2017 04:19 AM, Stefan Krah wrote:
What about this?
async def bar():
    setcontext(Context(prec=1))
    for i in range(10):
        await asyncio.sleep(1)
        yield i

async def foo():
    async for i in bar():
        # ctx.prec=1?
        print(Decimal(100) / 3)
If I understand correctly, ctx.prec is whatever the default is, because foo comes before bar on the stack, and after the current value for i is grabbed bar is no longer executing, and therefore no longer on the stack. I hope I'm right. ;)
You're right!
Yury
On Mon, Aug 28, 2017 at 11:52 AM, Stefan Krah
But the state "leaks in" as per your previous example:
async def bar():
    # use decimal with context=ctx

async def foo():
    decimal.setcontext(ctx)
    await bar()

IMHO it shouldn't with coroutine-local-storage (let's call it CLS). So, as I see it, there's still some mixture between dynamic scoping and CLS, because in this example bar() is allowed to search the stack.
The whole proposal would then be mostly useless. If we forget about dynamic scoping (I don't know why it's being brought up all the time, TBH; nobody uses it, almost no language implements it) the current proposal is well balanced and solves multiple problems. Three points listed in the rationale section:

* Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings.

* Request-related data, such as security tokens and request data in web applications, language context for gettext, etc.

* Profiling, tracing, and logging in large code bases.

Two of them require context propagation *down* the stack of coroutines. What the latest PEP 550 revision does is prohibit context propagation *up* the stack in coroutines (a requirement to make async code refactorable and easy to reason about).

Propagation of context "up" the stack in regular code is allowed with threading.local(), and everybody is used to it. Doing that for coroutines doesn't work, because of the reasons covered here: https://www.python.org/dev/peps/pep-0550/#coroutines-and-asynchronous-tasks

Yury
On Mon, Aug 28, 2017 at 11:53 AM, Ivan Levkivskyi
A question appeared here about a simple mental model for PEP 550. It looks much clearer now, than in the first version, but I still would like to clarify: can one say that PEP 550 just provides a more fine-grained version of threading.local(), that works not only per thread, but even per coroutine within the same thread?
Simple model:

1. Values in the EC propagate down the call stack for both synchronous and asynchronous code.

2. For regular functions/code the EC works the same way as threading.local().

Yury
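Point 2 of this model is observable in the contextvars module that later shipped as PEP 567 (the successor to this proposal): in plain synchronous code a value set in a callee is visible to its caller, just as with threading.local(). A minimal sketch:

```python
import contextvars

var = contextvars.ContextVar('var', default='default')

def callee():
    # For regular function calls there is no isolation --
    # the set() is visible to the caller, like threading.local():
    var.set('set-in-callee')

def caller():
    callee()
    return var.get()

print(caller())  # set-in-callee
```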
On 08/28/2017 09:12 AM, Yury Selivanov wrote:
If we forget about dynamic scoping (I don't know why it's being brought up all the time, TBH; nobody uses it, almost no language implements it)
Probably because it's not lexical scoping, and possibly because it's possible for a function to be running with one EC on one call, and a different EC on the next -- hence, the EC it's using is dynamically determined. It seems to me the biggest difference between "true" dynamic scoping and what PEP 550 implements is the granularity: i.e. not every single function gets its own LC, just a select few: generators, async stuff, etc. Am I right? (No CS degree here.) If not, what are the differences? -- ~Ethan~
On Mon, Aug 28, 2017 at 12:43 PM, Ethan Furman
On 08/28/2017 09:12 AM, Yury Selivanov wrote:
If we forget about dynamic scoping (I don't know why it's being brought up all the time, TBH; nobody uses it, almost no language implements it)
Probably because it's not lexical scoping, and possibly because it's possible for a function to be running with one EC on one call, and a different EC on the next -- hence, the EC it's using is dynamically determined.
It seems to me the biggest difference between "true" dynamic scoping and what PEP 550 implements is the granularity: i.e. not every single function gets its own LC, just a select few: generators, async stuff, etc.
Am I right? (No CS degree here.) If not, what are the differences?
Sounds right to me. If PEP 550 was about adding true dynamic scoping, we couldn't use it as a suitable context management solution for libraries like decimal. For example, converting decimal/numpy to use new APIs would be a totally backwards-incompatible change. I still prefer using a "better TLS" analogy for PEP 550. We'll likely add a section summarizing differences between threading.local() and new APIs (as suggested by Eric Snow). Yury
On Mon, Aug 28, 2017 at 12:12:00PM -0400, Yury Selivanov wrote:
On Mon, Aug 28, 2017 at 11:52 AM, Stefan Krah
wrote: [..] But the state "leaks in" as per your previous example:
async def bar():
    # use decimal with context=ctx

async def foo():
    decimal.setcontext(ctx)
    await bar()

IMHO it shouldn't with coroutine-local-storage (let's call it CLS). So, as I see it, there's still some mixture between dynamic scoping and CLS, because in this example bar() is allowed to search the stack.
The whole proposal will then be mostly useless. If we forget about the dynamic scoping (I don't know why it's being brought up all the time, TBH; nobody uses it, almost no language implements it)
Because a) it was brought up by proponents of the PEP early on python-ideas, b) people desperately want a mental model of what is going on. :-)
* Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings.
The decimal context works like this:

1) There is a default context template (user settable).

2) Whenever the first operation *in a new thread* occurs, the thread-local context is initialized with a copy of the template.

I don't find it very intuitive if setcontext() is somewhat local in coroutines but they don't participate in some form of CLS. You have to think about things like "what happens in a fresh thread when a coroutine calls setcontext() before any other decimal operation has taken place". So perhaps Nathaniel is right that the PEP is not so useful for numpy and decimal backwards compat. Stefan Krah
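Stefan's two-step description matches the documented decimal behaviour: getcontext() in a thread that has no context yet copies the user-settable DefaultContext template. A runnable sketch:

```python
import decimal
import threading

# 1) The user-settable template:
decimal.DefaultContext.prec = 10

precs = []

def worker():
    # 2) The first decimal operation in a new thread
    #    initializes that thread's context from the template:
    precs.append(decimal.getcontext().prec)

t = threading.Thread(target=worker)
t.start()
t.join()
print(precs)  # [10]

decimal.DefaultContext.prec = 28  # restore the shipped default
```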
On Mon, Aug 28, 2017 at 1:33 PM, Stefan Krah
* Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings.
The decimal context works like this:
1) There is a default context template (user settable).
2) Whenever the first operation *in a new thread* occurs, the thread-local context is initialized with a copy of the template.
I don't find it very intuitive if setcontext() is somewhat local in coroutines but they don't participate in some form of CLS.
You have to think about things like "what happens in a fresh thread when a coroutine calls setcontext() before any other decimal operation has taken place".
I'm sorry, I don't follow you here. PEP 550 semantics:

* setcontext() in regular code would set the context for the whole thread.

* setcontext() in a coroutine/generator/async generator would set the context for all the code it calls.
So perhaps Nathaniel is right that the PEP is not so useful for numpy and decimal backwards compat.
Nathaniel's argument is pretty weak as I see it. He argues that some people would take the following code:

def bar():
    # set decimal context

def foo():
    bar()
    # use the decimal context set in bar()

and blindly convert it to async/await:

async def bar():
    # set decimal context

async def foo():
    await bar()
    # use the decimal context set in bar()

And that it's a problem that it will stop working. But almost nobody converts code by simply slapping async/await on top of it -- things don't work this way. It was never a goal for async/await or asyncio, or even trio/curio. Porting code to async/await almost always requires a thoughtful rewrite.

In async/await, the above code is an *anti-pattern*. It's super fragile and can break by adding a timeout around "await bar()". There's no workaround here. Asynchronous code is fundamentally non-local and a more complex topic on its own, with its own concepts: Asynchronous Tasks, timeouts, cancellation, etc. Fundamentally: "(synchronous code) != (asynchronous code) - (async/await)".

Yury
On Sat, Aug 26, 2017 at 3:09 PM, Nathaniel Smith
You might be interested in these notes I wrote to motivate why we need a chain of namespaces, and why simple "async task locals" aren't sufficient:
https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb
Thanks, Nathaniel! That helped me understand the rationale, though I'm still unconvinced chained lookup is necessary for the stated goal of the PEP. (The rest of my reply is not specific to Nathaniel.)

tl;dr Please:

* make the chained lookup aspect of the proposal more explicit (and distinct) in the beginning sections of the PEP (or drop chained lookup).

* explain why normal frames do not get to take advantage of chained lookup (or allow them to).

--------------------

If I understood right, the problem is that we always want context vars resolved relative to the current frame and then to the caller's frame (and on up the call stack). For generators, "caller" means the frame that resumed the generator. Since we don't know what frame will resume the generator beforehand, we can't simply copy the current LC when a generator is created and bind it to the generator's frame.

However, I'm still not convinced that's the semantics we need. The key statement is "and then to the caller's frame (and on up the call stack)", i.e. chained lookup. On the linked page Nathaniel explained the position (quite clearly, thank you) using sys.exc_info() as an example of async-local state. I posit that that example isn't particularly representative of what we actually need. Isn't the point of the PEP to provide an async-safe alternative to threading.local()? Any existing code using threading.local() would not expect any kind of chained lookup since threads don't have any. So introducing chained lookup in the PEP is unnecessary and consequently not ideal, since it introduces significant complexity.

As the PEP is currently written, chained lookup is a key part of the proposal, though it does not explicitly express this. I suppose this is where my confusion has been. At this point I think I understand one rationale for the chained lookup functionality; it takes advantage of the cooperative scheduling characteristics of generators, et al. Unlike with threads, a programmer can know the context under which a generator will be resumed. Thus it may be useful to the programmer to allow (or expect) the resumed generator to fall back to the calling context. However, given the extra complexity involved, is there enough evidence that such capability is sufficiently useful? Could chained lookup be addressed separately (in another PEP)?

Also, wouldn't it be equally useful to support chained lookup for function calls? Programmers have the same level of knowledge about the context stack with function calls as with generators. I would expect evidence in favor of chained lookups for generators to also favor the same for normal function calls.

-eric
On Sat, Aug 26, 2017 at 10:31 AM, Yury Selivanov
On Sat, Aug 26, 2017 at 9:33 AM, Sven R. Kunze
wrote: [..] Why not the same interface as thread-local storage? This has been the question which bothered me from the beginning of PEP550. I don't understand what inventing a new way of access buys us here.
This was covered at length in these threads:
https://mail.python.org/pipermail/python-ideas/2017-August/046888.html https://mail.python.org/pipermail/python-ideas/2017-August/046889.html
FWIW, it would still be nice to have a simple replacement for the following under PEP 550:

class Context(threading.local):
    ...

Transitioning from there to PEP 550 is non-trivial. -eric
Yury Selivanov wrote:
On Mon, Aug 28, 2017 at 1:33 PM, Stefan Krah
wrote: [..] * Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings.
The decimal context works like this:
1) There is a default context template (user settable).
2) Whenever the first operation *in a new thread* occurs, the thread-local context is initialized with a copy of the template.
I don't find it very intuitive if setcontext() is somewhat local in coroutines but they don't participate in some form of CLS.
You have to think about things like "what happens in a fresh thread when a coroutine calls setcontext() before any other decimal operation has taken place".
I'm sorry, I don't follow you here.
PEP 550 semantics:
setcontext() in regular code would set the context for the whole thread.
setcontext() in a coroutine/generator/async generator would set the context for all the code it calls.
So perhaps Nathaniel is right that the PEP is not so useful for numpy and decimal backwards compat.
Nathaniel's argument is pretty weak as I see it. He argues that some people would take the following code:
def bar():
    # set decimal context

def foo():
    bar()
    # use the decimal context set in bar()
and blindly convert it to async/await:
async def bar():
    # set decimal context

async def foo():
    await bar()
    # use the decimal context set in bar()
And that it's a problem that it will stop working.
But almost nobody converts the code by simply slapping async/await on top of it
Maybe not, but it will also affect refactoring of code that is *already* using async/await, e.g. taking

async def foobar():
    # set decimal context
    # use the decimal context we just set

and refactoring it as above. Given that one of the main motivations for yield-from (and subsequently async/await) was so that you *can* perform that kind of refactoring easily, that does indeed seem like a problem to me.

It seems to me that individual generators/coroutines shouldn't automatically get a context of their own; they should have to explicitly ask for one.

-- Greg
On Mon, Aug 28, 2017 at 6:22 PM, Greg Ewing
But almost nobody converts the code by simply slapping async/await on top of it
Maybe not, but it will also affect refactoring of code that is *already* using async/await, e.g. taking
async def foobar():
    # set decimal context
    # use the decimal context we just set
and refactoring it as above.
There's no code that already uses async/await and decimal context managers/setters. Any such code is broken right now, because decimal context set in one coroutine affects them all. Your example would work only if foobar() is the only coroutine in your program.
Given that one of the main motivations for yield-from (and subsequently async/await) was so that you *can* perform that kind of refactoring easily, that does indeed seem like a problem to me.
With the current PEP 550 semantics w.r.t. generators you can still refactor them. The following code would work as expected:

def nested_gen():
    # use some_context

def gen():
    with some_context():
        yield from nested_gen()

list(gen())

I'm saying that the following should not work:

def nested_gen():
    set_some_context()
    yield

def gen():
    # some_context is not set
    yield from nested_gen()
    # use some_context ???

list(gen())

IOW, any context set in generators should not leak to the caller, ever. This is the whole point of the PEP.

As for async/await, see this: https://mail.python.org/pipermail/python-dev/2017-August/149022.html

Yury
On Mon, Aug 28, 2017 at 6:19 PM, Eric Snow
On Sat, Aug 26, 2017 at 10:31 AM, Yury Selivanov
wrote: On Sat, Aug 26, 2017 at 9:33 AM, Sven R. Kunze
wrote: [..] Why not the same interface as thread-local storage? This has been the question which bothered me from the beginning of PEP550. I don't understand what inventing a new way of access buys us here.
This was covered at length in these threads:
https://mail.python.org/pipermail/python-ideas/2017-August/046888.html https://mail.python.org/pipermail/python-ideas/2017-August/046889.html
FWIW, it would still be nice to have a simple replacement for the following under PEP 550:
class Context(threading.local): ...
Transitioning from there to PEP 550 is non-trivial.
And it should not be trivial, as the PEP 550 semantics is different from TLS. Using PEP 550 instead of TLS should be carefully evaluated. Please also see this: https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-int... Yury
Yury Selivanov wrote:
I'm saying that the following should not work:

def nested_gen():
    set_some_context()
    yield

def gen():
    # some_context is not set
    yield from nested_gen()
    # use some_context ???
And I'm saying it *should* work, otherwise it breaks one of the fundamental principles on which yield-from is based, namely that 'yield from foo()' should behave as far as possible as a generator equivalent of a plain function call. -- Greg
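For what it's worth, the semantics that eventually shipped in PEP 567 (contextvars) sided with this view: generators do not get a logical context of their own, so a set() inside a nested generator is visible to the code driving it. A sketch with the stdlib API:

```python
import contextvars

var = contextvars.ContextVar('var', default='unset')

def nested_gen():
    var.set('from-nested')   # visible to the caller under PEP 567
    yield 'inner'

def gen():
    yield from nested_gen()
    yield var.get()          # sees the value set inside nested_gen()

print(list(gen()))  # ['inner', 'from-nested']
```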
On Mon, Aug 28, 2017 at 6:56 PM, Greg Ewing
Yury Selivanov wrote:
I'm saying that the following should not work:

def nested_gen():
    set_some_context()
    yield

def gen():
    # some_context is not set
    yield from nested_gen()
    # use some_context ???
And I'm saying it *should* work, otherwise it breaks one of the fundamental principles on which yield-from is based, namely that 'yield from foo()' should behave as far as possible as a generator equivalent of a plain function call.
Consider the following generator:

def gen():
    with decimal.context(...):
        yield

We don't want gen's context to leak to the outer scope -- that's one of the reasons why PEP 550 exists. Even if we do this:

g = gen()
next(g)
# the decimal.context won't leak out of gen

So a Python user would have a mental model: context set in generators doesn't leak.

Now, let's consider a "broken" generator:

def gen():
    decimal.context(...)
    yield

If we iterate gen() with next(), it still won't leak its context. But if "yield from" has the semantics that you want -- "yield from" being just like a function call -- then calling "yield from gen()" will corrupt the context of the caller.

I simply want consistency. It's easier for everybody to say that generators never leak their context changes to the outer scope, rather than saying that "generators can sometimes leak their context".

Yury
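The "leak" being discussed is easy to observe with today's thread-wide decimal context: while a generator is suspended inside a with decimal.localcontext() block, its modified context is what the caller sees. A runnable sketch:

```python
import decimal

def gen():
    with decimal.localcontext() as ctx:
        ctx.prec = 5
        yield

decimal.getcontext().prec = 28
g = gen()
next(g)
leaked = decimal.getcontext().prec    # 5: the generator's context is active
g.close()                             # unwinds the with-block...
restored = decimal.getcontext().prec  # 28: ...restoring the caller's context

print(leaked, restored)  # 5 28
```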
On Mon, Aug 28, 2017 at 3:14 PM, Eric Snow
On Sat, Aug 26, 2017 at 3:09 PM, Nathaniel Smith
wrote: You might be interested in these notes I wrote to motivate why we need a chain of namespaces, and why simple "async task locals" aren't sufficient:
https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb
Thanks, Nathaniel! That helped me understand the rationale, though I'm still unconvinced chained lookup is necessary for the stated goal of the PEP.
(The rest of my reply is not specific to Nathaniel.)
tl;dr Please:

* make the chained lookup aspect of the proposal more explicit (and distinct) in the beginning sections of the PEP (or drop chained lookup).

* explain why normal frames do not get to take advantage of chained lookup (or allow them to).
--------------------
If I understood right, the problem is that we always want context vars resolved relative to the current frame and then to the caller's frame (and on up the call stack). For generators, "caller" means the frame that resumed the generator. Since we don't know what frame will resume the generator beforehand, we can't simply copy the current LC when a generator is created and bind it to the generator's frame.
However, I'm still not convinced that's the semantics we need. The key statement is "and then to the caller's frame (and on up the call stack)", i.e. chained lookup. On the linked page Nathaniel explained the position (quite clearly, thank you) using sys.exc_info() as an example of async-local state. I posit that that example isn't particularly representative of what we actually need. Isn't the point of the PEP to provide an async-safe alternative to threading.local()?
Any existing code using threading.local() would not expect any kind of chained lookup since threads don't have any. So introducing chained lookup in the PEP is unnecessary and consequently not ideal since it introduces significant complexity.
There's a lot of Python code out there, and it's hard to know what it all wants :-). But I don't think we should get hung up on matching threading.local() -- no-one sits down and says "okay, what my users want is for me to write some code that uses a thread-local", i.e., threading.local() is a mechanism, not an end-goal. My hypothesis is that in most cases, when people reach for threading.local(), it's because they have some "contextual" variable, and they want to be able to do things like set it to a value that affects all and only the code that runs inside a 'with' block. So far the only way to approximate this in Python has been to use threading.local(), but chained lookup would work even better.

As evidence for this hypothesis: something like chained lookup is important for exc_info() [1] and for Trio's cancellation semantics, and I'm pretty confident that it's what users naturally expect for use cases like 'with decimal.localcontext(): ...' or 'with numpy.errstate(...): ...'. And it works fine for cases like Flask's request-locals that get set once near the top of a callstack and then treated as read-only by most of the code.

I'm not aware of any alternative to chained lookup that fulfills all of these use cases -- are you? And I'm not aware of any use cases that require something more than threading.local() but less than chained lookup -- are you?

[1] I guess I should say something about including sys.exc_info() as evidence that chained lookup is useful, given that CPython probably won't share code between its PEP 550 implementation and its sys.exc_info() implementation. I'm mostly citing it as evidence that this is a real kind of need that can arise when writing programs -- if it happens once, it'll probably happen again. But I can also imagine that other implementations might want to share code here, and it's certainly nice if the Python-the-language spec can just say "exc_info() has semantics 'as if' it were implemented using PEP 550 storage" and leave it at that. Plus it's kind of rude for the interpreter to claim semantics for itself that it won't let anyone else implement :-).
As the PEP is currently written, chained lookup is a key part of the proposal, though it does not explicitly express this. I suppose this is where my confusion has been.
At this point I think I understand one rationale for the chained lookup functionality; it takes advantage of the cooperative scheduling characteristics of generators, et al. Unlike with threads, a programmer can know the context under which a generator will be resumed. Thus it may be useful to the programmer to allow (or expect) the resumed generator to fall back to the calling context. However, given the extra complexity involved, is there enough evidence that such capability is sufficiently useful? Could chained lookup be addressed separately (in another PEP)?
Also, wouldn't it be equally useful to support chained lookup for function calls? Programmers have the same level of knowledge about the context stack with function calls as with generators. I would expect evidence in favor of chained lookups for generators to also favor the same for normal function calls.
The important difference between generators/coroutines and normal function calls is that with normal function calls, the link between the caller and callee is fixed for the entire lifetime of the inner frame, so there's no way for the context to shift under your feet. If all we had were normal function calls, then (green-) thread locals using the save/restore trick would be enough to handle all the use cases above -- it's only for generators/coroutines where the save/restore trick breaks down. This means that pushing/popping LCs when crossing into/out of a generator frame is the minimum needed to get the desired semantics, and it keeps the LC stack small (important since lookups can be O(n) in the worst case), and it minimizes the backcompat breakage for operations like decimal.setcontext() where people *do* expect to call it in a subroutine and have the effects be visible in the caller. -n -- Nathaniel J. Smith -- https://vorpus.org
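The "save/restore trick" Nathaniel refers to can be sketched for plain function calls; the helper name with_prec is hypothetical, not a decimal API. It is safe exactly because the callee finishes before the finally clause runs -- the property that a suspended generator breaks:

```python
import decimal

def with_prec(prec, fn, *args):
    # Save/restore trick: works for plain calls because fn() completes
    # before we restore; a generator suspended mid-call would defeat this.
    ctx = decimal.getcontext()
    saved = ctx.prec
    ctx.prec = prec
    try:
        return fn(*args)
    finally:
        ctx.prec = saved

decimal.getcontext().prec = 28
result = with_prec(4, lambda: decimal.Decimal(1) / decimal.Decimal(3))

print(result)                     # 0.3333 (computed with prec=4)
print(decimal.getcontext().prec)  # 28 (caller's precision untouched)
```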
On Mon, Aug 28, 2017 at 6:07 PM, Nathaniel Smith
The important difference between generators/coroutines and normal function calls is that with normal function calls, the link between the caller and callee is fixed for the entire lifetime of the inner frame, so there's no way for the context to shift under your feet. If all we had were normal function calls, then (green-) thread locals using the save/restore trick would be enough to handle all the use cases above -- it's only for generators/coroutines where the save/restore trick breaks down. This means that pushing/popping LCs when crossing into/out of a generator frame is the minimum needed to get the desired semantics, and it keeps the LC stack small (important since lookups can be O(n) in the worst case), and it minimizes the backcompat breakage for operations like decimal.setcontext() where people *do* expect to call it in a subroutine and have the effects be visible in the caller.
I like this way of looking at things. Does this have any bearing on asyncio.Task? To me those look more like threads than like generators. Or possibly they should inherit the lookup chain from the point when the Task was created, but not be affected at all by the lookup chain in place when they are executed. FWIW we *could* have a policy that OS threads also inherit the lookup chain from their creator, but I doubt that's going to fly with backwards compatibility. I guess my general (hurried, sorry) view is that we're at a good point where we have a small number of mechanisms but are still debating policies on how those mechanisms should be used. (The basic mechanism is chained lookup and the policies are about how the chains are fit together for various language/library constructs.) -- --Guido van Rossum (python.org/~guido)
On 8/28/2017 6:50 PM, Guido van Rossum wrote:
FWIW we *could* have a policy that OS threads also inherit the lookup chain from their creator, but I doubt that's going to fly with backwards compatibility.
Since LC is new, how could such a policy affect backwards compatibility? The obvious answer would be that some use cases that presently use other mechanisms that "should" be ported to using LC would have to be careful in how they do the port, but discussion seems to indicate that they would have to be careful in how they do the port anyway. One of the most common examples is the decimal context. IIUC, each thread gets its initial decimal context from a global template, rather than inheriting from its parent thread. Porting decimal context to LC then, in the event of OS threads inheriting the lookup chain from their creator, would take extra work for compatibility: setting the decimal context from the global template (a step it must already take) rather than accepting the inheritance. It might be appropriate that an updated version of decimal that uses LC would offer the option of inheriting the decimal context from the parent thread, or using the global template, as an enhancement.
On Mon, Aug 28, 2017 at 9:50 PM, Guido van Rossum
On Mon, Aug 28, 2017 at 6:07 PM, Nathaniel Smith
wrote: The important difference between generators/coroutines and normal function calls is that with normal function calls, the link between the caller and callee is fixed for the entire lifetime of the inner frame, so there's no way for the context to shift under your feet. If all we had were normal function calls, then (green-) thread locals using the save/restore trick would be enough to handle all the use cases above -- it's only for generators/coroutines where the save/restore trick breaks down. This means that pushing/popping LCs when crossing into/out of a generator frame is the minimum needed to get the desired semantics, and it keeps the LC stack small (important since lookups can be O(n) in the worst case), and it minimizes the backcompat breakage for operations like decimal.setcontext() where people *do* expect to call it in a subroutine and have the effects be visible in the caller.
I like this way of looking at things. Does this have any bearing on asyncio.Task? To me those look more like threads than like generators. Or possibly they should inherit the lookup chain from the point when the Task was created, [..]
We explain why tasks have to inherit the lookup chain from the point where they are created in the PEP (in the new High-level Specification section): https://www.python.org/dev/peps/pep-0550/#coroutines-and-asynchronous-tasks In short, without inheriting the chain we can't wrap coroutines into tasks (like wrapping an await in wait_for() would break the code, if we don't inherit the chain). In the latest version (v4) we made all coroutines to have their own Logical Context, which, as we discovered today, makes us unable to set context variables in __aenter__ coroutines. This will be fixed in the next version.
FWIW we *could* have a policy that OS threads also inherit the lookup chain from their creator, but I doubt that's going to fly with backwards compatibility.
Backwards compatibility is indeed an issue. Inheriting the chain for threads would mean another difference between PEP 550 and 'threading.local()', which could cause backwards-incompatible behaviour for decimal/numpy when they are updated to the new APIs. For decimal, for example, we could use the following pattern to fall back to the default decimal context in ECs (threads) that don't have one set:

    ctx = decimal_var.get(default=default_decimal_ctx)

We can also add an 'initializer' keyword argument to 'new_context_var' to specify a callable that will be used to give a default value to the var. Another issue is that with the current C API, we can only inherit the EC for threads started with 'threading.Thread'. There's no reliable way to inherit the chain if a thread was initialized by a C extension. IMO, inheriting the lookup chain in threads makes sense when we use them for pools, like concurrent.futures.ThreadPoolExecutor. When threads are used as long-running subprograms, inheriting the chain should be an opt-in. Yury
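The fallback pattern above can be sketched with a toy stand-in (ContextVar and default_decimal_ctx are hypothetical names here; a per-thread dict plays the role of the proposed execution context):

```python
# A sketch of the get(default=...) fallback pattern. ContextVar and
# default_decimal_ctx are hypothetical stand-ins; a per-thread dict
# plays the role of the proposed execution context.
import threading

_storage = threading.local()

class ContextVar:
    def __init__(self, name):
        self.name = name

    def get(self, default=None):
        # Fall back to the supplied default when the variable has no
        # value in this thread's context.
        return getattr(_storage, 'values', {}).get(self, default)

    def set(self, value):
        if not hasattr(_storage, 'values'):
            _storage.values = {}
        _storage.values[self] = value

default_decimal_ctx = object()     # stands in for a decimal.Context
decimal_var = ContextVar('decimal_ctx')

ctx = decimal_var.get(default=default_decimal_ctx)
assert ctx is default_decimal_ctx  # unset: the default is used

decimal_var.set('configured')
assert decimal_var.get(default=default_decimal_ctx) == 'configured'
```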
On 27 August 2017 at 03:23, Yury Selivanov
On Sat, Aug 26, 2017 at 1:23 PM, Ethan Furman
wrote: On 08/26/2017 09:25 AM, Yury Selivanov wrote:
The ContextVar.lookup() method *traverses the stack* until it finds the LC that has a value. "get()" does not reflect this subtle semantic difference.
A good point; however, ChainMap, which behaves similarly as far as lookup goes, uses "get" and does not have a "lookup" method. I think we lose more than we gain by changing that method name.
ChainMap is constrained to be a Mapping-like object, but I get your point. Let's see what others say about the "lookup()". It is kind of an experiment to try a name and see if it fits.
I don't think "we may want to add extra parameters" is a good reason to omit a conventional `get()` method - I think it's a reason to offer a separate API to handle use cases where the question of *where* the var is set matters (for example, `my_var.is_set()` would indicate whether or not `my_var.set()` has been called in the current logical context without requiring a parameter check for normal lookups that don't care). Cheers, Nick. P.S. And I say that as a reader who correctly guessed why you had changed the method name in the current iteration of the proposal. I'm sympathetic to those reasons, but I think sticking with the conventional API will make this one easier to learn and use :) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Aug 29, 2017 at 5:01 AM, Nick Coghlan
P.S. And I say that as a reader who correctly guessed why you had changed the method name in the current iteration of the proposal. I'm sympathetic to those reasons, but I think sticking with the conventional API will make this one easier to learn and use :)
Yeah, I agree. We'll switch lookup -> get in the next iteration. Guido's parallel with getattr/setattr/delattr is also useful. getattr can also look up the attribute in base classes, but we still call it "get". Yury
On 29 August 2017 at 23:18, Yury Selivanov
On Tue, Aug 29, 2017 at 5:01 AM, Nick Coghlan
wrote: [..] P.S. And I say that as a reader who correctly guessed why you had changed the method name in the current iteration of the proposal. I'm sympathetic to those reasons, but I think sticking with the conventional API will make this one easier to learn and use :)
Yeah, I agree. We'll switch lookup -> get in the next iteration.
Guido's parallel with getattr/setattr/delattr is also useful. getattr can also lookup the attribute in base classes, but we still call it "get".
True, in many ways attribute inheritance is Python's original ChainMap implementation :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Aug 28, 2017 at 7:16 PM, Yury Selivanov
On Mon, Aug 28, 2017 at 6:56 PM, Greg Ewing
wrote: Yury Selivanov wrote:
I'm saying that the following should not work:
def nested_gen():
    set_some_context()
    yield
def gen():
    # some_context is not set
    yield from nested_gen()
    # use some_context ???
And I'm saying it *should* work, otherwise it breaks one of the fundamental principles on which yield-from is based, namely that 'yield from foo()' should behave as far as possible as a generator equivalent of a plain function call.
Consider the following generator:
def gen():
    with decimal.context(...):
        yield
We don't want gen's context to leak to the outer scope -- that's one of the reasons why PEP 550 exists. Even if we do this:
g = gen()
next(g)   # the decimal.context won't leak out of gen
So a Python user would have a mental model: context set in generators doesn't leak.
Now, let's consider a "broken" generator:
def gen():
    decimal.context(...)
    yield
If we iterate gen() with next(), it still won't leak its context. But if "yield from" has the semantics that you want -- for "yield from" to be just like a function call -- then calling
yield from gen()
will corrupt the context of the caller.
I simply want consistency. It's easier for everybody to say that generators never leak their context changes to the outer scope, rather than saying that "generators can sometimes leak their context".
Adding to the above: there's a fundamental reason why we can't make "yield from" transparent to EC modifications. While we want "yield from" to have semantics close to a function call, in some situations we simply can't. Because you can manually iterate a generator and then 'yield from' it, you can get this weird 'partial-function-call' semantics. For example:

    var = new_context_var()

    def gen():
        var.set(42)
        yield
        yield

Now, we can partially iterate the generator (1):

    def main():
        g = gen()
        next(g)
        # we don't want 'g' to leak its EC changes,
        # so var.get() is None here.
        assert var.get() is None

and then we can "yield from" it (2):

    def main():
        g = gen()
        next(g)
        # we don't want 'g' to leak its EC changes,
        # so var.get() is None here.
        assert var.get() is None

        yield from g
        # at this point it's too late for us to let var leak into
        # main().__logical_context__

For (1) we want the context change to be isolated. For (2) you say that the context change should propagate to the caller. But it's impossible: 'g' already has its own LC({var: 42}), and we can't merge it with the LC of main(). "await" is fundamentally different, because it's not possible to partially iterate a coroutine before awaiting it (asyncio will break if you call "coro.send(None)" manually). Yury
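The 'partial-function-call' problem can be modelled in a few lines of plain Python (a sketch only: 'ec', 'get' and 'isolated' are illustrative stand-ins, not the PEP 550 API). After partial iteration, the update already sits in the generator's own LC, so there is nothing left to merge into the caller's:

```python
# A toy model: each generator carries its own logical context (a dict),
# pushed onto the EC stack while the generator runs and popped when it
# suspends. All names here are illustrative, not the PEP 550 API.

ec = [{}]                      # execution context: a stack of LC dicts

def get(name):
    for lc in reversed(ec):    # lookup traverses the stack
        if name in lc:
            return lc[name]
    return None

class isolated:
    """Wrap a generator so its context changes stay in its own LC."""
    def __init__(self, it):
        self.it = it
        self.lc = {}           # the generator's own logical context
    def __iter__(self):
        return self
    def __next__(self):
        ec.append(self.lc)     # resume: push the generator's LC
        try:
            return next(self.it)
        finally:
            ec.pop()           # suspend: pop it again

def gen():
    ec[-1]['var'] = 42         # a set() performed inside the generator
    yield 1
    yield 2

g = isolated(gen())
next(g)                        # partial iteration
assert get('var') is None      # the change did not leak to the caller
assert g.lc == {'var': 42}     # ...it is trapped in g's own LC
```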
Yury Selivanov wrote:
Consider the following generator:
def gen():
    with decimal.context(...):
        yield
We don't want gen's context to leak to the outer scope
That's understandable, but fixing that problem shouldn't come at the expense of breaking the ability to refactor generator code or async code without changing its semantics. I'm not convinced that it has to, either. In this example, the with-statement is the thing that should be establishing a new nested context. Yielding and re-entering the generator should only be swapping around between existing contexts.
Now, let's consider a "broken" generator:
def gen():
    decimal.context(...)
    yield
The following non-generator code is "broken" in exactly the same way:

    def foo():
        decimal.context(...)
        do_some_decimal_calculations()
        # Context has now been changed for the caller
I simply want consistency.
So do I! We just have different ideas about what consistency means here.
It's easier for everybody to say that generators never leaked their context changes to the outer scope, rather than saying that "generators can sometimes leak their context".
No, generators should *always* leak their context changes to exactly the same extent that normal functions do. If you don't want to leak a context change, you should use a with statement. What you seem to be suggesting is that generators shouldn't leak context changes even when you *don't* use a with-statement. If you're going to do that, you'd better make sure that the same thing applies to regular functions, otherwise you've introduced an inconsistency. -- Greg
On Tue, Aug 29, 2017 at 5:45 PM, Greg Ewing
What you seem to be suggesting is that generators shouldn't leak context changes even when you *don't* use a with-statement.
Yes, generators shouldn't leak context changes regardless of what changes the context inside them or how:

    var = new_context_var()

    def gen():
        old_val = var.get()
        try:
            var.set('blah')
            yield
            yield
            yield
        finally:
            var.set(old_val)

With the above code, when you do "next(gen())" it would, without PEP 550, leak the state -- the "finally" block (or a "with" block) wouldn't help you here -- and corrupt the state of the caller. That's the problem the PEP fixes. The EC interaction with generators is explained in great detail here: https://www.python.org/dev/peps/pep-0550/#id4 We explain the motivation behind desiring a working context-local solution for generators in the Rationale section: https://www.python.org/dev/peps/pep-0550/#rationale Basically half of the PEP is about isolating context in generators.
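The point about try/finally can be demonstrated with stock Python, using a plain module-level dict in the role of threading.local(): the finally block does not run while a partially iterated generator sits suspended, so the change is already visible to the caller.

```python
# Runnable illustration: a try/finally in a generator cannot protect
# the caller from a partially iterated generator, because the finally
# block does not run while the generator is suspended. A plain dict
# stands in for threading.local()-style storage.

state = {'var': None}

def gen():
    old_val = state['var']
    try:
        state['var'] = 'blah'
        yield
        yield
    finally:
        state['var'] = old_val

g = gen()
next(g)                         # suspend inside the try block
assert state['var'] == 'blah'   # the change has already leaked
g.close()                       # only now does the finally run
assert state['var'] is None
```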
If you're going to to that, you'd better make sure that the same thing applies to regular functions, otherwise you've introduced an inconsistency.
Regular functions cannot pause/resume their execution, so they can't leak an inconsistent context change due to out of order or partial execution. PEP 550 positions itself as a replacement for TLS, and clearly defines its semantics for regular functions in a single thread, regular functions in multithreaded code, generators, and asynchronous code (async/await). Everything is specified in the High-level Specification section. I wouldn't call slightly differently defined semantics for generators/coroutines/functions an "inconsistency" -- they just have a different EC semantics given how different they are from each other. Drawing a parallel between 'yield from' and function calls is possible, but we shouldn't forget that you can 'yield from' a half-iterated generator. Yury
On Tue, Aug 29, 2017 at 06:01:40PM -0400, Yury Selivanov wrote:
PEP 550 positions itself as a replacement for TLS, and clearly defines its semantics for regular functions in a single thread, regular functions in multithreaded code, generators, and asynchronous code (async/await). Everything is specified in the High-level Specification section. I wouldn't call slightly differently defined semantics for generators/coroutines/functions an "inconsistency" -- they just have a different EC semantics given how different they are from each other.
What I don't find so consistent is that the async universe is guarded with async {def, for, with, ...}, but in this proposal regular context managers and context setters implicitly adapt their behavior. So, pedantically, having a language extension like

    async set(var, value)
    x = async get(var)

and making async-safe context managers explicit

    async with decimal.localcontext():
        ...

would feel more consistent. I know generators are a problem, but even allowing something like "async set" in generators would be a step up. Stefan Krah
On Tue, Aug 29, 2017 at 7:06 PM, Stefan Krah
On Tue, Aug 29, 2017 at 06:01:40PM -0400, Yury Selivanov wrote:
PEP 550 positions itself as a replacement for TLS, and clearly defines its semantics for regular functions in a single thread, regular functions in multithreaded code, generators, and asynchronous code (async/await). Everything is specified in the High-level Specification section. I wouldn't call slightly differently defined semantics for generators/coroutines/functions an "inconsistency" -- they just have a different EC semantics given how different they are from each other.
What I don't find so consistent is that the async universe is guarded with async {def, for, with, ...}, but in this proposal regular context managers and context setters implicitly adapt their behavior.
So, pedantically, having a language extension like
async set(var, value)
x = async get(var)
and making async-safe context managers explicit
async with decimal.localcontext():
    ...
would feel more consistent. I know generators are a problem, but even allowing something like "async set" in generators would be a step up.
But regular context managers work just fine with asynchronous code. Not all of them have local state. For example, you could have a context manager to time how long the code wrapped in it executes:

    async def foo():
        with timing():
            await ...

We use asynchronous context managers only when they need to do asynchronous operations in their __aenter__ and __aexit__ (like DB transaction begin/rollback/commit). Requiring "await" to set a value for a context variable would force us to write specialized async CMs for cases where a sync CM would do just fine. This, in turn, would make it impossible to use some sync libraries in async code. But there's nothing wrong with using numpy/numpy.errstate in a coroutine. I want to be able to copy/paste their examples into my async code and I'd expect it to just work -- that's the point of the PEP. async/await already requires libraries that involve IO to have separate APIs. Let's not make the situation worse by asking people to use an asynchronous version of PEP 550 even though it's not really needed. Yury
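The timing example is runnable with stock asyncio ("timing" is an illustrative name from the message above, not an existing API): a synchronous context manager needs no "await" in __enter__/__exit__, so it composes with coroutines as-is.

```python
# Runnable sketch: a plain synchronous context manager works fine
# inside a coroutine when its __enter__/__exit__ do no I/O.
# "timing" is an illustrative name, not an existing API.
import asyncio
import time

class timing:
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.elapsed = time.perf_counter() - self.start
        return False

async def foo():
    with timing() as t:            # plain "with" -- no "async with" needed
        await asyncio.sleep(0.01)
    return t.elapsed

loop = asyncio.new_event_loop()
elapsed = loop.run_until_complete(foo())
loop.close()
assert elapsed > 0
```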
Yury Selivanov wrote:
While we want "yield from" to have semantics close to a function call,
That's not what I said! I said that "yield from foo()" should have semantics close to a function call. If you separate the "yield from" from the "foo()", then of course you can get different behaviours. But that's beside the point, because I'm not suggesting that generators should behave differently depending on when or if you use "yield from" on them.
For (1) we want the context change to be isolated. For (2) you say that the context change should propagate to the caller.
No, I'm saying that the context change should *always* propagate to the caller, unless you do something explicit within the generator to prevent it. I have some ideas on what that something might be, which I'll post later. -- Greg
On Tue, Aug 29, 2017 at 7:36 PM, Greg Ewing
Yury Selivanov wrote:
While we want "yield from" to have semantics close to a function call,
That's not what I said! I said that "yield from foo()" should have semantics close to a function call. If you separate the "yield from" from the "foo()", then of course you can get different behaviours.
But that's beside the point, because I'm not suggesting that generators should behave differently depending on when or if you use "yield from" on them.
OK, that wasn't clear. Yury
On Tue, Aug 29, 2017 at 7:36 PM, Greg Ewing
For (1) we want the context change to be isolated. For (2) you say that the context change should propagate to the caller.
No, I'm saying that the context change should *always* propagate to the caller, unless you do something explicit within the generator to prevent it.
I have some ideas on what that something might be, which I'll post later.
BTW we already have mechanisms to always propagate context to the caller -- just use threading.local() or a global variable. PEP 550 is for situations when you explicitly don't want to propagate the state. Anyways, I'm curious to hear your ideas. Yury
On 30 August 2017 at 10:18, Yury Selivanov
On Tue, Aug 29, 2017 at 7:36 PM, Greg Ewing
wrote: [..] For (1) we want the context change to be isolated. For (2) you say that the context change should propagate to the caller.
No, I'm saying that the context change should *always* propagate to the caller, unless you do something explicit within the generator to prevent it.
I have some ideas on what that something might be, which I'll post later.
BTW we already have mechanisms to always propagate context to the caller -- just use threading.local() or a global variable. PEP 550 is for situations when you explicitly don't want to propagate the state.
Writing an "update_parent_context" decorator is also trivial (and will work for both sync and async generators):

    def update_parent_context(gf):
        @functools.wraps(gf)
        def wrapper(*args, **kwds):
            gen = gf(*args, **kwds)
            gen.__logical_context__ = None
            return gen
        return wrapper

The PEP already covers that approach when it talks about the changes to contextlib.contextmanager to get context changes to propagate automatically. With contextvars getting its own module, it would also be straightforward to simply include that decorator as part of its API, so folks won't need to write their own. While I'm not sure how much practical use it will see, I do think it's important to preserve the *ability* to transparently refactor generators using yield from - I'm just OK with such a refactoring becoming "yield from update_parent_context(subgen())" instead of the current "yield from subgen()" (as I think *not* updating the parent context is a better default than updating it). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 30 August 2017 at 16:40, Nick Coghlan
Writing an "update_parent_context" decorator is also trivial (and will work for both sync and async generators):
def update_parent_context(gf):
    @functools.wraps(gf)
    def wrapper(*args, **kwds):
        gen = gf(*args, **kwds)
        gen.__logical_context__ = None
        return gen
    return wrapper

[snip] While I'm not sure how much practical use it will see, I do think it's important to preserve the *ability* to transparently refactor generators using yield from - I'm just OK with such a refactoring becoming "yield from update_parent_context(subgen())" instead of the current "yield from subgen()" (as I think *not* updating the parent context is a better default than updating it).
Oops, I got mixed up between whether I thought this should be a decorator or an explicitly called helper function. One option would be to provide both:

    def update_parent_context(gen):
        """Configure a generator-iterator to update its caller's context variables"""
        gen.__logical_context__ = None
        return gen

    def updates_parent_context(gf):
        """Wrap a generator function's instances with update_parent_context"""
        @functools.wraps(gf)
        def wrapper(*args, **kwds):
            return update_parent_context(gf(*args, **kwds))
        return wrapper

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Aug 30, 2017 at 2:36 AM, Greg Ewing
Yury Selivanov wrote:
While we want "yield from" to have semantics close to a function call,
That's not what I said! I said that "yield from foo()" should have semantics close to a function call. If you separate the "yield from" from the "foo()", then of course you can get different behaviours.
But that's beside the point, because I'm not suggesting that generators should behave differently depending on when or if you use "yield from" on them.
For (1) we want the context change to be isolated. For (2) you say
that the context change should propagate to the caller.
No, I'm saying that the context change should *always* propagate to the caller, unless you do something explicit within the generator to prevent it.
I have some ideas on what that something might be, which I'll post later.
FYI, I've been sketching an alternative solution that addresses these kinds of things. I've been hesitant to post about it, partly because of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been describing, and partly because that might be a major distraction from other useful discussions, especially because I wasn't completely sure yet about whether my approach has some fatal flaw compared to PEP 550 ;). —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
Yury Selivanov wrote:
BTW we already have mechanisms to always propagate context to the caller -- just use threading.local() or a global variable.
But then you don't have a way to *not* propagate the context change when you don't want to. Here's my suggestion: Make an explicit distinction between creating a new binding for a context var and updating an existing one. So instead of two API calls there would be three:

    contextvar.new(value)   # Creates a new binding only
                            # visible to this frame and
                            # its callees

    contextvar.set(value)   # Updates existing binding in
                            # context inherited from caller

    contextvar.get()        # Retrieves the current binding

If we assume an extension to the decimal module so that decimal.localcontext is a context var, we can now do this:

    async def foo():
        # Establish a new context for this task
        decimal.localcontext.new(decimal.Context())
        # Delegate changing the context
        await bar()
        # Do some calculations
        yield 17 * math.pi + 42

    async def bar():
        # Change context for caller
        decimal.localcontext.prec = 5

-- Greg
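A toy rendering of the proposed three-call API (new/set/get are Greg's hypothetical names, not PEP 550; frames are modelled as an explicit stack of dicts, since real frame lifetimes can't be hooked this simply):

```python
# Toy model of the new()/set()/get() proposal: new() binds in the
# current frame, set() mutates the nearest inherited binding, get()
# searches outward. All names are illustrative.

frames = [{}]              # stack of per-frame binding maps

class ContextVar:
    def new(self, value):
        # A fresh binding, visible to this frame and its callees only.
        frames[-1][self] = value

    def set(self, value):
        # Update the binding inherited from a caller, if any.
        for frame in reversed(frames):
            if self in frame:
                frame[self] = value
                return
        frames[0][self] = value        # nothing bound yet: bind outermost

    def get(self):
        for frame in reversed(frames):
            if self in frame:
                return frame[self]
        return None

var = ContextVar()
frames.append({})          # simulate calling into a function
var.new('inner')           # visible here...
assert var.get() == 'inner'
frames.pop()               # ...but gone once the "call" returns
assert var.get() is None

var.new('outer')
frames.append({})
var.set('updated')         # set() reaches the caller's binding
frames.pop()
assert var.get() == 'updated'
```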
On Wed, Aug 30, 2017 at 8:19 AM, Koos Zevenhoven
On Wed, Aug 30, 2017 at 2:36 AM, Greg Ewing
wrote: Yury Selivanov wrote:
While we want "yield from" to have semantics close to a function call,
That's not what I said! I said that "yield from foo()" should have semantics close to a function call. If you separate the "yield from" from the "foo()", then of course you can get different behaviours.
But that's beside the point, because I'm not suggesting that generators should behave differently depending on when or if you use "yield from" on them.
For (1) we want the context change to be isolated. For (2) you say that the context change should propagate to the caller.
No, I'm saying that the context change should *always* propagate to the caller, unless you do something explicit within the generator to prevent it.
I have some ideas on what that something might be, which I'll post later.
FYI, I've been sketching an alternative solution that addresses these kinds of things. I've been hesitant to post about it, partly because of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been describing, and partly because that might be a major distraction from other useful discussions, especially because I wasn't completely sure yet about whether my approach has some fatal flaw compared to PEP 550 ;).
We'll never know until you post it. Go ahead. Yury
On Wed, Aug 30, 2017 at 8:55 AM, Greg Ewing
Yury Selivanov wrote:
BTW we already have mechanisms to always propagate context to the caller -- just use threading.local() or a global variable.
But then you don't have a way to *not* propagate the context change when you don't want to.
Here's my suggestion: Make an explicit distinction between creating a new binding for a context var and updating an existing one.
So instead of two API calls there would be three:
contextvar.new(value)   # Creates a new binding only
                        # visible to this frame and
                        # its callees
contextvar.set(value)   # Updates existing binding in
                        # context inherited from caller
contextvar.get() # Retrieves the current binding
If we assume an extension to the decimal module so that decimal.localcontext is a context var, we can now do this:
async def foo():
    # Establish a new context for this task
    decimal.localcontext.new(decimal.Context())
    # Delegate changing the context
    await bar()
    # Do some calculations
    yield 17 * math.pi + 42
async def bar():
    # Change context for caller
    decimal.localcontext.prec = 5
Interesting. Question: how do you write a context manager with contextvar.new?

    var = new_context_var()

    class CM:
        def __enter__(self):
            var.new(42)

    with CM():
        print(var.get() or 'None')

My understanding is that the above code will print "None", because "var.new()" makes 42 visible only to callees of __enter__. But if I use "set()" in "CM.__enter__", presumably, it will traverse the stack of LCs to the very bottom and set "var=42" in it. Right? If so, how can we fix the example in the PEP 550 Rationale: https://www.python.org/dev/peps/pep-0550/#rationale where we zip() the "fractions()" generator? With the current PEP 550 semantics that's trivial: https://www.python.org/dev/peps/pep-0550/#generators Yury
On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov
FYI, I've been sketching an alternative solution that addresses these kinds of things. I've been hesitant to post about it, partly because of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been describing, and partly because that might be a major distraction from other useful discussions, especially because I wasn't completely sure yet about whether my approach has some fatal flaw compared to PEP 550 ;).
We'll never know until you post it. Go ahead.
The only alternative design that I considered for PEP 550 and ultimately rejected was to have the following thread-specific mapping:

    {
        var1: [stack of values for var1],
        var2: [stack of values for var2],
    }

So the idea is that when we set a value for the variable in some frame, we push it onto its stack. When the frame is done, we pop it. This is a classic approach (called Shallow Binding) to implement dynamic scope. The fatal flaw that made me reject this approach was the CM protocol (__enter__). Specifically, context managers need to be able to control values in outer frames, and this is where this approach becomes super messy. Yury
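A minimal sketch of the rejected shallow-binding design (illustrative names only): each variable keeps its own stack of values, and dynamic scope falls out naturally. The messy part, as noted above, is __enter__: it returns before the with-block body runs, so a value a context manager pushes would be unwound too early.

```python
# Toy shallow binding: each variable keeps its own per-frame stack of
# values. Names are illustrative, not a proposed API.

class ShallowVar:
    def __init__(self):
        self.stack = []            # [stack of values for this var]

    def push(self, value):         # a frame sets the variable
        self.stack.append(value)

    def pop(self):                 # that frame is done
        self.stack.pop()

    def get(self):
        return self.stack[-1] if self.stack else None

var = ShallowVar()

def inner():
    var.push(10)                   # visible to this frame and callees
    try:
        return var.get()
    finally:
        var.pop()                  # unwound when the frame exits

var.push(1)
assert inner() == 10               # inner sees its own binding
assert var.get() == 1              # which vanished when inner returned
```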
Can Execution Context be implemented outside of CPython
I know I'm well late to the game and a bit dense, but where in the pep is
the justification for this assertion? I ask because we built something to
solve the same problem in Twisted some time ago:
https://bitbucket.org/hipchat/txlocal . We were able to leverage
generator/coroutine decorators to preserve state without modifying the
runtime.
Given that this problem only exists in runtimes that multiplex coroutines on
a single thread and the fact that coroutine execution engines only exist in
user space, why doesn't it make more sense to leave this to a library that
engines like asyncio and Twisted are responsible for standardising on?
On Wed, Aug 30, 2017, 09:40 Yury Selivanov
On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov
wrote: [..] FYI, I've been sketching an alternative solution that addresses these kinds of things. I've been hesitant to post about it, partly because of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been describing, and partly because that might be a major distraction from other useful discussions, especially because I wasn't completely sure yet about whether my approach has some fatal flaw compared to PEP 550 ;).
We'll never know until you post it. Go ahead.
The only alternative design that I considered for PEP 550 and ultimately rejected was to have a the following thread-specific mapping:
{
    var1: [stack of values for var1],
    var2: [stack of values for var2],
}
So the idea is that when we set a value for the variable in some frame, we push it onto its stack. When the frame is done, we pop it. This is a classic approach (called Shallow Binding) to implement dynamic scope. The fatal flaw that made me reject this approach was the CM protocol (__enter__). Specifically, context managers need to be able to control values in outer frames, and this is where this approach becomes super messy.
Yury
On Wed, Aug 30, 2017 at 1:39 PM, Kevin Conway
Can Execution Context be implemented outside of CPython
I know I'm well late to the game and a bit dense, but where in the pep is the justification for this assertion? I ask because we built something to solve the same problem in Twisted some time ago: https://bitbucket.org/hipchat/txlocal . We were able to leverage generator/coroutine decorators to preserve state without modifying the runtime.
Given that this problem only exists in runtimes that multiplex coroutines on a single thread and the fact that coroutine execution engines only exist in user space, why doesn't it make more sense to leave this to a library that
To work with coroutines we have asyncio/twisted or other frameworks. They create async tasks and manage them. Generators, OTOH, don't have a framework that runs them, they are managed by the Python interpreter. So it's not possible to implement a *complete context solution* that equally supports generators and coroutines outside of the interpreter. Another problem is that every framework has its own local context solution. Twisted has one, gevent has another. But libraries like numpy and decimal can't use them to store their local context data, because they are non-standard. That's why we need to solve this problem once in Python directly. Yury
On Wed, Aug 30, 2017 at 5:36 PM, Yury Selivanov
On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov
wrote: [..] FYI, I've been sketching an alternative solution that addresses these kinds of things. I've been hesitant to post about it, partly because of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been describing, and partly because that might be a major distraction from other useful discussions, especially because I wasn't completely sure yet about whether my approach has some fatal flaw compared to PEP 550 ;).
We'll never know until you post it. Go ahead.
Anyway, thanks to these efforts, your proposal has become somewhat more competitive compared to mine ;). I'll post mine as soon as I find the time to write everything down. My intention is before next week. —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
Yury Selivanov wrote:
Question: how to write a context manager with contextvar.new?
var = new_context_var()
class CM:
def __enter__(self):
    var.new(42)
with CM():
    print(var.get() or 'None')
My understanding that the above code will print "None", because "var.new()" makes 42 visible only to callees of __enter__.
If you tie the introduction of a new scope for context vars to generators, as PEP 550 currently does, then this isn't a problem. But I'm trying to avoid doing that. The basic issue is that, ever since yield-from, "generator" and "task" are not synonymous. When you use a generator to implement an iterator, you probably want it to behave as a distinct task with its own local context. But a generator used with yield-from isn't a task of its own, it's just part of another task, and there is nothing built into Python that lets you tell the difference automatically. So I'm now thinking that the introduction of a new local context should also be explicit. Suppose we have these primitives:

    push_local_context()
    pop_local_context()

Now introducing a temporary decimal context looks like:

    push_local_context()
    decimal.localcontextvar.new(decimal.getcontext().copy())
    decimal.localcontextvar.prec = 5
    do_some_calculations()
    pop_local_context()

Since calls (either normal or generator) no longer automatically result in a new local context, we can easily factor this out into a context manager:

    class LocalDecimalContext():

        def __enter__(self):
            push_local_context()
            ctx = decimal.getcontext().copy()
            decimal.localcontextvar.new(ctx)
            return ctx

        def __exit__(self, *exc):
            pop_local_context()

Usage:

    with LocalDecimalContext() as ctx:
        ctx.prec = 5
        do_some_calculations()

-- Greg
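These primitives can be mocked up in a few lines to see the intended behaviour (push_local_context, pop_local_context and var.new() are the proposed, hypothetical names from the message above; Var and LocalValue are illustrative stand-ins):

```python
# Toy, runnable rendering of the explicit local-context primitives.
# The EC is a stack of dicts; new() writes into the innermost LC and
# get() searches outward. All names are illustrative.

_ec = [{}]                         # execution context: stack of LCs

def push_local_context():
    _ec.append({})

def pop_local_context():
    _ec.pop()

class Var:
    def new(self, value):
        _ec[-1][self] = value      # bind in the innermost LC

    def get(self):
        for lc in reversed(_ec):   # lookup searches outward
            if self in lc:
                return lc[self]
        return None

class LocalValue:
    """The LocalDecimalContext pattern above, for a generic variable."""
    def __init__(self, var, value):
        self.var, self.value = var, value

    def __enter__(self):
        push_local_context()       # explicit new local context
        self.var.new(self.value)
        return self.value

    def __exit__(self, *exc):
        pop_local_context()        # discard it on exit

prec = Var()
prec.new(28)
with LocalValue(prec, 5):
    assert prec.get() == 5         # temporary binding in the pushed LC
assert prec.get() == 28            # restored once the LC is popped
```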
But if I use "set()" in "CM.__enter__", presumably, it will traverse the stack of LCs to the very bottom and set "var=42" in it. Right?
If so, how can we fix the example in the PEP 550 Rationale: https://www.python.org/dev/peps/pep-0550/#rationale where we zip() the "fractions()" generator?
With current PEP 550 semantics that's trivial: https://www.python.org/dev/peps/pep-0550/#generators
Yury
On Tue, Sep 5, 2017 at 4:59 PM, Greg Ewing
Yury Selivanov wrote:
Question: how to write a context manager with contextvar.new?
var = new_context_var()
class CM:
def __enter__(self):
    var.new(42)
with CM():
    print(var.get() or 'None')
My understanding that the above code will print "None", because "var.new()" makes 42 visible only to callees of __enter__.
If you tie the introduction of a new scope for context vars to generators, as PEP 550 currently does, then this isn't a problem.
But I'm trying to avoid doing that. The basic issue is that, ever since yield-from, "generator" and "task" are not synonymous.
When you use a generator to implement an iterator, you probably want it to behave as a distinct task with its own local context. But a generator used with yield-from isn't a task of its own, it's just part of another task, and there is nothing built into Python that lets you tell the difference automatically.
Greg, have you seen this new section: https://www.python.org/dev/peps/pep-0550/#should-yield-from-leak-context-cha... ? It has a couple of examples that illustrate some issues with the "But a generator used with yield-from isn't a task of its own, it's just part of another task" reasoning.

In principle, we can modify PEP 550 to make 'yield from' transparent to context changes. The interpreter can just reset g.__logical_context__ to None whenever 'g' is being 'yield-frommed'. The key issue is that there are a couple of edge cases where this semantics is problematic.

The bottom line is that it's easier to reason about context when it's guaranteed that context changes are always isolated in generators no matter what. I think this semantics actually makes refactoring easier. Please take a look at the linked section.
So I'm now thinking that the introduction of a new local context should also be explicit.
Suppose we have these primitives:
    push_local_context()
    pop_local_context()
Now introducing a temporary decimal context looks like:
    push_local_context()
    decimal.localcontextvar.new(decimal.getcontext().copy())
    decimal.localcontextvar.prec = 5
    do_some_calculations()
    pop_local_context()
Since calls (either normal or generator) no longer automatically result in a new local context, we can easily factor this out into a context manager:
class LocalDecimalContext():
    def __enter__(self):
        push_local_context()
        ctx = decimal.getcontext().copy()
        decimal.localcontextvar.new(ctx)
        return ctx
    def __exit__(self, *exc):
        pop_local_context()
Usage:
    with LocalDecimalContext() as ctx:
        ctx.prec = 5
        do_some_calculations()
This will have some performance implications and make the API way more complex. But I'm not convinced yet that real-life code needs the semantics you want.

This will work with the current PEP 550 design:

    def g():
        with DecimalContext() as ctx:
            ctx.prec = 5
            yield from do_some_calculations()  # will run with the correct ctx

The only thing that won't work is this:

    def do_some_calculations():
        ctx = DecimalContext()
        ctx.prec = 10
        decimal.setcontext(ctx)
        yield

    def g():
        yield from do_some_calculations()
        # Context changes in do_some_calculations() will not leak to g()

In the above example, do_some_calculations() deliberately tries to leak context changes (by not using a context manager). And I consider it a feature that PEP 550 does not allow generators to leak state.

If you write code that uses 'with' statements consistently, you will never even know that context changes are isolated in generators.

Yury
Yury Selivanov wrote:
Greg, have you seen this new section: https://www.python.org/dev/peps/pep-0550/#should-yield-from-leak-context-cha...
That section seems to be addressing the idea of a generator behaving differently depending on whether you use yield-from on it. I never suggested that, and I'm still not suggesting it.
The bottomline is that it's easier to reason about context when it's guaranteed that context changes are always isolated in generators no matter what.
I don't see a lot of value in trying to automagically isolate changes to global state *only* in generators. Under PEP 550, if you want to e.g. change the decimal context temporarily in a non-generator function, you're still going to have to protect those changes using a with-statement or something equivalent. I don't see why the same thing shouldn't apply to generators. It seems to me that it will be *more* confusing to give generators this magical ability to avoid with-statements.
This will have some performance implications and make the API way more complex.
I can't see how it would have any significant effect on performance. The implementation would be very similar to what's currently described in the PEP. You'll have to elaborate on how you think it would be less efficient.

As for complexity, push_local_context() and pop_local_context() would be considered low-level primitives that you wouldn't often use directly. Most of the time they would be hidden inside context managers. You could even have a context manager just for applying them:

    with new_local_context():
        # go nuts with context vars here
But I'm not convinced yet that real-life code needs the semantics you want.
And I'm not convinced that it needs as much magic as you want.
If you write code that uses 'with' statements consistently, you will never even know that context changes are isolated in generators.
But if you write code that uses context managers consistently, and those context managers know about and handle local contexts properly, generators don't *need* to isolate their context automatically. -- Greg
Another comment from a bystander's point of view: it looks like the discussions of API design and implementation are a bit entangled here. This is much better in the current version of the PEP, but still there is a _feeling_ that some design decisions are influenced by the implementation strategy.

As I currently see it, the "philosophy" at large is like this: there are different levels of coupling between concurrently executing code:

* processes: practically not coupled, designed to be long-running
* threads: more tightly coupled, designed to be less long-lived, context is managed by threading.local, which is not inherited on "forking"
* tasks: tightly coupled, designed to be short-lived, context will be managed by PEP 550, context is inherited on "forking"

This seems right to me.

Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators. What I think might help is to add a few more motivational examples to the design section of the PEP.

-- Ivan
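The threads bullet above, that threading.local state is not inherited when a new thread is started, can be checked directly against the stdlib:

```python
import threading

local = threading.local()
local.value = 'set in the parent thread'

seen = []

def child():
    # A new thread starts with an empty threading.local namespace;
    # the parent thread's 'value' attribute is not inherited.
    seen.append(getattr(local, 'value', 'not inherited'))

t = threading.Thread(target=child)
t.start()
t.join()
print(seen)  # → ['not inherited']
```

PEP 550's tasks differ on exactly this point: a forked task sees (an immutable snapshot of) its parent's context.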
On Wed, Sep 6, 2017 at 1:49 AM, Ivan Levkivskyi
Another comment from a bystander's point of view: it looks like the discussions of API design and implementation are a bit entangled here. This is much better in the current version of the PEP, but still there is a _feeling_ that some design decisions are influenced by the implementation strategy.
As I currently see it, the "philosophy" at large is like this: there are different levels of coupling between concurrently executing code:

* processes: practically not coupled, designed to be long-running
* threads: more tightly coupled, designed to be less long-lived, context is managed by threading.local, which is not inherited on "forking"
* tasks: tightly coupled, designed to be short-lived, context will be managed by PEP 550, context is inherited on "forking"
This seems right to me.
Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators. What I think might help is to add a few more motivational examples to the design section of the PEP.
Literally the first motivating example at the beginning of the PEP ('def fractions ...') involves only generators, not coroutines, and only works correctly if generators get special handling. (In fact, I'd be curious to see how Greg's {push,pop}_local_storage could handle this case.) The implementation strategy changed radically between v1 and v2 because of considerations around generator (not coroutine) semantics. I'm not sure what more it can do to dispel these feelings :-). -n -- Nathaniel J. Smith -- https://vorpus.org
On Wed, Sep 6, 2017 at 12:13 PM, Nathaniel Smith
On Wed, Sep 6, 2017 at 1:49 AM, Ivan Levkivskyi
wrote: Another comment from a bystander's point of view: it looks like the discussions of API design and implementation are a bit entangled here. This is much better in the current version of the PEP, but still there is a _feeling_ that some design decisions are influenced by the implementation strategy.
As I currently see it, the "philosophy" at large is like this: there are different levels of coupling between concurrently executing code:

* processes: practically not coupled, designed to be long-running
* threads: more tightly coupled, designed to be less long-lived, context is managed by threading.local, which is not inherited on "forking"
* tasks: tightly coupled, designed to be short-lived, context will be managed by PEP 550, context is inherited on "forking"
This seems right to me.
Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators. What I think might help is to add a few more motivational examples to the design section of the PEP.
Literally the first motivating example at the beginning of the PEP ('def fractions ...') involves only generators, not coroutines, and only works correctly if generators get special handling. (In fact, I'd be curious to see how Greg's {push,pop}_local_storage could handle this case.) The implementation strategy changed radically between v1 and v2 because of considerations around generator (not coroutine) semantics. I'm not sure what more it can do to dispel these feelings :-).
Just to mention that this is now closely related to the discussion on my proposal on python-ideas. BTW, that proposal is now submitted as PEP 555 on the peps repo. ––Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 6 September 2017 at 11:13, Nathaniel Smith
On Wed, Sep 6, 2017 at 1:49 AM, Ivan Levkivskyi
wrote: Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators. What I think might help is to add a few more motivational examples to the design section of the PEP.
Literally the first motivating example at the beginning of the PEP ('def fractions ...') involves only generators, not coroutines.
And this is probably what confuses people. As I understand it, tasks/coroutines are among the primary motivations for the PEP, but they appear somewhere later. There are four potential ways to see the PEP:

1) Generators are broken*, and therefore coroutines are broken; we want to fix the latter, therefore we fix the former.
2) Coroutines are broken; we want to fix them, and let's also fix generators while we are at it.
3) Generators are broken; we want to fix them, and let's also fix coroutines while we are at it.
4) Generators and coroutines are broken in similar ways; let us fix them as consistently as we can.

As I understand it, the PEP is based on option (4); please correct me if I am wrong. Therefore maybe this should be stated more directly, and maybe then we should show _in addition_ a task example in the rationale, show how it is broken, and explain that the two are broken in slightly different ways (since the expected semantics are a bit different).

-- Ivan

* here and below, by broken I mean "broken" (sometimes behaves in a non-intuitive way, and lacks some functionality we would like it to have)
On Wed, Sep 6, 2017 at 12:07 AM, Greg Ewing
Yury Selivanov wrote: [..] I don't see a lot of value in trying to automagically isolate changes to global state *only* in generators.
Under PEP 550, if you want to e.g. change the decimal context temporarily in a non-generator function, you're still going to have to protect those changes using a with-statement or something equivalent. I don't see why the same thing shouldn't apply to generators.
It seems to me that it will be *more* confusing to give generators this magical ability to avoid with-statements.
Greg, just to make sure that we are talking about the same thing, could you please show an example (using the current PEP 550 API/semantics) of something that in your opinion should work differently for generators? Yury
On Wed, Sep 6, 2017 at 5:58 AM, Ivan Levkivskyi
On 6 September 2017 at 11:13, Nathaniel Smith
wrote: On Wed, Sep 6, 2017 at 1:49 AM, Ivan Levkivskyi
wrote: Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators. What I think might help is to add a few more motivational examples to the design section of the PEP.
Literally the first motivating example at the beginning of the PEP ('def fractions ...') involves only generators, not coroutines.
And this is probably what confuses people. As I understand, the tasks/coroutines are among the primary motivations for the PEP, but they appear somewhere later. There are four potential ways to see the PEP:
1) Generators are broken*, and therefore coroutines are broken; we want to fix the latter, therefore we fix the former.
2) Coroutines are broken; we want to fix them, and let's also fix generators while we are at it.
3) Generators are broken; we want to fix them, and let's also fix coroutines while we are at it.
4) Generators and coroutines are broken in similar ways; let us fix them as consistently as we can.
Ivan, generators and coroutines are fundamentally different objects (even though they share the implementation). The only common thing is that they both allow for out-of-order execution of code in the same OS thread. The PEP explains the semantic difference of the EC in the High-level Specification in detail, literally on the 2nd page of the PEP. I don't see any benefit in reshuffling the rationale section.

Yury
On Wed, Sep 6, 2017 at 10:07 AM, Greg Ewing
Yury Selivanov wrote:
Greg, have you seen this new section: https://www.python.org/dev/peps/pep-0550/#should-yield-from- leak-context-changes
That section seems to be addressing the idea of a generator behaving differently depending on whether you use yield-from on it.
Regarding this, I think yield from should have the same semantics as iterating over the generator with next/send, and PEP 555 has no issues with this.
I never suggested that, and I'm still not suggesting it.
The bottomline is that it's easier to
reason about context when it's guaranteed that context changes are always isolated in generators no matter what.
I don't see a lot of value in trying to automagically isolate changes to global state *only* in generators.
Under PEP 550, if you want to e.g. change the decimal context temporarily in a non-generator function, you're still going to have to protect those changes using a with-statement or something equivalent. I don't see why the same thing shouldn't apply to generators.
It seems to me that it will be *more* confusing to give generators this magical ability to avoid with-statements.
Exactly. To state it clearly: PEP 555 does not have this issue. ––Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Wed, Sep 6, 2017 at 8:07 AM, Koos Zevenhoven
I think yield from should have the same semantics as iterating over the generator with next/send, and PEP 555 has no issues with this.
I think the onus is on you and Greg to show a realistic example that shows why this is necessary. So far all the argumentation about this has been of the form "if you have code that currently does this (example using foo) and you refactor it in using yield from (example using bar), and if you were relying on context propagation back out of calls, then it should still propagate out." This feels like a very abstract argument. I have a feeling that context state propagating out of a call is used relatively rarely -- it must work for cases where you refactor something that changes context inline into a utility function (e.g. decimal.setcontext()), but I just can't think of a realistic example where coroutines (either of the yield-from variety or of the async/def form) would be used for such a utility function. A utility function that sets context state but also makes a network call just sounds like asking for trouble! -- --Guido van Rossum (python.org/~guido)
On Wed, Sep 6, 2017 at 8:07 AM, Koos Zevenhoven
On Wed, Sep 6, 2017 at 10:07 AM, Greg Ewing
wrote: Yury Selivanov wrote:
Greg, have you seen this new section:
https://www.python.org/dev/peps/pep-0550/#should-yield-from-leak-context-cha...
That section seems to be addressing the idea of a generator behaving differently depending on whether you use yield-from on it.
Regarding this, I think yield from should have the same semantics as iterating over the generator with next/send, and PEP 555 has no issues with this.
I never suggested that, and I'm still not suggesting it.
The bottomline is that it's easier to reason about context when it's guaranteed that context changes are always isolated in generators no matter what.
I don't see a lot of value in trying to automagically isolate changes to global state *only* in generators.
Under PEP 550, if you want to e.g. change the decimal context temporarily in a non-generator function, you're still going to have to protect those changes using a with-statement or something equivalent. I don't see why the same thing shouldn't apply to generators.
It seems to me that it will be *more* confusing to give generators this magical ability to avoid with-statements.
Exactly. To state it clearly: PEP 555 does not have this issue.
It would be great if you or Greg could show a couple of real-world examples showing the "issue" (with the current PEP 550 APIs/semantics).

PEP 550 treats coroutines and generators as objects that support out-of-order execution. OS threads are similar to them in some ways. I find it questionable to try to enforce the context management rules we have for regular functions on generators/coroutines. I don't really understand the "refactoring" argument you and Greg are talking about all the time.

PEP 555 still doesn't clearly explain how exactly it is different from PEP 550. Because 555 was posted *after* 550, I think that it's PEP 555 that should have that comparison.

Yury
On Wed, Sep 6, 2017 at 8:16 PM, Guido van Rossum
On Wed, Sep 6, 2017 at 8:07 AM, Koos Zevenhoven
wrote: I think yield from should have the same semantics as iterating over the generator with next/send, and PEP 555 has no issues with this.
I think the onus is on you and Greg to show a realistic example that shows why this is necessary.
Well, regarding this part, it's just that things like

    for obj in gen:
        yield obj

often get modernized into

    yield from gen

And realistic examples of that include pretty much any normal use of yield from.

So far all the argumentation about this has been of the form "if you have
code that currently does this (example using foo) and you refactor it in using yield from (example using bar), and if you were relying on context propagation back out of calls, then it should still propagate out."
So here's a realistic example, with the semantics of PEP 550 applied to a decimal.setcontext() kind of thing, but it could be anything using var.set(value):

    def process_data_buffers(buffers):
        setcontext(default_context)
        for buf in buffers:
            for data in buf:
                if data.tag == "NEW_PRECISION":
                    setcontext(context_based_on(data))
                else:
                    yield compute(data)

Code smells? Yes, but maybe you often see much worse things, so let's say it's fine.

But then, if you refactor it into a subgenerator like this:

    def process_data_buffer(buffer):
        for data in buffer:
            if data.tag == "NEW_PRECISION":
                setcontext(context_based_on(data))
            else:
                yield compute(data)

    def process_data_buffers(buffers):
        setcontext(default_context)
        for buf in buffers:
            yield from process_data_buffer(buf)

Now, if setcontext uses PEP 550 semantics, the refactoring broke the code, because a generator introduces a scope barrier by adding a LogicalContext on the stack, and setcontext is only local to the process_data_buffer subgenerator. But the programmer is puzzled, because with regular functions it had worked just fine in a similar situation before they learned about generators:

    def process_data_buffer(buffer, output):
        for data in buffer:
            if data.tag == "NEW_PRECISION":
                setcontext(context_based_on(data))
            else:
                output.append(compute(data))

    def process_data_buffers(buffers):
        output = []
        setcontext(default_context)
        for buf in buffers:
            process_data_buffer(buf, output)

In fact, this code had another problem, namely that the context state is leaked out of process_data_buffers, because PEP 550 leaks context state out of functions, but not out of generators. But we can easily imagine that the unit tests for process_data_buffers *do* pass.

But let's look at a user of the functionality:

    def get_total():
        return sum(process_data_buffers(get_buffers()))

    setcontext(somecontext)
    value = get_total() * compute_factor()

Now the code is broken, because setcontext(somecontext) has no effect, because get_total() leaks out another context.
Not to mention that our data buffer source now has control over the behavior of compute_factor(). But if one is lucky, the last line was written as

    value = compute_factor() * get_total()

And hooray, the code works! (Except for perhaps the code that is run after this.)

Now this was of course a completely fictional example, and hopefully I didn't introduce any bugs or syntax errors other than the ones I described. I haven't seen code like this anywhere, but somehow we caught the problems anyway.

-- Koos

-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Wed, Sep 6, 2017 at 8:22 PM, Yury Selivanov
PEP 550 treats coroutines and generators as objects that support out of order execution.
Out of order? More like interleaved.
PEP 555 still doesn't clearly explain how exactly it is different from PEP 550. Because 555 was posted *after* 550, I think that it's PEP 555 that should have that comparison.
555 was *posted* as a pep after 550, yes. And yes, there could be a comparison, especially now that PEP 550 semantics seem to have converged, so PEP 555 does not have to adapt the comparison to PEP 550 changes. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Wed, Sep 6, 2017 at 1:39 PM, Koos Zevenhoven
On Wed, Sep 6, 2017 at 8:16 PM, Guido van Rossum
wrote: On Wed, Sep 6, 2017 at 8:07 AM, Koos Zevenhoven
wrote: I think yield from should have the same semantics as iterating over the generator with next/send, and PEP 555 has no issues with this.
I think the onus is on you and Greg to show a realistic example that shows why this is necessary.
Well, regarding this part, it's just that things like
    for obj in gen:
        yield obj
often get modernized into
yield from gen
I know that that's the pattern, but everybody just shows the same foo/bar example.
And realistic examples of that include pretty much any normal use of yield from.
There aren't actually any "normal" uses of yield from. The vast majority of uses of yield from are in coroutines written using yield from.
So far all the argumentation about this has been of the form "if you have
code that currently does this (example using foo) and you refactor it in using yield from (example using bar), and if you were relying on context propagation back out of calls, then it should still propagate out."
So here's a realistic example, with the semantics of PEP 550 applied to a decimal.setcontext() kind of thing, but it could be anything using var.set(value):
    def process_data_buffers(buffers):
        setcontext(default_context)
        for buf in buffers:
            for data in buf:
                if data.tag == "NEW_PRECISION":
                    setcontext(context_based_on(data))
                else:
                    yield compute(data)
Code smells? Yes, but maybe you often see much worse things, so let's say it's fine.
But then, if you refactor it into a subgenerator like this:
    def process_data_buffer(buffer):
        for data in buffer:
            if data.tag == "NEW_PRECISION":
                setcontext(context_based_on(data))
            else:
                yield compute(data)
    def process_data_buffers(buffers):
        setcontext(default_context)
        for buf in buffers:
            yield from process_data_buffer(buf)
Now, if setcontext uses PEP 550 semantics, the refactoring broke the code, because a generator introduces a scope barrier by adding a LogicalContext on the stack, and setcontext is only local to the process_data_buffer subgenerator. But the programmer is puzzled, because with regular functions it had worked just fine in a similar situation before they learned about generators:
    def process_data_buffer(buffer, output):
        for data in buffer:
            if data.tag == "NEW_PRECISION":
                setcontext(context_based_on(data))
            else:
                output.append(compute(data))
    def process_data_buffers(buffers):
        output = []
        setcontext(default_context)
        for buf in buffers:
            process_data_buffer(buf, output)
In fact, this code had another problem, namely that the context state is leaked out of process_data_buffers, because PEP 550 leaks context state out of functions, but not out of generators. But we can easily imagine that the unit tests for process_data_buffers *do* pass.
But let's look at a user of the functionality:
    def get_total():
        return sum(process_data_buffers(get_buffers()))
    setcontext(somecontext)
    value = get_total() * compute_factor()
Now the code is broken, because setcontext(somecontext) has no effect, because get_total() leaks out another context. Not to mention that our data buffer source now has control over the behavior of compute_factor(). But if one is lucky, the last line was written as
value = compute_factor() * get_total()
And hooray, the code works!
(Except for perhaps the code that is run after this.)
Now this was of course a completely fictional example, and hopefully I didn't introduce any bugs or syntax errors other than the ones I described. I haven't seen code like this anywhere, but somehow we caught the problems anyway.
Yeah, so my claim this is simply a non-problem, and you've pretty much just proved that by failing to come up with pointers to actual code that would suffer from this. Clearly you're not aware of any such code. -- --Guido van Rossum (python.org/~guido)
Ivan Levkivskyi wrote:
Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators.
This is what I disagree with. Generators don't implement coroutines, they implement *parts* of coroutines. We want "task local storage" that behaves analogously to thread local storage. But PEP 550 as it stands doesn't give us that; it gives something more like "function local storage" for certain kinds of function. -- Greg
On Wed, Sep 6, 2017 at 1:39 PM, Koos Zevenhoven
Now this was of course a completely fictional example, and hopefully I didn't introduce any bugs or syntax errors other than the ones I described. I haven't seen code like this anywhere, but somehow we caught the problems anyway.
Thank you for the example, Koos. FWIW I agree it is a "completely fictional example".

There are two ways we can easily adapt PEP 550 to follow your semantics:

1. Set gen.__logical_context__ to None when it is being 'yield-frommed'.

2. Merge gen.__logical_context__ with the outer LC when the generator is iterated to the end.

But I still really dislike the examples you and Greg show us. They are not typical or real-world examples; they are showcases of ways to abuse contexts. I still think that giving Python programmers one strong rule -- "context mutation is always isolated in generators" -- makes it easier to reason about the EC and write maintainable code.

Yury
Nathaniel Smith wrote:
Literally the first motivating example at the beginning of the PEP ('def fractions ...') involves only generators, not coroutines, and only works correctly if generators get special handling. (In fact, I'd be curious to see how Greg's {push,pop}_local_storage could handle this case.)
I've given a decimal-based example, but it was a bit scattered. Here's a summary and application to the fractions example.

I'm going to assume that the decimal module has been modified to keep the current context in a context var, and that getcontext() and setcontext() access that context var.

The decimal.localcontext context manager is also redefined as:

    class localcontext():

        def __enter__(self):
            push_local_context()
            ctx = getcontext().copy()
            setcontext(ctx)
            return ctx

        def __exit__(self, *exc):
            pop_local_context()

Now we can write the fractions generator as:

    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y ** 2)

You may notice that this is exactly the same as what you would write today for the same task...

-- Greg
Nathaniel Smith wrote:
The implementation strategy changed radically between v1 and v2 because of considerations around generator (not coroutine) semantics. I'm not sure what more it can do to dispel these feelings :-).
I can't say the changes have dispelled any feelings on my part. The implementation suggested in the PEP seems very complicated and messy. There are garbage collection issues, which it proposes using weak references to mitigate. There is also apparently some issue with long chains building up and having to be periodically collapsed. None of this inspires confidence that we have the basic design right. My approach wouldn't have any of those problems. The implementation would be a lot simpler. -- Greg
On Wed, Sep 6, 2017 at 5:00 PM, Greg Ewing
Nathaniel Smith wrote:
Literally the first motivating example at the beginning of the PEP ('def fractions ...') involves only generators, not coroutines, and only works correctly if generators get special handling. (In fact, I'd be curious to see how Greg's {push,pop}_local_storage could handle this case.)
I've given a decimal-based example, but it was a bit scattered. Here's a summary and application to the fractions example.
I'm going to assume that the decimal module has been modified to keep the current context in a context var, and that getcontext() and setcontext() access that context var.
The decimal.localcontext context manager is also redefined as:
class localcontext():
    def __enter__(self):
        push_local_context()
        ctx = getcontext().copy()
        setcontext(ctx)
        return ctx
    def __exit__(self, *exc):
        pop_local_context()
1. So essentially this means that we will have one "local context" per context manager, storing one value.

2. If somebody makes a mistake and calls "push_local_context" without a corresponding "pop_local_context" -- you will have unbounded growth of LCs (happens in Koos' proposal too, btw).

3. Users will need to know way more to correctly use the mechanism.

So far, neither you nor Koos has given us a realistic example which illustrates why we should suffer the implications of (1), (2), and (3).

Yury
On Wed, Sep 6, 2017 at 4:27 PM, Greg Ewing
Ivan Levkivskyi wrote:
Normal generators fall out from this "scheme", and it looks like their behavior is determined by the fact that coroutines are implemented as generators.
This is what I disagree with. Generators don't implement coroutines, they implement *parts* of coroutines.
We want "task local storage" that behaves analogously to thread local storage. But PEP 550 as it stands doesn't give us that; it gives something more like "function local storage" for certain kinds of function.
The PEP gives you a Task Local Storage, where Task is:

1. your single-threaded code
2. a generator
3. an async task

If you correctly use context managers, PEP 550 works intuitively and similarly to how one would think that threading.local() should work.

The only example you (and Koos) can come up with is this:

    def generator():
        set_decimal_context()
        yield

    next(generator())  # decimal context is not set
    # or
    yield from generator()  # decimal context is still not set

I consider that the above is a feature.

Yury
On Wed, Sep 6, 2017 at 5:06 PM, Greg Ewing
Nathaniel Smith wrote:
The implementation strategy changed radically between v1 and v2 because of considerations around generator (not coroutine) semantics. I'm not sure what more it can do to dispel these feelings :-).
I can't say the changes have dispelled any feelings on my part.
The implementation suggested in the PEP seems very complicated and messy. There are garbage collection issues, which it proposes using weak references to mitigate.
"messy" and "complicated" don't sound like valuable feedback :(

There are no "garbage collection issues", sorry. The issue that we use weak references for is the same issue why threading.local() uses them:

    def foo():
        var = ContextVar()
        var.set(1)

    for _ in range(10**6):
        foo()

If 'var' were strongly referenced by the EC machinery, we would accumulate a million of them.
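The accumulation problem Yury describes, and the weak-reference pattern threading.local() uses to avoid it, can be shown with a self-contained toy (ContextVar here is a hypothetical stand-in, not the proposed API):

```python
import gc
import weakref

class ContextVar:
    """Toy stand-in for a context variable object."""

# Weak registration: the registry does not keep variables alive.
registry = weakref.WeakSet()

def foo():
    var = ContextVar()
    registry.add(var)
    # var goes out of scope here; nothing else holds a strong reference,
    # so the entry disappears from the WeakSet.

for _ in range(1000):
    foo()

gc.collect()
# Dead variables were reclaimed instead of accumulating:
print(len(registry))  # → 0
```

A strong container (e.g. a plain set) in place of the WeakSet would instead retain all 1000 dead variables, which is the leak the PEP's weak references prevent.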
There is also apparently some issue with long chains building up and having to be periodically collapsed. None of this inspires confidence that we have the basic design right.
My approach wouldn't have any of those problems. The implementation would be a lot simpler.
Cool. Yury
On Wednesday, September 6, 2017 8:06:36 PM EDT Greg Ewing wrote:
Nathaniel Smith wrote:
The implementation strategy changed radically between v1 and v2 because of considerations around generator (not coroutine) semantics. I'm not sure what more it can do to dispel these feelings :-).
I can't say the changes have dispelled any feelings on my part.
The implementation suggested in the PEP seems very complicated and messy. There are garbage collection issues, which it proposes using weak references to mitigate. There is also apparently some issue with long chains building up and having to be periodically collapsed. None of this inspires confidence that we have the basic design right.
My approach wouldn't have any of those problems. The implementation would be a lot simpler.
I might have missed something, but your claim doesn't make any sense to me. All you've proposed is to replace the implicit and guaranteed push_lc()/pop_lc() around each generator with explicit LC stack management. You *still* need to retain and switch the current stack on every generator send() and throw(). Everything else written out in PEP 550 stays relevant as well.

As for the "long chains building up", your approach is actually much worse. The absence of a guaranteed context fence around generators would mean that contextvar context managers will *have* to push LCs whether really needed or not. Consider the following (naive) way of computing the N-th Fibonacci number:

    def fib(n):
        with decimal.localcontext():
            if n == 0:
                return 0
            elif n == 1:
                return 1
            else:
                return fib(n - 1) + fib(n - 2)

Your proposal can cause the LC stack to grow incessantly even in simple cases, and will affect code that doesn't even use generators.

A great deal of effort was put into PEP 550, and the matter discussed is far from trivial. What you see as "complicated and messy" is actually the result of us carefully considering the solutions to real-world problems, and then the implications of those solutions (including the worst-case scenarios.)

Elvis
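Elvis's point hinges on decimal.localcontext() pushing a fresh context and restoring the outer one on exit, which is the context-manager cost under discussion. A minimal runnable illustration (the helper name is ours):

```python
import decimal

def one_third_prec5():
    # localcontext() pushes a copy of the current decimal context
    # and restores the previous one when the with-block exits.
    with decimal.localcontext() as ctx:
        ctx.prec = 5
        return decimal.Decimal(1) / decimal.Decimal(3)

decimal.getcontext().prec = 28
assert str(one_third_prec5()) == '0.33333'
assert decimal.getcontext().prec == 28  # outer precision untouched
```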
Guido van Rossum wrote:
This feels like a very abstract argument. I have a feeling that context state propagating out of a call is used relatively rarely -- it must work for cases where you refactor something that changes context inline into a utility function (e.g. decimal.setcontext()), but I just can't think of a realistic example where coroutines (either of the yield-from variety or of the async/def form) would be used for such a utility function.
Yuri has already found one himself, the __aenter__ and __aexit__ methods of an async context manager.
A utility function that sets context state but also makes a network call just sounds like asking for trouble!
I'm coming from the other direction. It seems to me that it's not very useful to allow with-statements to be skipped in certain very restricted circumstances. The only situation in which you will be able to take advantage of this is if the context change is being made in a generator or coroutine, and it is to apply to the whole body of that generator or coroutine. If you're in an ordinary function, you'll still have to use a context manager. If you only want the change to apply to part of the body, you'll still have to use a context manager. It would be simpler to just tell people to always use a context manager, wouldn't it? -- Greg
Yury Selivanov wrote:
It would be great if you or Greg could show a couple of real-world examples showing the "issue" (with the current PEP 550 APIs/semantics).
Here's one way that refactoring could trip you up. Start with this:

    async def foo():
        calculate_something()
        # in a coroutine, so we can be lazy and not use a cm
        ctx = decimal.getcontext().copy()
        ctx.prec = 5
        decimal.setcontext(ctx)
        calculate_something_else()

And factor part of it out (into an *ordinary* function!)

    async def foo():
        calculate_something()
        calculate_something_else_with_5_digits()

    def calculate_something_else_with_5_digits():
        ctx = decimal.getcontext().copy()
        ctx.prec = 5
        decimal.setcontext(ctx)
        calculate_something_else()

Now we add some more calculation to the end of foo():

    async def foo():
        calculate_something()
        calculate_something_else_with_5_digits()
        calculate_more_stuff()

Here we didn't intend calculate_more_stuff() to be done with prec=5, but we forgot that calculate_something_else_with_5_digits() changes the precision and *doesn't restore it*, because we didn't add a context manager to it.

If we hadn't been lazy and had used a context manager in the first place, that wouldn't have happened.

Summary: I think that skipping context managers in some circumstances is a bad habit that shouldn't be encouraged.

-- Greg
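Greg's trap can be reproduced directly with the decimal module today. The helper below (a hypothetical name) changes the precision without a context manager, and the change leaks to the caller:

```python
import decimal

def calculate_with_5_digits():
    # No context manager: the precision change leaks to the caller,
    # which is exactly the refactoring trap described above.
    ctx = decimal.getcontext().copy()
    ctx.prec = 5
    decimal.setcontext(ctx)
    return decimal.Decimal(1) / decimal.Decimal(3)

decimal.setcontext(decimal.Context(prec=28))
assert str(calculate_with_5_digits()) == '0.33333'
assert decimal.getcontext().prec == 5  # oops: later code now runs with prec=5
```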
On Wed, Sep 6, 2017 at 11:26 PM, Greg Ewing
Guido van Rossum wrote:
This feels like a very abstract argument. I have a feeling that context state propagating out of a call is used relatively rarely -- it must work for cases where you refactor something that changes context inline into a utility function (e.g. decimal.setcontext()), but I just can't think of a realistic example where coroutines (either of the yield-from variety or of the async/def form) would be used for such a utility function.
Yuri has already found one himself, the __aenter__ and __aexit__ methods of an async context manager.
__aenter__ is not a generator and there's no 'yield from' there. Coroutines (within an async task) leak state just like regular functions (within a thread). Your argument is to allow generators to leak context changes (right?). AFAIK we don't use generators to implement __enter__ or __aenter__ (generators decorated with @types.coroutine or @asyncio.coroutine are coroutines, according to PEP 492). So this is irrelevant.
A utility function that sets context state but also makes a network call just sounds like asking for trouble!
I'm coming from the other direction. It seems to me that it's not very useful to allow with-statements to be skipped in certain very restricted circumstances.
Can you clarify what you mean by "with-statements to be skipped"? This language is not used in PEP 550 or in Python documentation. I honestly don't understand what it means.
The only situation in which you will be able to take advantage of this is if the context change is being made in a generator or coroutine, and it is to apply to the whole body of that generator or coroutine.
If you're in an ordinary function, you'll still have to use a context manager. If you only want the change to apply to part of the body, you'll still have to use a context manager.
It would be simpler to just tell people to always use a context manager, wouldn't it?
Yes, PEP 550 wants people to always use context managers! Which will work as you expect them to work for coroutines, generators, and regular functions. At this point I suspect you have some wrong idea about some specification detail of PEP 550. I understand what Koos is talking about, but I really don't follow you. Using the "with-statements to be skipped" language is very confusing and doesn't help to understand you. Yury
Guido van Rossum wrote:
Yeah, so my claim is that this is simply a non-problem, and you've pretty much just proved that by failing to come up with pointers to actual code that would suffer from this. Clearly you're not aware of any such code.
In response I'd ask Yuri to come up with examples of real code that would benefit significantly from being able to make context changes without wrapping them in a with statement. -- Greg
On Wed, Sep 6, 2017 at 11:39 PM, Greg Ewing
Yury Selivanov wrote:
It would be great if you or Greg could show a couple of real-world examples showing the "issue" (with the current PEP 550 APIs/semantics).
Here's one way that refactoring could trip you up. Start with this:
    async def foo():
        calculate_something()
        # in a coroutine, so we can be lazy and not use a cm
Where exactly does PEP 550 encourage users to be "lazy and not use a cm"? PEP 550 provides a mechanism for implementing context managers! What is this example supposed to show?
        ctx = decimal.getcontext().copy()
        ctx.prec = 5
        decimal.setcontext(ctx)
        calculate_something_else()
And factor part of it out (into an *ordinary* function!)
    async def foo():
        calculate_something()
        calculate_something_else_with_5_digits()
    def calculate_something_else_with_5_digits():
        ctx = decimal.getcontext().copy()
        ctx.prec = 5
        decimal.setcontext(ctx)
        calculate_something_else()
Now we add some more calculation to the end of foo():
    async def foo():
        calculate_something()
        calculate_something_else_with_5_digits()
        calculate_more_stuff()
Here we didn't intend calculate_more_stuff() to be done with prec=5, but we forgot that calculate_something_else_with_5_digits() changes the precision and *doesn't restore it*, because we didn't add a context manager to it.
If we hadn't been lazy and had used a context manager in the first place, that wouldn't have happened.
How is PEP 550 at fault for somebody being lazy and not using a context manager?

PEP 550 has a hard requirement to make it possible for decimal/other libraries to start using its APIs and stay backwards compatible, so it allows the `decimal.setcontext(ctx)` function to be implemented. We are fixing things here.

When you are designing a new library/API, you can use CMs and only CMs. It's up to you as a library author; PEP 550 does not limit you. And when you use CMs, there are no "problems" with 'yield from' or anything in PEP 550.
Summary: I think that skipping context managers in some circumstances is a bad habit that shouldn't be encouraged.
PEP 550 does not encourage coding without context managers. It does, in fact, solve the problem of reliably storing context to make writing context managers possible. To reiterate: it provides mechanism to set a variable within the current logical thread, like storing a current request in an async HTTP handler. Or to implement `decimal.setcontext`. But you are free to use it to only implement context managers in your library. Yury
Yury Selivanov wrote:
I still think that giving Python programmers one strong rule: "context mutation is always isolated in generators" makes it easier to reason about the EC and write maintainable code.
Whereas I think it makes code *harder* to reason about, because to take advantage of it you need to be acutely aware of whether the code you're working on is in a generator/coroutine or not. It seems simpler to me to have one rule for all kinds of functions: If you're making a temporary change to contextual state, always encapsulate it in a with statement. -- Greg
On Wed, Sep 6, 2017 at 11:55 PM, Greg Ewing
Guido van Rossum wrote:
Yeah, so my claim is that this is simply a non-problem, and you've pretty much just proved that by failing to come up with pointers to actual code that would suffer from this. Clearly you're not aware of any such code.
In response I'd ask Yuri to come up with examples of real code that would benefit significantly from being able to make context changes without wrapping them in a with statement.
A real-code example: make it possible to implement decimal.setcontext() on top of PEP 550 semantics.

I still feel that there's some huge misunderstanding in the discussion: PEP 550 does not promote "not using context managers". It simply implements a low-level mechanism to make it possible to implement context managers for generators/coroutines/etc. Whether this API is used to write context managers or not is completely irrelevant to the discussion.

How does threading.local() promote or demote the use of context managers? The answer: it doesn't. The same answer applies to PEP 550, which is a similar mechanism.

Yury
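Such a setcontext()/getcontext() pair can be sketched on top of context variables roughly like this, using the contextvars API that eventually shipped in Python 3.7 (names other than ContextVar are hypothetical; the draft PEP 550 API differs slightly):

```python
import contextvars
import decimal

# Hypothetical sketch: decimal's current context stored in a
# context variable instead of thread-local storage.
_decimal_ctx = contextvars.ContextVar('decimal_context')

def setcontext(ctx):
    _decimal_ctx.set(ctx)

def getcontext():
    try:
        return _decimal_ctx.get()
    except LookupError:
        # Lazily install a default context, as decimal does per thread.
        ctx = decimal.Context()
        _decimal_ctx.set(ctx)
        return ctx

setcontext(decimal.Context(prec=7))
assert getcontext().prec == 7
```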
Yury Selivanov wrote:
1. So essentially this means that we will have one "local context" per context manager storing one value.
I can't see that being a major problem. Context vars will (I hope!) be very rare things, and needing to change a bunch of them in one function ought to be rarer still. But if you do, it would be easy to provide a context manager whose sole effect is to introduce a new context:

    with new_local_context():
        cvar1.set(something)
        cvar2.set(otherthing)
        ...
2. If somebody makes a mistake and calls "push_local_context" without a corresponding "pop_local_context"
You wouldn't normally call them directly, they would be encapsulated in carefully-written context managers. If you do use them, you're taking responsibility for using them correctly. If it would make you feel happier, they could be named _push_local_context and _pop_local_context to emphasise that they're not intended for everyday use.
3. Users will need to know way more to correctly use the mechanism.
Most users will simply be using already-provided context managers, which they're *already used to doing*. So they won't have to know anything more than they already do. See my last decimal example, which required *no change* to existing correct user code.
So far, both you and Koos can't give us a realistic example which illustrates why we should suffer the implications of (1), (2), and (3).
And you haven't given a realistic example that convinces me your proposed with-statement-elimination feature would be of significant benefit. -- Greg
Yury Selivanov wrote:
The PEP gives you a Task Local Storage, where Task is:
1. your single-threaded code
2. a generator
3. an async task
If you correctly use context managers, PEP 550 works intuitively and similarly to how one would think that threading.local() should work.
My version works *more* similarly to thread-local storage, IMO. Currently, if you change the decimal context without using a with-statement or something equivalent, you *don't* expect the change to be confined to the current function or sub-generator or async sub-task. All I'm asking for is one consistent rule: If you want a context change encapsulated, use a with-statement. If you don't, don't. Not only is this rule simpler than yours, it's the *same* rule that we have now, so there is less for users to learn. -- Greg
Yury Selivanov wrote:
    def foo():
        var = ContextVar()
        var.set(1)

    for _ in range(10**6):
        foo()
If 'var' is strongly referenced, we would have a bunch of them.
Erk. This is not how I envisaged context vars would be used. What I thought you would do is this:

    my_context_var = ContextVar()

    def foo():
        my_context_var.set(1)

This problem would also not arise if context vars simply had names instead of being magic key objects:

    def foo():
        contextvars.set("mymodule.myvar", 1)

That's another thing I think would be an improvement, but it's orthogonal to what we're talking about here and would be best discussed separately.

-- Greg
Yury Selivanov wrote:
I understand what Koos is talking about, but I really don't follow you. Using the "with-statements to be skipped" language is very confusing and doesn't help to understand you.
If I understand correctly, instead of using a context manager, your fractions example could be written like this:

    def fractions(precision, x, y):
        ctx = decimal.getcontext().copy()
        decimal.setcontext(ctx)
        ctx.prec = precision
        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y ** 2)

and it would work without leaking changes to the decimal context, despite the fact that it doesn't use a context manager or do anything else to explicitly put back the old context. Am I right about that?

This is what I mean by "skipping context managers" -- that it's possible in some situations to get by without using a context manager, by taking advantage of the implicit local context push that happens whenever a generator is started up.

Now, there are two possibilities:

1) You take advantage of this, and don't use context managers in some or all of the places where you don't need to. You seem to agree that this would be a bad idea.

2) You ignore it and always use a context manager, in which case it's not strictly necessary for the implicit context push to occur, since the relevant context managers can take care of it.

So there doesn't seem to be any great advantage to the automatic context push, and it has some disadvantages, such as yield-from not quite working as expected in some situations.

Also, it seems that every generator is going to incur the overhead of allocating a logical_context even when it doesn't actually change any context vars, which most generators won't.

-- Greg
On Thu, Sep 7, 2017 at 10:54 AM, Greg Ewing
Yury Selivanov wrote:
    def foo():
        var = ContextVar()
        var.set(1)

    for _ in range(10**6):
        foo()
If 'var' is strongly referenced, we would have a bunch of them.
Erk. This is not how I envisaged context vars would be used. What I thought you would do is this:
my_context_var = ContextVar()
    def foo():
        my_context_var.set(1)
This problem would also not arise if context vars simply had names instead of being magic key objects:
    def foo():
        contextvars.set("mymodule.myvar", 1)
That's another thing I think would be an improvement, but it's orthogonal to what we're talking about here and would be best discussed separately.
There are lots of things in this discussion that I should have commented on, but here's one related to this. PEP 555 does not have the resource-management issue described above and needs no additional tricks to achieve that:

    # using PEP 555
    def foo():
        var = contextvars.Var()
        with var.assign(1):
            ...  # do something [*]

    for _ in range(10**6):
        foo()

Every time foo is called, a new context variable is created, but that's perfectly fine, and lightweight. As soon as the context manager exits, there are no references to the Assignment object returned by var.assign(1), and as soon as foo() returns, there are no references to var, so everything should get cleaned up nicely.

And regarding string keys, they have pros and cons, and they can be added easily, so let's not go there now.

-- Koos

[*] (nit-picking) without closures that would keep the var reference alive

-- + Koos Zevenhoven + http://twitter.com/k7hoven +
There is one thing I misunderstood. Since generators and coroutines are almost exactly the same underneath, I had thought that the automatic logical_context creation for generators was also going to apply to coroutines, but from reading the PEP again it seems that's not the case. Somehow I missed that the first time. Sorry about that.

So, context vars do behave like "task local storage" for asyncio Tasks, which is good. The only issue is whether a generator should be considered an "ad-hoc task" for this purpose. I can see your reasons for thinking that it should be.

I can also understand your thinking that the yield-from issue is such an obscure corner case that it's not worth worrying about, especially since there is a workaround available (setting __logical_context__ to None) if needed.

I'm not sure how I feel about that now. I agree that it's an obscure case, but the workaround seems even more obscure, and is unlikely to be found by anyone who isn't closely familiar with the inner workings.

I think I'd be happier if there were a higher-level way of applying this workaround, such as a decorator:

    @subgenerator
    def g():
        ...

Then the docs could say "If you want a generator to *not* have its own task local storage, wrap it with @subgenerator."

By the way, I think "Task Local Storage" would be a much better title for this PEP. It instantly conveys the basic idea in a way that "Execution Context" totally fails to do. It might also serve as a source for some better terminology for parts of the implementation, such as TaskLocalStorage and TaskLocalStorageStack instead of logical_context and execution_context. I found the latter terms almost devoid of useful meaning when trying to understand the implementation.

-- Greg
There are a couple of things in the PEP I'm confused about:

1) Under "Generators" it says:

    once set in the generator, the context variable is
    guaranteed not to change between iterations;

This suggests that you're not allowed to set() a given context variable more than once in a given generator, but some of the examples seem to contradict that. So I'm not sure what this is trying to say.

2) I don't understand why the logical_contexts have to be immutable. If every task or generator that wants its own task-local storage has its own logical_context instance, why can't it be updated in-place?

-- Greg
On 09/07/2017 04:39 AM, Greg Ewing wrote:
1) Under "Generators" it says:
once set in the generator, the context variable is guaranteed not to change between iterations;
This suggests that you're not allowed to set() a given context variable more than once in a given generator, but some of the examples seem to contradict that. So I'm not sure what this is trying to say.
I believe I can answer this part: the guarantee is that

- the context variable will not be changed while the yield is in effect -- or, said another way, while the generator is suspended;
- the context variable will not be changed by subgenerators;
- the context variable /may/ be changed by normal functions/class methods (since calling them would be part of the iteration).

-- ~Ethan~
On 09/07/2017 03:37 AM, Greg Ewing wrote:
If I understand correctly, instead of using a context manager, your fractions example could be written like this:
    def fractions(precision, x, y):
        ctx = decimal.getcontext().copy()
        decimal.setcontext(ctx)
        ctx.prec = precision
        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y ** 2)
and it would work without leaking changes to the decimal context, despite the fact that it doesn't use a context manager or do anything else to explicitly put back the old context.
The disagreement seems to be whether a LogicalContext should be created implicitly vs explicitly (or opt-out vs opt-in). As a user trying to track down a decimal context change not propagating, I would not suspect the above code of automatically creating a LogicalContext and isolating the change, whereas Greg's context manager version is abundantly clear. The implicit vs explicit argument comes down, I think, to resource management: some resources in Python are automatically managed (memory), and some are not (files) -- which type should LCs be? -- ~Ethan~
On 09/06/2017 11:57 PM, Yury Selivanov wrote:
On Wed, Sep 6, 2017 at 11:39 PM, Greg Ewing wrote:
Here's one way that refactoring could trip you up. Start with this:
    async def foo():
        calculate_something()
        # in a coroutine, so we can be lazy and not use a cm
Where exactly does PEP 550 encourage users to be "lazy and not use a cm"? PEP 550 provides a mechanism for implementing context managers! What is this example supposed to show?
That using a CM is not required, and tracking down a bug caused by not using a CM can be difficult.
How is PEP 550 at fault for somebody being lazy and not using a context manager?
Because PEP 550 makes a CM unnecessary in the simple (common?) case, hiding the need for a CM in not-so-simple cases. For comparison: in Python 3 we are now warned about files that have been left open (because explicitly closing files was unnecessary in CPython due to an implementation detail) -- the solution? make files context managers whose __exit__ closes the file.
PEP 550 has a hard requirement to make it possible for decimal/other libraries to start using its APIs and stay backwards compatible, so it allows `decimal.setcontext(ctx)` function to be implemented. We are fixing things here.
I appreciate that the scientific and number-crunching communities have been a major driver of enhancements for Python (such as rich comparisons and, more recently, matrix operators), but I don't think an enhancement for them that makes life more difficult for the rest is a net win. -- ~Ethan~
On Thursday, September 7, 2017 9:05:58 AM EDT Ethan Furman wrote:
The disagreement seems to be whether a LogicalContext should be created implicitly vs explicitly (or opt-out vs opt-in). As a user trying to track down a decimal context change not propagating, I would not suspect the above code of automatically creating a LogicalContext and isolating the change, whereas Greg's context manager version is abundantly clear.
The implicit vs explicit argument comes down, I think, to resource management: some resources in Python are automatically managed (memory), and some are not (files) -- which type should LCs be?
You are confusing resource management with the isolation mechanism. PEP 550 contextvars are analogous to threading.local(), which the PEP makes very clear from the outset. threading.local(), the isolation mechanism, is *implicit*. decimal.localcontext() is an *explicit* resource manager that relies on threading.local() magic. PEP 550 simply provides a threading.local() alternative that works in tasks and generators. That's it! Elvis
On Thursday, September 7, 2017 3:54:15 AM EDT Greg Ewing wrote:
This problem would also not arise if context vars simply had names instead of being magic key objects:
    def foo():
        contextvars.set("mymodule.myvar", 1)
That's another thing I think would be an improvement, but it's orthogonal to what we're talking about here and would be best discussed separately.
On the contrary, using simple names (PEP 550 V1 was actually doing that) is a regression. It opens up namespace clashing issues. Imagine you have a variable named "foo", and then some library you import also decides to use the name "foo", what then? That's one of the reasons why we do `local = threading.local()` instead of `threading.set_local("foo", 1)`. Elvis
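The clash Elvis describes is avoided with object keys because a variable's identity lives in the object, not in its display name. A sketch with the contextvars API as it eventually shipped (module names are hypothetical):

```python
import contextvars

# Two modules can each create a variable whose display name is "foo"
# without clashing, because the object itself is the key.
foo_in_module_a = contextvars.ContextVar('foo')
foo_in_module_b = contextvars.ContextVar('foo')

foo_in_module_a.set(1)
foo_in_module_b.set(2)
assert foo_in_module_a.get() == 1
assert foo_in_module_b.get() == 2  # no collision despite the shared name
```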
On 09/07/2017 06:41 AM, Elvis Pranskevichus wrote:
On Thursday, September 7, 2017 9:05:58 AM EDT Ethan Furman wrote:
The disagreement seems to be whether a LogicalContext should be created implicitly vs explicitly (or opt-out vs opt-in). As a user trying to track down a decimal context change not propagating, I would not suspect the above code of automatically creating a LogicalContext and isolating the change, whereas Greg's context manager version is abundantly clear.
The implicit vs explicit argument comes down, I think, to resource management: some resources in Python are automatically managed (memory), and some are not (files) -- which type should LCs be?
You are confusing resource management with the isolation mechanism. PEP 550 contextvars are analogous to threading.local(), which the PEP makes very clear from the outset.
I might be, and I wouldn't be surprised. :) On the other hand, one can look at isolation as being a resource.
threading.local(), the isolation mechanism, is *implicit*.
I don't think so. You don't get threading.local() unless you call it -- that makes it explicit.
decimal.localcontext() is an *explicit* resource manager that relies on threading.local() magic. PEP 550 simply provides a threading.local() alternative that works in tasks and generators. That's it!
The concern is *how* PEP 550 provides it: - explicitly, like threading.local(): has to be set up manually, preferably with a context manager - implicitly: it just happens under certain conditions -- ~Ethan~
On Thursday, September 7, 2017 6:37:58 AM EDT Greg Ewing wrote:
2) You ignore it and always use a context manager, in which case it's not strictly necessary for the implicit context push to occur, since the relevant context managers can take care of it.
So there doesn't seem to be any great advantage to the automatic context push, and it has some disadvantages, such as yield-from not quite working as expected in some situations.
The advantage is that context managers don't need to *always* allocate and push an LC. [1]
Also, it seems that every generator is going to incur the overhead of allocating a logical_context even when it doesn't actually change any context vars, which most generators won't.
By default, generators reference an empty LogicalContext object that is allocated once (like the None object). We can do that because LCs are immutable.

Elvis

[1] https://mail.python.org/pipermail/python-dev/2017-September/149265.html
On Thursday, September 7, 2017 10:06:14 AM EDT Ethan Furman wrote:
I might be, and I wouldn't be surprised. :) On the other hand, one can look at isolation as being a resource.
threading.local(), the isolation mechanism, is *implicit*.
I don't think so. You don't get threading.local() unless you call it -- that makes it explicit.
decimal.localcontext() is an *explicit* resource manager that relies on threading.local() magic. PEP 550 simply provides a threading.local() alternative that works in tasks and generators. That's it!
The concern is *how* PEP 550 provides it:
- explicitly, like threading.local(): has to be set up manually, preferably with a context manager
- implicitly: it just happens under certain conditions
You literally replace threading.local() with contextvars.ContextVar():

    import threading

    _decimal_context = threading.local()

    def set_decimal_context(ctx):
        _decimal_context.context = ctx

Becomes:

    import contextvars

    _decimal_context = contextvars.ContextVar('decimal.Context')

    def set_decimal_context(ctx):
        _decimal_context.set(ctx)

Elvis
I write it in a new thread, but I also want to write it here -- I need a time out in this discussion so I can think about it more. -- --Guido van Rossum (python.org/~guido)
On Thu, Sep 07, 2017 at 09:41:10AM -0400, Elvis Pranskevichus wrote:
threading.local(), the isolation mechanism, is *implicit*. decimal.localcontext() is an *explicit* resource manager that relies on threading.local() magic. PEP 550 simply provides a threading.local() alternative that works in tasks and generators. That's it!
If only there were a name that would make it explicit, like TaskLocalStorage. ;)

Seriously, the problem with 'context' is that it is:

a) A predefined set of state values, like in the Decimal (I think also the OpenSSL) context. But such a context is put inside another context (the ExecutionContext).

b) A theoretical concept from typed lambda calculus (in the context 'gamma' the variable 'v' has type 't'). But this concept would be associated with lexical scope and would extend to functions (not only tasks and generators).

c) ``man 3 setcontext``. A replacement for setjmp/longjmp. Somewhat related in that it could be used to implement coroutines.

d) The .NET flowery language. I did not fully understand what the .NET ExecutionContext and its 2881 implicit flow rules are.

...

Stefan Krah
Elvis Pranskevichus wrote:
By default, generators reference an empty LogicalContext object that is allocated once (like the None object). We can do that because LCs are immutable.
Ah, I see. That wasn't clear from the implementation, where

    gen.__logical_context__ = contextvars.LogicalContext()

looks like it's creating a new one.

However, there's another thing: it looks like every time a generator is resumed/suspended, an execution context node is created/discarded.

-- Greg
On 7 September 2017 at 07:06, Ethan Furman
The concern is *how* PEP 550 provides it:
- explicitly, like threading.local(): has to be set up manually, preferably with a context manager
- implicitly: it just happens under certain conditions
A recurring point of confusion with the threading.local() analogy seems to be that there are actually *two* pieces to that analogy:

* threading.local() <-> contextvars.ContextVar
* PyThreadState_GetDict() <-> LogicalContext

(See https://github.com/python/cpython/blob/a6a4dc816d68df04a7d592e0b6af8c7ecc4d4... for the definition of PyThreadState_GetDict.)

For most practical purposes as a *user* of thread locals, the involvement of PyThreadState and the state dict is a completely hidden implementation detail. However, every time you create a new thread, you're implicitly getting a new Python thread state, and hence a new thread state dict, and hence a new set of thread local values.

Similarly, as a *user* of context variables, you'll generally be able to ignore the manipulation of the execution context going on behind the scenes - you'll just get, set, and delete individual context variables without worrying too much about exactly where and how they're stored.

PEP 550 itself doesn't have that luxury, though, since in addition to defining how users will access and update these values, it *also* needs to define how the interpreter will implicitly manage the execution context for threads and generators and how event loops (including asyncio as the reference implementation) are going to be expected to manage the execution context explicitly when scheduling coroutines.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
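The first half of that analogy -- each new thread implicitly getting a fresh thread state and hence a fresh set of thread-local values -- can be demonstrated with the stdlib alone:

```python
import threading

tls = threading.local()
tls.value = 'main'   # set only in the main thread's namespace

seen = {}

def worker(name):
    # A new thread gets a new thread state, and hence a fresh
    # threading.local() namespace: 'value' is not set here yet.
    seen[name] = hasattr(tls, 'value')
    tls.value = name  # does not affect the main thread's copy

t = threading.Thread(target=worker, args=('child',))
t.start()
t.join()

assert seen == {'child': False}   # child saw no inherited value
assert tls.value == 'main'        # main thread's value untouched
```

Context variables behave the same way from the user's perspective; the difference PEP 550 introduces is *where* the hidden storage lives and how it flows through generators and coroutines rather than only threads.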
participants (19)

- Antoine Pitrou
- Barry Warsaw
- David Mertz
- Elvis Pranskevichus
- Eric Snow
- Ethan Furman
- francismb
- Glenn Linderman
- Greg Ewing
- Guido van Rossum
- Ivan Levkivskyi
- Kevin Conway
- Koos Zevenhoven
- Nathaniel Smith
- Nick Coghlan
- Stefan Behnel
- Stefan Krah
- Sven R. Kunze
- Yury Selivanov