[Python-Dev] PEP 550 v4

Yury Selivanov yselivanov.ml at gmail.com
Sat Aug 26 13:15:06 EDT 2017


On Sat, Aug 26, 2017 at 2:34 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>> Coroutines and Asynchronous Tasks
>> ---------------------------------
>>
>> In coroutines, like in generators, context variable changes are local
>> and are not visible to the caller::
>>
>>     import asyncio
>>
>>     var = new_context_var()
>>
>>     async def sub():
>>         assert var.lookup() == 'main'
>>         var.set('sub')
>>         assert var.lookup() == 'sub'
>>
>>     async def main():
>>         var.set('main')
>>         await sub()
>>         assert var.lookup() == 'main'
>>
>>     loop = asyncio.get_event_loop()
>>     loop.run_until_complete(main())
>
> I think this change is a bad idea. I think that generally, an async
> call like 'await async_sub()' should have the equivalent semantics to
> a synchronous call like 'sync_sub()', except for the part where the
> former is able to contain yields.

That exception is why the semantics cannot be equivalent.

> Giving every coroutine an LC breaks
> that equivalence. It also makes it so in async code, you can't
> necessarily refactor by moving code in and out of subroutines.

I'll cover the refactoring argument later in this email.

[..]
> It also adds non-trivial overhead, because now lookup() is O(depth of
> async callstack), instead of O(depth of (async) generator nesting),
> which is generally much smaller.

I don't think the overhead is non-trivial, though:

First, we have a cache in ContextVar which makes lookup O(1) for any
tight code that uses libraries like decimal and numpy.

Second, most of the LCs in the chain will be empty, so even the
uncached lookup will still be fast.

Third, you will usually have your "with my_context()" block right
around your code (or within a few awaits' distance), otherwise it will
be hard to reason about what the context is.  And if, occasionally, you
have a single "var.lookup()" call that isn't cached, its cost will
still be measured in microseconds.

Finally, easy-to-follow semantics are the main argument for the
change (even at the cost of making "get()" a bit slower in corner
cases).
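
Here is a minimal sketch (hypothetical and heavily simplified — one
thread, one flat context mapping; the real design also keys the cache
on a unique thread-state id) of the version-based fast path being
described:

```python
# Toy version-keyed cache: any set() bumps the version, so a cached
# (version, value) pair is valid exactly until the next write.
class ContextVar:
    def __init__(self):
        self.version = 0
        self._cached = None                # (version, value) pair, or None

    def set(self, ctx, value):
        ctx[self] = value
        self.version += 1                  # any write invalidates the cache

    def lookup(self, ctx):
        if self._cached is not None and self._cached[0] == self.version:
            return self._cached[1]         # O(1) fast path: version unchanged
        value = ctx.get(self)              # slow path: search the context
        self._cached = (self.version, value)
        return value

ctx = {}
var = ContextVar()
var.set(ctx, 42)
assert var.lookup(ctx) == 42               # slow path, fills the cache
assert var.lookup(ctx) == 42               # fast path, served from the cache
```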

>
> I think I see the motivation: you want to make
>
>    await sub()
>
> and
>
>    await ensure_future(sub())
>
> have the same semantics, right?

Yes.

> And the latter has to create a Task
> and split it off into a new execution context, so you want the former
> to do so as well? But to me this is like saying that we want
>
>    sync_sub()
>
> and
>
>    thread_pool_executor.submit(sync_sub).result()

This example is very similar to:

    await sub()

and

    await create_task(sub())

So it's really about making the semantics of coroutines predictable.
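
The equivalence being argued for can be modeled in a few lines (these
are our own names, not the proposed API): if every coroutine body runs
with its own logical context pushed onto the chain, then running
"inline" and running as a task with a captured context copy leave the
caller's context identically untouched.

```python
from collections import ChainMap

var_chain = ChainMap({})          # the caller's execution-context chain
var_chain["var"] = "main"

def sub(chain):
    chain.maps[0]["var"] = "sub"  # a set() lands in the topmost LC only

def await_inline(fn, chain):
    # 'await sub()': push a fresh LC for the callee, drop it on return
    fn(ChainMap({}, *chain.maps))

def await_as_task(fn, chain):
    # 'await create_task(sub())': the context is captured (copied) at creation
    fn(ChainMap({}, *[dict(m) for m in chain.maps]))

await_inline(sub, var_chain)
assert var_chain["var"] == "main"
await_as_task(sub, var_chain)
assert var_chain["var"] == "main"  # identical observable behavior
```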

> (And fwiw I'm still not convinced we should give up on 'yield from' as
> a mechanism for refactoring generators.)

I don't get this "refactoring generators" and "refactoring coroutines" argument.

Suppose you have this code:

  def gen():
     i = 0
     for _ in range(3):
         i += 1
         yield i
     for _ in range(5):
         i += 1
         yield i

You can't refactor gen() by simply copying/pasting parts of its body
into a separate generator:

  def count3():
     for _ in range(3):
         i += 1
         yield

  def gen():
     i = 0

     yield from count3()

     for _ in range(5):
         i += 1
         yield i

The above won't work for an obvious reason: 'i' is a nonlocal
variable for the 'count3' block of code.  Almost exactly the same thing
will happen with the current PEP 550 specification, which is a *good*
thing.

'yield from' and 'await' are not about refactoring.  They can be used
for splitting large generators/coroutines into a set of smaller ones,
sure.  But there's *no* magical, always-working refactoring mechanism
that allows one to do that blindly.
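
For contrast, here is one refactoring of gen() that *does* work —
threading the shared state explicitly instead of relying on a nonlocal
'i' (the helper names are ours):

```python
# State is passed in and advanced locally; the caller is responsible
# for handing the continuation value to the next helper.
def count(i, n):
    for _ in range(n):
        i += 1
        yield i

def gen():
    yield from count(0, 3)   # yields 1, 2, 3
    yield from count(3, 5)   # caller passes the updated state itself

assert list(gen()) == [1, 2, 3, 4, 5, 6, 7, 8]
```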

>
>> To establish the full semantics of execution context in coroutines,
>> we must also consider *tasks*.  A task is the abstraction used by
>> *asyncio*, and other similar libraries, to manage the concurrent
>> execution of coroutines.  In the example above, a task is created
>> implicitly by the ``run_until_complete()`` function.
>> ``asyncio.wait_for()`` is another example of implicit task creation::
>>
>>     async def sub():
>>         await asyncio.sleep(1)
>>         assert var.lookup() == 'main'
>>
>>     async def main():
>>         var.set('main')
>>
>>         # waiting for sub() directly
>>         await sub()
>>
>>         # waiting for sub() with a timeout
>>         await asyncio.wait_for(sub(), timeout=2)
>>
>>         var.set('main changed')
>>
>> Intuitively, we expect the assertion in ``sub()`` to hold true in both
>> invocations, even though the ``wait_for()`` implementation actually
>> spawns a task, which runs ``sub()`` concurrently with ``main()``.
>
> I found this example confusing -- you talk about sub() and main()
> running concurrently, but ``wait_for`` blocks main() until sub() has
> finished running, right?

Right.  Before we continue, let me make sure we are on the same page here:

    await asyncio.wait_for(sub(), timeout=2)

can be refactored into:

    task = asyncio.ensure_future(asyncio.wait_for(sub(), timeout=2))
    # sub() is scheduled now, and a "loop.call_soon" call has been
    # made to advance it soon.
    await task

Now, if we look at the following example (1):

    async def foo():
         await bar()

The "bar()" coroutine will execute within "foo()".

If we add a timeout logic (2):

    async def foo():
         await wait_for(bar(), 1)

The "bar()" coroutine will execute outside of "foo()", and "foo()"
will only wait for the result of that execution.

Now, Async Tasks capture the context when they are created -- that's
the only sane option they have.

If coroutines don't have their own LC, "bar()" in examples (1) and (2)
would interact with the execution context differently!

And this is something that we can't let happen, as it would force
asyncio users to think about the EC every time they want to wrap a
coroutine into a task.
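
The capture-at-creation rule for tasks is observable with the
contextvars machinery that later shipped in Python 3.7 (a hedged demo
of that one behavior only; variable names are ours):

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="unset")

async def child():
    return var.get()

async def main():
    var.set("before")
    task = asyncio.ensure_future(child())  # context is copied here, at creation
    var.set("after")                       # the task does not see this write
    return await task

assert asyncio.run(main()) == "before"
```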

[..]
>> The ``sys.run_with_logical_context()`` function performs the following
>> steps:
>>
>> 1. Push *lc* onto the current execution context stack.
>> 2. Run ``func(*args, **kwargs)``.
>> 3. Pop *lc* from the execution context stack.
>> 4. Return or raise the ``func()`` result.
>
> It occurs to me that both this and the way generator/coroutines expose
> their logic context means that logical context objects are
> semantically mutable. This could create weird effects if someone
> attaches the same LC to two different generators, or tries to use it
> simultaneously in two different threads, etc. We should have a little
> interlock like generator's ag_running, where an LC keeps track of
> whether it's currently in use and if you try to push the same LC onto
> two ECs simultaneously then it errors out.

Correct.  Both LC and EC objects will be wrapped into "shell"
objects before being exposed to the end user.
run_with_logical_context() will mutate the user-visible LC object
(keeping the underlying LC immutable, of course).

Ideally, we would want run_with_logical_context to have the following signature:

    result, updated_lc = run_with_logical_context(lc, callable)

But because "callable" can raise an exception, this would not work.

>
>> For efficient access in performance-sensitive code paths, such as in
>> ``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``,
>> making it an O(1) operation when the cache is hit.  The cache key is
>> composed from the following:
>>
>> * The new ``uint64_t PyThreadState->unique_id``, which is a globally
>>   unique thread state identifier.  It is computed from the new
>>   ``uint64_t PyInterpreterState->ts_counter``, which is incremented
>>   whenever a new thread state is created.
>>
>> * The ``uint64_t ContextVar->version`` counter, which is incremented
>>   whenever the context variable value is changed in any logical context
>>   in any thread.
>
> I'm pretty sure you need to also invalidate on context push/pop. Consider:
>
> def gen():
>     var.set("gen")
>     var.lookup()  # cache now holds "gen"
>     yield
>     print(var.lookup())
>
> def main():
>     var.set("main")
>     g = gen()
>     next(g)
>     # This should print "main", but it's the same thread and the last
> call to set() was
>     # the one inside gen(), so we get the cached "gen" instead
>     print(var.lookup())
>     var.set("no really main")
>     var.lookup()  # cache now holds "no really main"
>     next(g)  # should print "gen" but instead prints "no really main"

Yeah, you're right. Thanks!

>
>> The cache is then implemented as follows::
>>
>>     class ContextVar:
>>
>>         def set(self, value):
>>             ...  # implementation
>>             self.version += 1
>>
>>
>>         def get(self):
>
> I think you missed a s/get/lookup/ here :-)

Fixed!

Yury

