[Python-ideas] The async API of the future: yield-from

Jim Jewett jimjjewett at gmail.com
Fri Oct 19 05:46:40 CEST 2012


Is the goal really to provide "The async API of the future", or just
to provide "a stdlib module which provides one adequate way to do
async"?

I think the yield and yield from solutions all need too much magical
scaffolding to be The One True Way, but I don't mind such conventions
as much when they're part of a particular example class, such as
concurrent.schedulers.YieldScheduler.

To stretch an analogy, generators and context managers are different
concepts.  Allowing certain generators to be used as context managers
(by using the "with" keyword)  is fine.  But I wouldn't want to give
up all the other uses of generators.

If yield starts implying other magical properties that are only useful
when communicating with a scheduler, rather than a regular caller ...
I'm afraid that muddies the concept up too much for my taste.


More specific concerns below:

On 10/12/12, Guido van Rossum <guido at python.org> wrote:

> But the only use for send() on a generator is when using it as a
> coroutine for a concurrent tasks system -- send() really makes no
> sense for generators used as iterators. And you're claiming, it seems,
> that you prefer yield-from for concurrent tasks.

But the data doesn't have to be scheduling information; it can be new
data, a seed for an algorithm, a command to switch or reset the state
... locking it to the scheduler is part of what worries me.

> On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>

>> Keep in mind that a value yielded by a generator being used as
>> part of a coroutine is *not* seen by code calling it with
>> yield-from.

That is part of what bugs me about the yield-from examples.

Until this discussion, I had thought of yield-from as factoring out
some code that was still conceptually embedded within the parent
generator.  This (perhaps correctly) makes it seem more like a
temporary replacement, as if the parent were no longer there at all.

But with the yield channel reserved for scheduling overhead, the
"generator" can't really generate anything, except through side
effects...

> ... I feel that "value = yield <something that returns a Future>"
> is quite a good paradigm,

To me, it seems fine for a particular concrete scheduler, but too
strong an assumption for an abstract API.

I can mostly* understand:

    YieldScheduler assumes any yielded data is another Task; it will
    schedule that task, and cause the original (yielding) Task to wait
    until the new task is completed.

But I wonder what I'm missing with:

    Generators should only yield (expressions that create) Futures;
    the scheduler will automatically unwrap the future and send (or
    throw) the result back into the parent (or other ancestor)
    Generator, which will then be resumed.

* "mostly", because if my task is willing to wait for the subtask to
complete, then why not just use a blocking call in the first place?
Is it just because switching to another task is lighter weight than
letting a thread block?
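
For concreteness, here is roughly what I picture that convention
meaning.  This is just my reading, not anything from the proposal; the
names "trampoline" and "async_read" and the thread pool are mine:

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=2)

    def async_read(path):
        # Stand-in async op: returns a Future immediately.
        return pool.submit(lambda: open(path).read())

    def task(path):
        data = yield async_read(path)   # suspend; resumed with the *result*
        return len(data)

    def trampoline(gen):
        # Drive one generator, resolving each yielded Future.
        value = None
        while True:
            try:
                future = gen.send(value)
            except StopIteration as stop:
                return stop.value
            value = future.result()     # blocks here; a real scheduler
                                        # would run other tasks instead

The last line is presumably the whole answer to my question: a real
scheduler wouldn't block there, it would go run some other task.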



What happens if a generator does yield something other than a Future?
Will the generator be rescheduled in an already-runnable (as opposed
to waiting) state?  Will it never be resumed?  Will that object be
auto-wrapped in a Future for the benefit of whichever other co-routine
originally made the request?

Are generators assumed to run to exhaustion, or is some sort of driver
needed to keep pumping them?
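
(For what it's worth, a bare generator certainly does nothing on its
own; *something* has to pump it.  Illustration only:

    def gen():
        print("step 1")
        yield
        print("step 2")

    g = gen()        # creating it runs no code at all
    next(g)          # prints "step 1", suspends at the yield
    try:
        next(g)      # prints "step 2"; the generator is now exhausted
    except StopIteration:
        pass

So the question is really who owns that pumping loop, and whether it
runs each task to completion or interleaves them.)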


> ... It would be horrible to require C to create a fake generator.

Would it have to wrap results in a fake Future, so that the scheduler
could properly unwrap?

> ...Well, I'm talking about a decorator that you *always* apply, and which
> does nothing (or very little) when wrapping a generator, but adds
> generator behavior when wrapping a non-generator function.

Why is an always-applied decorator any less intrusive than a mandatory
(mixin) base class?

> (1) Calling an async operation and waiting for its result, using yield

> Futures:
>   result = yield some_async_op(args)

I was repeatedly confused over whether "result" would be a Future that
still needed resolution, and the example code wasn't always
consistent.  As I understand it now, the scheduler (not just this
particular implementation, but the API itself) has to treat any yielded
data as a Future, resolve that Future to its result, and then send (or
throw) that result (rather than the Future itself) back into either the
parent task or the nearest ancestor task that isn't using "yield from".


> Yield-from:
>   result = yield from some_async_op(args)

So the generator containing this code suspends itself entirely until
some_async_op is exhausted, at which point result will be the
StopIteration?  (Or None?)  Non-Exception results get passed straight
to the least-distant ancestor task not using "yield from", but
Exceptions propagate through one generation at a time.
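
Checking my own reading with plain generators, no scheduler involved:
the subgenerator's return value rides on StopIteration and becomes the
value of the yield-from expression, while anything it yields bypasses
the parent entirely.

    def inner():
        yield "intermediate"         # goes straight to whoever is pumping
        return 42                    # becomes StopIteration(42)

    def outer():
        result = yield from inner()  # result == 42 once inner is exhausted
        print("inner returned", result)
        yield "done"

    g = outer()
    print(next(g))   # prints "intermediate" (it bypassed outer entirely)
    print(next(g))   # prints "inner returned 42", then "done"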


> (2) Setting the result of an async operation

> Futures:
>   f.set_result(value)  # From any callback

PEP 3148 considers set_result private to the executor.  Can that
always be done from arbitrary callbacks?  Can it be done more than
once?

I think for the normal case, a task should just return its value, and
the Future or the Scheduler should be responsible for calling
set_result.
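
In other words, something like this hypothetical helper, where only the
machinery that owns the Future ever touches set_result/set_exception:

    from concurrent.futures import Future

    def run_to_future(func, *args):
        # Hypothetical scheduler-side glue, not from the proposal.
        f = Future()
        try:
            f.set_result(func(*args))   # the machinery's job, not the task's
        except Exception as exc:
            f.set_exception(exc)
        return f

    def my_task():
        return 42                       # the task never sees its own Future

    print(run_to_future(my_task).result())   # 42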

> Yield-from:
>   return value  # From the outermost generator

Why only the outermost?  I'm guessing it is because everything else is
suspended, and even if a mid-level generator is explicitly re-added to
the task queue, it can't actually continue because of re-entrancy.


> (3) Handling an exception
>
> Futures:
>   try:
>     result = yield some_async_op(args)
>   except MyException:
>     <handle exception>

So the scheduler does have to unpack the future, and throw rather than send.
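
Sketching what I mean, with hypothetical driver-side names: when the
awaited future holds an exception, the driver has to use throw() so
that the task's own except clause fires; otherwise it sends the value.

    def step(gen, future):
        try:
            result = future.result()   # raises if the operation failed
        except BaseException as exc:
            return gen.throw(exc)      # resumes inside the task's except
        return gen.send(result)        # normal path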

> (4) Raising an exception as the outcome of an async operation

> Futures:
>   f.set_exception(<Exception instance>)

Again, shouldn't the task itself just raise, and let the future (or
the scheduler) call set_exception()?

> Yield-from:
>   raise <Exception instance or class>  # From any of the generators

So it doesn't need to be wrapped in a Future, until it needs to cross
back over a "schedule this asynchronously" gulf?

> (5) Having one async operation invoke another async operation

> Futures:
>   @task
>   def outer(args):
>     res = yield inner(args)
>     return res

> Yield-from:
>   def outer(args):
>     res = yield from inner(args)
>     return res

Will it ever get to continue processing (under either model) before
inner exhausts itself and stops yielding?

> Note: I'm including this because in the Futures case, each level of
> yield requires the creation of a separate Future.

Only because of the auto-unboxing.  And if the generator suspends
itself to wait for the future, then the future will be resolved before
control returns to the generator's own parents, so those per-layer
Futures won't really add anything.

> (6) Spawning off multiple async subtasks
>
> Futures:
>   f1 = subtask1(args1)  # Note: no yield!!!
>   f2 = subtask2(args2)
>   res1, res2 = yield f1, f2

Ah, that makes a bit more sense, though the tuple of futures does
complicate the automagic unboxing.  (Which containers, to which
levels, have to be resolved?)
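
My best guess at how the resolution might generalize (purely
hypothetical driver code, and exactly where the container question
bites):

    from concurrent.futures import Future

    def resolve(yielded):
        if isinstance(yielded, Future):
            return yielded.result()
        if isinstance(yielded, tuple):
            return tuple(f.result() for f in yielded)
        raise TypeError("yield only Futures, or tuples of Futures")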

> Yield-from:
>   ??????????
>
> *** Greg, can you come up with a good idiom to spell concurrency at
> this level? Your example only has concurrency in the philosophers
> example, but it appears to interact directly with the scheduler, and
> the philosophers don't return values. ***

Why wouldn't this be the same as you already wrote without yield-from?
Two subtasks were submitted but not waited for.  I suppose you could
yield from a generator that submits new subtasks every time it
generates something, but that would be solving a more complicated
problem.  (So it wouldn't be a consequence of the "yield from".)



> (7) Checking whether an operation is already complete

> Futures:
>   if f.done(): ...

If f was yielded, it is done, or this code wouldn't be running again to check.

> Yield-from:
>   ?????????????

And again, if the futures were yielded (even through a yield from),
then they're already unboxed; otherwise, you can still check f.done().

> (8) Getting the result of an operation multiple times
>
> Futures:
>
>   f = async_op(args)
>   # squirrel away a reference to f somewhere else
>   r = yield f
>   # ... later, elsewhere
>   r = f.result()

Why do you have to squirrel away the reference?  Are you assuming that
the async scheduler will mess with the locals so that f is no longer
valid?

> Yield-from:
>   ???????????????

This, you cannot reasonably do; the nature of yield-from means that
the unresolved futures were never visible within this generator; they
were resolved by the scheduler and the results handed straight to the
generator's ancestor.

> (9) Canceling an operation
>
> Futures:
>   f.cancel()
>
> Yield-from:
>   ???????????????
>
> Note: I haven't needed canceling yet, and I believe Devin said that
> Twisted just got rid of it. However some of the JS Deferred
> implementations seem to support it.

I think that once you've called "yield from", the generator making
that call is suspended until the child generator completes.  But a
different thread of control could cancel the active (most-descended)
generator.

> (10) Registering additional callbacks
>
> Futures:
>   f.add_done_callback(callback)
>
> Yield-from:
>   ???????
>
> Note: this is used in NDB to trigger "hooks" that should run e.g. when
> a database write completes. The user's code just writes yield
> ent.put_async(); the trigger is automatically called by the Future's
> machinery. This also uses (8).

I think you would have to add the callbacks within the subgenerator
that is spawning f.

That, or un-inline the yield from, and lose the automated send-throw forwarding.
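
Roughly what the first option would look like (ent.put_async() is from
your NDB example; everything else here is a stand-in): the raw Future
is only visible inside the generator that created it, so that's the
only place the extra callback can be attached.

    def log_write(result):
        print("write hook fired:", result)   # stand-in for the NDB hook

    def put_entity(ent):
        f = ent.put_async()                  # f is a real Future here
        f.add_done_callback(lambda fut: log_write(fut.result()))
        result = yield f                     # the scheduler unwraps f as usual
        return result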

-jJ


