[Python-ideas] The async API of the future: yield-from
Calvin Spealman
ironfroggy at gmail.com
Fri Oct 19 14:46:31 CEST 2012
On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
> Is the goal really to provide "The async API of the future", or just
> to provide "a stdlib module which provides one adequate way to do
> async"?
>
> I think the yield and yield from solutions all need too much magical
> scaffolding to be The One True Way, but I don't mind such conventions
> as much when they're part of a particular example class, such as
> concurrent.schedulers.YieldScheduler.
>
> To stretch an analogy, generators and context managers are different
> concepts. Allowing certain generators to be used as context managers
> (by using the "with" keyword) is fine. But I wouldn't want to give
> up all the other uses of generators.
>
> If yield starts implying other magical properties that are only useful
> when communicating with a scheduler, rather than a regular caller ...
> I'm afraid that muddies the concept up too much for my taste.
I think it is important that this is more than convention. We need our old
friend TOOOWTDI (There's Only One Obvious Way To Do It) here more than
ever. This stuff is complicated, and so is the interoperability of whatever
eventually gets written on top of it. Our focus should be not on providing
simple things like "async file read" but on crafting an environment where
people can continue to write wonderfully expressive and useful libraries
that others can combine to their own needs. If we don't provide the layer
upon which these disparate pieces cooperate, I fear the whole effort will
yield too little gain to be worthwhile.
> More specific concerns below:
>
> On 10/12/12, Guido van Rossum <guido at python.org> wrote:
>
>> But the only use for send() on a generator is when using it as a
>> coroutine for a concurrent tasks system -- send() really makes no
>> sense for generators used as iterators. And you're claiming, it seems,
>> that you prefer yield-from for concurrent tasks.
>
> But the data doesn't have to be scheduling information; it can be new
> data, a seed for an algorithm, a command to switch or reset the state
> ... locking it to the scheduler is part of what worries me.
When a coroutine yields, it yields *to the scheduler*, so who else should
these values be for?
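A toy illustration of what I mean (the names here are mine, not from any
proposed API): whatever a coroutine yields lands in the loop that drives
it, and nowhere else.

    def run(gen):
        # Drive a coroutine to completion.  Every value it yields
        # arrives here, in the scheduler loop -- not in any caller.
        value = None
        while True:
            try:
                yielded = gen.send(value)
            except StopIteration:
                break
            # A real scheduler would interpret `yielded` (a Future,
            # a task, ...); this toy one just echoes it back.
            value = yielded

    def coro():
        got = yield 'ping'
        print('the scheduler sent back:', got)

    run(coro())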
>> On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>
>
>>> Keep in mind that a value yielded by a generator being used as
>>> part of a coroutine is *not* seen by code calling it with
>>> yield-from.
>
> That is part of what bugs me about the yield-from examples.
>
> Until this discussion, I had thought of yield-from as factoring out
> some code that was still conceptually embedded within the parent
> generator. This (perhaps correctly) makes it seem more like a
> temporary replacement, as if the parent were no longer there at all.
>
> But with the yield channel reserved for scheduling overhead, the
> "generator" can't really generate anything, except through side
> effects...
Don't forget that yield-from is an expression, not a statement. The
value eventually returned from the generator is the result of the yield-from,
so the generator still produces a final value.
We use generators here for their ability to suspend, not for their ability
to iterate.
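For example (just a sketch, driving the generator by hand in place of a
scheduler):

    def inner():
        yield 'seen only by the scheduler'
        return 42              # becomes the value of the yield-from

    def outer():
        result = yield from inner()
        print('inner returned', result)   # prints 42

    g = outer()
    print(next(g))    # the yielded value, bypassing outer() entirely
    try:
        g.send(None)  # resume; inner() returns and outer() resumes
    except StopIteration:
        pass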
>> ... I feel that "value = yield <something that returns a Future>"
>> is quite a good paradigm,
>
> To me, it seems fine for a particular concrete scheduler, but too
> strong an assumption for an abstract API.
>
> I can mostly* understand:
>
> YieldScheduler assumes any yielded data is another Task; it will
> schedule that task, and cause the original (yielding) Task to wait
> until the new task is completed.
>
> But I wonder what I'm missing with:
>
> Generators should only yield (expressions that create) Futures;
> the scheduler will automatically unwrap the future and send (or
> throw) the result back into the parent (or other ancestor)
> Generator, which will then be resumed.
>
> * "mostly", because if my task is willing to wait for the subtask to
> complete, then why not just use a blocking call in the first place?
> Is it just because switching to another task is lighter weight than
> letting a thread block?
By "blocking call" do you mean "x = foo()" or "x = yield from foo()"?
"Blocking call" usually means the former, so if that's what you mean, you
are neglecting all the other running tasks, which are not willing to wait.
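To make the difference concrete (slow_read and slow_read_async are
stand-ins for any I/O operation, not real APIs):

    import time

    def slow_read():
        # A blocking call: the whole thread stops here, and every
        # other task in it waits too.
        time.sleep(1)
        return 'data'

    def slow_read_async():
        # An async version: yields to the scheduler instead of
        # blocking, then returns its result.
        yield 'a marker the scheduler waits on'
        return 'data'

    def handler():
        # Suspends only *this* task; the scheduler is free to run
        # other tasks until slow_read_async is resumed.
        x = yield from slow_read_async()
        return x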
> What happens if a generator does yield something other than a Future?
> Will the generator be rescheduled in an already-runnable (as opposed
> to waiting) state? Will it never be resumed? Will that object be
> auto-wrapped in a Future for the benefit of whichever other co-routine
> originally made the request?
I think if the scheduler doesn't know what to do with something, it should be an
error. That makes it easier to change things in the future.
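If it helps, here is roughly what I mean (a sketch; Future stands in for
whatever type the real API settles on):

    class Future:
        pass   # stand-in for PEP 3148's Future

    def step(task, value):
        # Advance `task` one step.  Anything yielded that the
        # scheduler doesn't recognize is an error, rather than being
        # silently ignored, so a meaning can be assigned to it later
        # without breaking existing code.
        try:
            yielded = task.send(value)
        except StopIteration:
            return None
        if not isinstance(yielded, Future):
            raise TypeError('task yielded %r; expected a Future'
                            % (yielded,))
        return yielded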
> Are generators assumed to run to exhaustion, or is some sort of driver
> needed to keep pumping them?
>
>
>> ... It would be horrible to require C to create a fake generator.
>
> Would it have to wrap results in a fake Future, so that the scheduler
> could properly unwrap?
>
>> ...Well, I'm talking about a decorator that you *always* apply, and which
>> does nothing (or very little) when wrapping a generator, but adds
>> generator behavior when wrapping a non-generator function.
>
> Why is an always-applied decorator any less intrusive than a mandatory
> (mixin) base class?
>
>> (1) Calling an async operation and waiting for its result, using yield
>
>> Futures:
>> result = yield some_async_op(args)
>
> I was repeatedly confused over whether "result" would be a Future that
> still needed resolution, and the example code wasn't always
> consistent. As I understand it now, the scheduler (not just the
> particular implementation, but the API) has to automatically treat any
> yielded data as a future, resolve that future to its result, and then
> send (or throw) that result (as opposed to the future) back into
> either the parent task or the least distant ancestor task not to be
> using "yield from".
>
>
>> Yield-from:
>> result = yield from some_async_op(args)
>
> So the generator containing this code suspends itself entirely until
> some_async_op is exhausted, at which point result will be the
> StopIteration? (Or None?) Non-Exception results get passed straight
> to the least-distant ancestor task not using "yield from", but
> Exceptions propagate through one generation at a time.
The result is not an exception but the return value of some_async_op(args).
>> (2) Setting the result of an async operation
>
>> Futures:
>> f.set_result(value) # From any callback
>
> PEP 3148 considers set_result private to the executor. Can that
> always be done from arbitrary callbacks? Can it be done more than
> once?
>
> I think for the normal case, a task should just return its value, and
> the Future or the Scheduler should be responsible for calling
> set_result.
I agree.
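That could look something like this (a sketch; in 3.3 a generator's return
value rides along on StopIteration, per PEP 380):

    from concurrent.futures import Future

    def run_task(gen, fut):
        # Drive `gen` to completion; when it returns, the scheduler
        # -- not user code -- resolves the future with that value.
        while True:
            try:
                gen.send(None)
            except StopIteration as e:
                fut.set_result(e.value)   # PEP 380: the return value
                return

    def task():
        yield          # pretend to wait on something
        return 'done'

    f = Future()
    run_task(task(), f)
    print(f.result())  # 'done'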
>> Yield-from:
>> return value # From the outermost generator
>
> Why only the outermost? I'm guessing it is because everything else is
> suspended, and even if a mid-level generator is explicitly re-added to
> the task queue, it can't actually continue because of re-entrancy.
>
>
>> (3) Handling an exception
>>
>> Futures:
>> try:
>>     result = yield some_async_op(args)
>> except MyException:
>>     <handle exception>
>
> So the scheduler does have to unpack the future, and throw rather than send.
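A sketch of that send-or-throw step (fut is a completed PEP 3148-style
future; none of these names are from a proposed API):

    from concurrent.futures import Future

    def resume(task, fut):
        # Resume `task` with the outcome of a completed future:
        # send() the result on success, throw() the exception on
        # failure.
        exc = fut.exception()
        if exc is not None:
            task.throw(exc)
        else:
            task.send(fut.result())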
>
>> (4) Raising an exception as the outcome of an async operation
>
>> Futures:
>> f.set_exception(<Exception instance>)
>
> Again, shouldn't the task itself just raise, and let the future (or
> the scheduler) call that?
>
>> Yield-from:
>> raise <Exception instance or class> # From any of the generators
>
> So it doesn't need to be wrapped in a Future, until it needs to cross
> back over a "schedule this asynchronously" gulf?
>
>> (5) Having one async operation invoke another async operation
>
>> Futures:
>> @task
>> def outer(args):
>>     res = yield inner(args)
>>     return res
>
>> Yield-from:
>> def outer(args):
>>     res = yield from inner(args)
>>     return res
>
> Will it ever get to continue processing (under either model) before
> inner exhausts itself and stops yielding?
>
>> Note: I'm including this because in the Futures case, each level of
>> yield requires the creation of a separate Future.
>
> Only because of the auto-unboxing. And if the generator suspends
> itself to wait for the future, then the future will be resolved before
> control returns to the generator's own parents, so those per-layer
> Futures won't really add anything.
>
>> (6) Spawning off multiple async subtasks
>>
>> Futures:
>> f1 = subtask1(args1) # Note: no yield!!!
>> f2 = subtask2(args2)
>> res1, res2 = yield f1, f2
>
> ah. That makes a bit more sense, though the tuple of futures does
> complicate the automagic unboxing. (Which containers, to which
> levels, have to be resolved?)
>
>> Yield-from:
>> ??????????
>>
>> *** Greg, can you come up with a good idiom to spell concurrency at
>> this level? Your example only has concurrency in the philosophers
>> example, but it appears to interact directly with the scheduler, and
>> the philosophers don't return values. ***
>
> Why wouldn't this be the same as you already wrote without yield-from?
> Two subtasks were submitted but not waited for. I suppose you could
> yield from a generator that submits new subtasks every time it
> generates something, but that would be solving a more complicated
> problem. (So it wouldn't be a consequence of the "yield from".)
>
>
>
>> (7) Checking whether an operation is already complete
>
>> Futures:
>> if f.done(): ...
>
> If f was yielded, it is done, or this code wouldn't be running again to check.
>
>> Yield-from:
>> ?????????????
>
> And again, if the futures were yielded (even through a yield from)
> then they're already unboxed; otherwise, you can still check f.done
>
>> (8) Getting the result of an operation multiple times
>>
>> Futures:
>>
>> f = async_op(args)
>> # squirrel away a reference to f somewhere else
>> r = yield f
>> # ... later, elsewhere
>> r = f.result()
>
> Why do you have to squirrel away the reference? Are you assuming that
> the async scheduler will mess with the locals so that f is no longer
> valid?
>
>> Yield-from:
>> ???????????????
>
> This, you cannot reasonably do; the nature of yield-from means that
> the unresolved futures were never visible within this generator; they
> were resolved by the scheduler and the results handed straight to the
> generator's ancestor.
>
>> (9) Canceling an operation
>>
>> Futures:
>> f.cancel()
>>
>> Yield-from:
>> ???????????????
>>
>> Note: I haven't needed canceling yet, and I believe Devin said that
>> Twisted just got rid of it. However some of the JS Deferred
>> implementations seem to support it.
>
> I think that once you've called "yield from", the generator making
> that call is suspended until the child generator completes. But a
> different thread of control could cancel the active (most-descended)
> generator.
>
>> (10) Registering additional callbacks
>>
>> Futures:
>> f.add_done_callback(callback)
>>
>> Yield-from:
>> ???????
>>
>> Note: this is used in NDB to trigger "hooks" that should run e.g. when
>> a database write completes. The user's code just writes yield
>> ent.put_async(); the trigger is automatically called by the Future's
>> machinery. This also uses (8).
>
> I think you would have to add the callbacks within the subgenerator
> that is spawning f.
>
> That, or un-inline the yield from, and lose the automated send-throw forwarding.
>
> -jJ
--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy