
Is the goal really to provide "The async API of the future", or just to provide "a stdlib module which provides one adequate way to do async"?
I think the yield and yield from solutions all need too much magical scaffolding to be The One True Way, but I don't mind such conventions as much when they're part of a particular example class, such as concurrent.schedulers.YieldScheduler.
To stretch an analogy, generators and context managers are different concepts. Allowing certain generators to be used as context managers (by using the "with" keyword) is fine. But I wouldn't want to give up all the other uses of generators.
If yield starts implying other magical properties that are only useful when communicating with a scheduler, rather than a regular caller ... I'm afraid that muddies the concept up too much for my taste.
More specific concerns below:
On 10/12/12, Guido van Rossum <guido@python.org> wrote:
But the only use for send() on a generator is when using it as a coroutine for a concurrent tasks system -- send() really makes no sense for generators used as iterators. And you're claiming, it seems, that you prefer yield-from for concurrent tasks.
But the data doesn't have to be scheduling information; it can be new data, a seed for an algorithm, a command to switch or reset the state ... locking it to the scheduler is part of what worries me.
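For example (a toy of my own, nothing from this thread), send() can be a plain data channel, with no scheduler anywhere in sight:

    def running_average():
        total = 0.0
        count = 0
        average = None
        while True:
            value = yield average    # report the current average, wait for more data
            total += value
            count += 1
            average = total / count

    avg = running_average()
    next(avg)        # prime the coroutine
    avg.send(10)     # -> 10.0
    avg.send(15)     # -> 12.5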
On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Keep in mind that a value yielded by a generator being used as part of a coroutine is *not* seen by code calling it with yield-from.
That is part of what bugs me about the yield-from examples.
Until this discussion, I had thought of yield-from as factoring out some code that was still conceptually embedded within the parent generator. This (perhaps correctly) makes it seem more like a temporary replacement, as if the parent were no longer there at all.
But with the yield channel reserved for scheduling overhead, the "generator" can't really generate anything, except through side effects...
... I feel that "value = yield <something that returns a Future>" is quite a good paradigm,
To me, it seems fine for a particular concrete scheduler, but too strong an assumption for an abstract API.
I can mostly* understand:
YieldScheduler assumes any yielded data is another Task; it will schedule that task, and cause the original (yielding) Task to wait until the new task is completed.
But I wonder what I'm missing with:
Generators should only yield (expressions that create) Futures; the scheduler will automatically unwrap the future and send (or throw) the result back into the parent (or other ancestor) Generator, which will then be resumed.
* "mostly", because if my task is willing to wait for the subtask to complete, then why not just use a blocking call in the first place? Is it just because switching to another task is lighter weight than letting a thread block?
What happens if a generator does yield something other than a Future? Will the generator be rescheduled in an already-runnable (as opposed to waiting) state? Will it never be resumed? Will that object be auto-wrapped in a Future for the benefit of whichever other co-routine originally made the request?
Are generators assumed to run to exhaustion, or is some sort of driver needed to keep pumping them?
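For concreteness, here is roughly what I picture that Futures version doing -- entirely my own sketch, with invented names, not anyone's proposed code:

    class FutureScheduler:
        # Sketch only: resume a task with the result of whatever Future it yielded.
        def step(self, task, value=None, exc=None):
            try:
                if exc is not None:
                    future = task.throw(exc)    # the awaited operation failed
                else:
                    future = task.send(value)   # resume the task with the previous result
            except StopIteration:
                return                          # the task ran to completion
            # Assumes whatever was yielded really is a Future -- hence my questions above.
            def resolved(f):
                if f.exception() is None:
                    self.step(task, value=f.result())
                else:
                    self.step(task, exc=f.exception())
            future.add_done_callback(resolved)

If that is roughly right, then most of the API hangs on what "whatever was yielded" is allowed to be.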
... It would be horrible to require C to create a fake generator.
Would it have to wrap results in a fake Future, so that the scheduler could properly unwrap?
...Well, I'm talking about a decorator that you *always* apply, and which does nothing (or very little) when wrapping a generator, but adds generator behavior when wrapping a non-generator function.
Why is an always-applied decorator any less intrusive than a mandatory (mixin) base class?
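For what it's worth, here is the sort of thing I picture that decorator being -- my own guess (and it assumes Python 3.3, for return-with-value in a generator), not code from any of the proposals:

    import functools
    import inspect

    def task(func):
        # Sketch: let plain functions and generator functions be driven the same way.
        if inspect.isgeneratorfunction(func):
            return func                       # generators pass through untouched
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)      # run the plain function eagerly...
            yield                             # ...unreachable, but makes this a generator
        return wrapper

Either way, every async-callable has to opt in, which is why it feels about as intrusive as a base class to me.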
(1) Calling an async operation and waiting for its result, using yield
Futures:
    result = yield some_async_op(args)
I was repeatedly confused over whether "result" would be a Future that still needed resolution, and the example code wasn't always consistent. As I understand it now, the scheduler (not just the particular implementation, but the API) has to automatically treat any yielded data as a Future, resolve that Future to its result, and then send (or throw) that result (rather than the Future itself) back into either the parent task or the nearest ancestor task that is not using "yield from".
Yield-from:
    result = yield from some_async_op(args)
So the generator containing this code suspends itself entirely until some_async_op is exhausted, at which point result will be the StopIteration value? (Or None?) Non-Exception results get passed straight to the nearest ancestor task not using "yield from", but Exceptions propagate through one generation at a time.
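To check my own understanding, here is the bare PEP 380 plumbing with no scheduler at all (Python 3.3):

    def inner():
        yield "only the driver sees this"
        return 42                      # becomes StopIteration.value

    def outer():
        result = yield from inner()    # resumes with 42 once inner() finishes
        print(result)                  # -> 42

    for value in outer():              # whoever pumps the outermost generator
        pass                           # receives inner()'s yields; outer() never does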
(2) Setting the result of an async operation
Futures:
    f.set_result(value)  # From any callback
PEP 3148 considers set_result private to the executor. Can that always be done from arbitrary callbacks? Can it be done more than once?
I think for the normal case, a task should just return its value, and the Future or the Scheduler should be responsible for calling set_result.
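In other words, I would expect the driver, not the task, to touch the Future -- something like this sketch (my names, not proposed code):

    def drive(task, future):
        # Pump one task to completion and record its outcome on its Future.
        try:
            while True:
                next(task)                    # intermediate yields ignored in this sketch
        except StopIteration as stop:
            future.set_result(stop.value)     # normal return: the driver records the value
        except Exception as e:
            future.set_exception(e)           # the task raised: the driver records that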
Yield-from:
    return value  # From the outermost generator
Why only the outermost? I'm guessing it is because everything else is suspended, and even if a mid-level generator is explicitly re-added to the task queue, it can't actually continue because of re-entrancy.
(3) Handling an exception
Futures:
    try:
        result = yield some_async_op(args)
    except MyException:
        <handle exception>
So the scheduler does have to unpack the future, and throw rather than send.
(4) Raising an exception as the outcome of an async operation
Futures:
    f.set_exception(<Exception instance>)
Again, shouldn't the task itself just raise, and let the Future (or the scheduler) call set_exception?
Yield-from:
    raise <Exception instance or class>  # From any of the generators
So it doesn't need to be wrapped in a Future, until it needs to cross back over a "schedule this asynchronously" gulf?
(5) Having one async operation invoke another async operation
Futures:
    @task
    def outer(args):
        res = yield inner(args)
        return res
Yield-from:
    def outer(args):
        res = yield from inner(args)
        return res
Will it ever get to continue processing (under either model) before inner exhausts itself and stops yielding?
Note: I'm including this because in the Futures case, each level of yield requires the creation of a separate Future.
Only because of the auto-unboxing. And if the generator suspends itself to wait for the future, then the future will be resolved before control returns to the generator's own parents, so those per-layer Futures won't really add anything.
(6) Spawning off multiple async subtasks
Futures:
    f1 = subtask1(args1)  # Note: no yield!!!
    f2 = subtask2(args2)
    res1, res2 = yield f1, f2
Ah. That makes a bit more sense, though the tuple of futures does complicate the automagic unboxing. (Which containers have to be resolved, and to what depth?)
Yield-from: ??????????
*** Greg, can you come up with a good idiom to spell concurrency at this level? Your example only has concurrency in the philosophers example, but it appears to interact directly with the scheduler, and the philosophers don't return values. ***
Why wouldn't this be the same as you already wrote without yield-from? Two subtasks were submitted but not waited for. I suppose you could yield from a generator that submits new subtasks every time it generates something, but that would be solving a more complicated problem. (So it wouldn't be a consequence of the "yield from".)
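If that level of concurrency is wanted under yield-from, I would expect it to look like the Futures spelling plus an explicit waiting helper -- purely hypothetical, names invented:

    def outer(args1, args2):
        f1 = scheduler.spawn(subtask1(args1))    # schedule, but don't wait yet
        f2 = scheduler.spawn(subtask2(args2))
        res1 = yield from wait_for(f1)           # hypothetical helper that suspends
        res2 = yield from wait_for(f2)           #   until the given subtask completes
        return res1, res2

The spawning half is the same as in the Futures version; only the waiting changes.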
(7) Checking whether an operation is already complete
Futures:
    if f.done(): ...
If f was yielded, it is done, or this code wouldn't be running again to check.
Yield-from: ?????????????
And again, if the futures were yielded (even through a yield from) then they're already unboxed; otherwise, you can still check f.done().
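That is, under the Futures model the check only seems useful on a future you have not yet yielded:

    f = some_async_op(args)      # not yielded yet, so it may still be pending
    if f.done():
        result = f.result()      # already finished; no need to suspend
    else:
        result = yield f         # suspend until the scheduler resolves it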
(8) Getting the result of an operation multiple times
Futures:
    f = async_op(args)
    # squirrel away a reference to f somewhere else
    r = yield f
    # ... later, elsewhere
    r = f.result()
Why do you have to squirrel away the reference? Are you assuming that the async scheduler will mess with the locals so that f is no longer valid?
Yield-from: ???????????????
This, you cannot reasonably do; the nature of yield-from means that the unresolved futures were never visible within this generator; they were resolved by the scheduler and the results handed straight to the generator's ancestor.
(9) Canceling an operation
Futures:
    f.cancel()
Yield-from: ???????????????
Note: I haven't needed canceling yet, and I believe Devin said that Twisted just got rid of it. However some of the JS Deferred implementations seem to support it.
I think that once you've called "yield from", the generator making that call is suspended until the child generator completes. But a different thread of control could cancel the active (most-descended) generator.
(10) Registering additional callbacks
Futures:
    f.add_done_callback(callback)
Yield-from: ???????
Note: this is used in NDB to trigger "hooks" that should run e.g. when a database write completes. The user's code just writes yield ent.put_async(); the trigger is automatically called by the Future's machinery. This also uses (8).
I think you would have to add the callbacks within the subgenerator that is spawning f.
That, or un-inline the yield from, and lose the automated send-throw forwarding.
-jJ