PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait

I figure python-ideas is still the best place for PEP 3156 feedback - I think it's being revised too heavily for in-depth discussion on python-dev to be a good idea, and I think spinning out a separate list would lose too many people that are interested-but-not-enough-to-subscribe-to-yet-another-mailing-list (including me).

The current draft of the PEP suggests the use of par() for the barrier operation (waiting for all futures and coroutines in a collection to be ready), while tentatively suggesting wait_one() as the API for waiting for the first completed operation in a collection. That inconsistency is questionable all by itself, but there's a greater stdlib-level inconsistency that I find more concerning.

The corresponding blocking API in concurrent.futures is the module level "wait" function, which accepts a "return_when" parameter, with the permitted values FIRST_COMPLETED, FIRST_EXCEPTION and ALL_COMPLETED (the default). In the case where everything succeeds, FIRST_EXCEPTION is the same as ALL_COMPLETED. This function also accepts a timeout which allows the operation to finish early if the operations take too long. This flexibility also leads to a difference in the structure of the return type: concurrent.futures.wait always returns a pair of sets, with the first set containing those futures which completed, while the second contains those which remained incomplete at the time the call returned.

It seems to me that this "wait" API can be applied directly to the equivalent problems in the async space, and, accordingly, *should* be applied so that the synchronous and asynchronous APIs remain as consistent as possible.
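For comparison, this is roughly what the existing blocking API looks like in use today (the `work` helper and its timings are invented for the illustration):

```python
import concurrent.futures as cf
import time

def work(n):
    time.sleep(n / 10)
    return n

with cf.ThreadPoolExecutor() as pool:
    fs = [pool.submit(work, n) for n in (1, 2, 3)]

    # Wait for at least one future to finish; the rest may still be running
    done, not_done = cf.wait(fs, return_when=cf.FIRST_COMPLETED)
    first = min(f.result() for f in done)

    # Default is ALL_COMPLETED: block until everything is done
    done, not_done = cf.wait(fs)
    results = sorted(f.result() for f in done)

print(first, results)
```

The two-sets return value means the caller decides what to do with stragglers (cancel them, keep waiting, ignore them), which is exactly the flexibility being argued for above.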
The low level equivalent to par() would be:

    incomplete = <tasks, futures or coroutines>
    complete, incomplete = yield from tulip.wait(incomplete)
    assert not incomplete  # Without a timeout, everything should complete
    for f in complete:
        # Handle the completed operations
        ...

Limiting the maximum execution time of any task to 10 seconds is straightforward:

    incomplete = <tasks, futures or coroutines>
    complete, incomplete = yield from tulip.wait(incomplete, timeout=10)
    for f in incomplete:
        f.cancel()  # Took too long, kill it
    for f in complete:
        # Handle the completed operations
        ...

The low level equivalent to the wait_one() example would become:

    incomplete = <tasks, futures or coroutines>
    while incomplete:
        complete, incomplete = yield from tulip.wait(incomplete,
                                                     return_when=FIRST_COMPLETED)
        for f in complete:
            # Handle the completed operations
            ...

par() becomes easy to define as a coroutine:

    @coroutine
    def par(fs):
        complete, incomplete = yield from tulip.wait(fs,
                                                     return_when=FIRST_EXCEPTION)
        for f in incomplete:
            f.cancel()  # Something must have failed, so cancel the rest
        # If something failed, calling f.result() will raise that exception
        return [f.result() for f in complete]

Defining wait_one() is also straightforward (although it isn't clearly superior to just using the underlying API directly):

    @coroutine
    def wait_one(fs):
        complete, incomplete = yield from tulip.wait(fs,
                                                     return_when=FIRST_COMPLETED)
        return complete.pop()

The async equivalent to "as_completed" under this scheme is far more interesting, as it would be an iterator that produces coroutines:

    def as_completed(fs):
        incomplete = fs
        complete = set()
        while incomplete:
            # Phase 1 of the loop, we yield a coroutine that actually starts
            # operations running
            @coroutine
            def _wait_for_some():
                nonlocal complete, incomplete
                complete, incomplete = yield from tulip.wait(
                    incomplete, return_when=FIRST_COMPLETED)
                return complete.pop().result()
            yield _wait_for_some()
            # Phase 2 of the loop, we pass back the already complete operations.
            # Note this use case for @coroutine *forcing* objects to behave like
            # a generator, as well as exploiting the ability to avoid trips
            # around the event loop
            while complete:
                @coroutine
                def _next_result():
                    return complete.pop().result()
                yield _next_result()

This is almost as easy to use as the synchronous equivalent, the only difference is the use of "yield from f" instead of the synchronous "f.result()":

    for f in as_completed(fs):
        next = yield from f

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
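As a sanity check on the proposed semantics, a synchronous analogue of that par() definition can be written against today's concurrent.futures (`sync_par` is a name invented for this sketch, not a real API):

```python
import concurrent.futures as cf

def sync_par(pool, callables):
    """Synchronous analogue of the proposed par(): submit everything,
    stop waiting at the first failure, cancel what hasn't started, and
    let f.result() re-raise any exception."""
    fs = [pool.submit(c) for c in callables]
    done, not_done = cf.wait(fs, return_when=cf.FIRST_EXCEPTION)
    for f in not_done:
        f.cancel()  # Something must have failed, so cancel the rest
    return [f.result() for f in done]

with cf.ThreadPoolExecutor(max_workers=1) as pool:
    results = sync_par(pool, [lambda: 1, lambda: 2])

# `done` is a set, so completion order is not preserved
print(sorted(results))
```

When nothing fails, FIRST_EXCEPTION degrades to ALL_COMPLETED, so the success path is identical to a plain barrier.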

On Fri, Dec 21, 2012 at 8:46 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
You've convinced me. I've never used the wait() and as_completed() APIs in c.f., but you're right that, with the exception of requiring 'yield from', they can be carried over exactly, and given that we're doing the same thing with Future, this is eminently reasonable. I may not get to implementing these for two weeks (I'll be traveling without a computer) but they will not be forgotten. --Guido
-- --Guido van Rossum (python.org/~guido)

On Fri, Dec 21, 2012 at 9:17 PM, Guido van Rossum <guido@python.org> wrote:
I did update the PEP. There are some questions about details; e.g. I think the 'fs' argument should allow a mixture of Futures and coroutines (the latter will be wrapped in Tasks) and the sets returned by wait() should contain Futures and Tasks. You propose that as_completed() returns an iterator whose items are coroutines; why not Futures? (They're more versatile even if slightly slower than coroutines.) I can sort of see the reasoning but want to tease out whether you meant it that way. Also, we can't have __next__() raise TimeoutError, since it never blocks; it will have to be the coroutine (or Future) returned by __next__().
-- --Guido van Rossum (python.org/~guido)

On Sat, Dec 22, 2012 at 4:20 PM, Guido van Rossum <guido@python.org> wrote:
Yes, I think I wrote my examples that way, even though I didn't say that in the text.
I deliberately chose to return coroutines. My rationale is to be able to handle the case where multiple operations become ready without having to make multiple trips around the event loop by having the iterator switch between two modes: when the complete set is empty, it yields a coroutine that calls wait and then returns the first complete future, while when there are already complete futures available, it yields a coroutine that just returns one of them immediately. It's really the same rationale as that for having @coroutine not automatically wrap things in Task - if we can avoid the event loop in cases that don't actually need to wait for an event, that's a good thing.
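A rough way to see the saving is to simulate the two modes with plain generators and count "scheduler passes"; all the names here (`fake_wait`, `as_completed_sim`, `scheduler_passes`) are invented for the illustration:

```python
scheduler_passes = 0

def fake_wait(pending):
    # Stand-in for `yield from tulip.wait(...)`: pretend a single pass of
    # the event loop completes everything that was pending at once.
    global scheduler_passes
    scheduler_passes += 1
    return set(pending), set()

def as_completed_sim(items):
    pending = set(items)
    ready = set()
    while pending or ready:
        if not ready:
            # Mode 1: nothing ready, so we must take a scheduler pass
            ready, pending = fake_wait(pending)
        # Mode 2 (fall-through): hand back an already-ready item with
        # no scheduler pass at all
        yield ready.pop()

got = sorted(as_completed_sim({1, 2, 3}))
print(got, scheduler_passes)
```

Three items come back after a single simulated trip around the event loop; a naive iterator that waited once per item would have taken three.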
Yeah, any exceptions should happen at the yield from call inside the loop. I *think* my implementation achieves that (since the coroutine instances it creates are passed out to the for loop for further processing), but it's quite possible I missed something. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Dec 22, 2012 at 12:04 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Good.
I think I see it now. The first item yielded is the simplest thing that can be used with yield-from, i.e. a coroutine. Then if multiple futures are ready at once, you return an item of the same type, i.e. a coroutine. This is essentially wrapping a Future in a coroutine! If we could live with the items being alternately coroutines and Futures, we could just return the Future in this case.

BTW, yield from <future> need not go to the scheduler if the Future is already done -- the Future.__iter__ method should be:

    def __iter__(self):
        if not self.done():
            yield self  # This tells Task to wait for completion.
        return self.result()  # May raise too.

(I forgot this previously.)
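A toy version of that protocol can be exercised with plain generators; `ToyFuture` and `consumer` below are invented stand-ins, not tulip classes:

```python
class ToyFuture:
    """Minimal future implementing the __iter__ protocol described above."""
    def __init__(self):
        self._done = False
        self._result = None
    def done(self):
        return self._done
    def set_result(self, value):
        self._result = value
        self._done = True
    def result(self):
        return self._result
    def __iter__(self):
        if not self.done():
            yield self  # Tells the Task/scheduler to wait for completion
        return self.result()  # May raise, if the future held an exception

def consumer(fut):
    value = yield from fut
    return value

# With an already-done future, the consumer never yields: the very first
# next() call runs straight through to StopIteration carrying the result.
f = ToyFuture()
f.set_result(42)
g = consumer(f)
final = None
try:
    next(g)
    yielded = True
except StopIteration as e:
    yielded = False
    final = e.value

print(yielded, final)
```

This is the PEP 380 mechanics at work: since `done()` is true, `__iter__` skips its yield, so `yield from` completes without ever suspending to the scheduler.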
It'll come out in implementation (in two weeks, maybe). -- --Guido van Rossum (python.org/~guido)

On Sun, Dec 23, 2012 at 1:54 AM, Guido van Rossum <guido@python.org> wrote:
And I'd missed it completely :) In that case, yeah, yielding any already completed Futures directly from as_completed() should work. The "no completed operations" case will still need a coroutine, though, as it needs to update the "complete" and "incomplete" sets inside the iterator. Since we know we're certain to hit the scheduler in that case, we may as well wrap it directly in a Task so we're always returning some kind of future. The impl might end up looking something like:

    def as_completed(fs):
        incomplete = fs
        complete = set()
        while incomplete:
            # Phase 1 of the loop, we yield a Task that waits for operations
            @coroutine
            def _wait_for_some():
                nonlocal complete, incomplete
                complete, incomplete = yield from tulip.wait(
                    incomplete, return_when=FIRST_COMPLETED)
                return complete.pop().result()
            yield Task(_wait_for_some())
            # Phase 2 of the loop, we pass back the already complete operations
            while complete:
                yield complete.pop()

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
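That keeps the usage pattern aligned with the existing synchronous API, where the consumer loop calls f.result() where the async version would use "yield from f". The blocking counterpart runs today as-is (the `work` helper and its timings are invented for the illustration):

```python
import concurrent.futures as cf
import time

def work(n):
    time.sleep(n / 20)
    return n

with cf.ThreadPoolExecutor() as pool:
    fs = [pool.submit(work, n) for n in (3, 1, 2)]
    # cf.as_completed yields futures in completion order, not submit order
    results = [f.result() for f in cf.as_completed(fs)]

print(results)
```

The only delta for the async version is the spelling of "get the result out of the thing the iterator handed me".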

participants (2): Guido van Rossum, Nick Coghlan