[Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait

Sat Dec 22 05:46:39 CET 2012

I figure python-ideas is still the best place for PEP 3156 feedback -
I think it's being revised too heavily for in-depth discussion on
python-dev to be a good idea, and I think spinning out a separate list
would lose too many people that are
interested-but-not-enough-to-subscribe-to-yet-another-mailing-list
(including me).

The current draft of the PEP suggests the use of par() for the barrier
operation (waiting for all futures and coroutines in a collection to
be ready), while tentatively suggesting wait_one() as the API for
waiting for the first completed operation in a collection. That
inconsistency is questionable all by itself, but there's a greater
stdlib level inconsistency that I find more concerning.

The corresponding blocking API in concurrent.futures is the module
level "wait" function, which accepts a "return_when" parameter, with
the permitted values FIRST_COMPLETED, FIRST_EXCEPTION and
ALL_COMPLETED (the default). In the case where everything succeeds,
FIRST_EXCEPTION is the same as ALL_COMPLETED. This function also
accepts a timeout which allows the operation to finish early if the
operations take too long.

This flexibility also leads to a difference in the structure of the
return type: concurrent.futures.wait always returns a pair of sets,
with the first set being those futures which completed, while the
second contains those which remained incomplete at the time the call
returned.
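
For reference, a minimal runnable sketch of the blocking API described above. The work() helper, its delays and its labels are made up for illustration; only wait(), its return_when constants and the (done, not_done) return pair come from concurrent.futures.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def work(delay, label):
    # Hypothetical stand-in for a blocking operation
    time.sleep(delay)
    return label


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(work, delay, label)
               for delay, label in [(0.05, "fast"), (0.6, "slow"), (0.7, "slower")]}

    # Return as soon as at least one future has finished
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    first_results = {f.result() for f in done}

    # The default is return_when=ALL_COMPLETED; without a timeout,
    # the second set comes back empty
    done_all, not_done_all = wait(futures)
    all_results = {f.result() for f in done_all}
```

The two sets always partition the futures that were passed in, which is the property the async proposal below relies on.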

It seems to me that this "wait" API can be applied directly to the
equivalent problems in the async space, and, accordingly, *should* be
applied so that the synchronous and asynchronous APIs remain as
consistent as possible.

The low level equivalent to par() would be:

    incomplete = <tasks, futures or coroutines>
    complete, incomplete = yield from tulip.wait(incomplete)
    assert not incomplete # Without a timeout, everything should complete
    for f in complete:
        # Handle the completed operations

Limiting the maximum execution time of any task to 10 seconds is
straightforward:

    incomplete = <tasks, futures or coroutines>
    complete, incomplete = yield from tulip.wait(incomplete, timeout=10)
    for f in incomplete:
        f.cancel() # Took too long, kill it
    for f in complete:
        # Handle the completed operations
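
As a rough, runnable illustration of the timeout pattern, the sketch below uses asyncio.wait from today's standard library as a stand-in for the proposed tulip.wait. That substitution is an assumption made for the example: tulip.wait is only a proposal here, asyncio.wait wants tasks or futures rather than bare coroutines, and "await" replaces "yield from". The op() helper and its delays are hypothetical.

```python
import asyncio


async def op(delay, label):
    # Hypothetical stand-in for a real asynchronous operation
    await asyncio.sleep(delay)
    return label


async def run_with_deadline(timeout):
    # Wrap the coroutines in tasks before handing them to asyncio.wait
    tasks = {asyncio.ensure_future(op(0.01, "quick")),
             asyncio.ensure_future(op(5, "stuck"))}
    complete, incomplete = await asyncio.wait(tasks, timeout=timeout)
    for t in incomplete:
        t.cancel()  # Took too long, ask it to stop
    # Let the cancelled tasks unwind before returning
    await asyncio.gather(*incomplete, return_exceptions=True)
    return sorted(t.result() for t in complete), len(incomplete)


results, cancelled = asyncio.run(run_with_deadline(0.2))
```

Note that the timeout only ends the wait early. Cancelling whatever is left over remains the caller's job, exactly as in the pseudocode above.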

The low level equivalent to the wait_one() example would become:

    incomplete = <tasks, futures or coroutines>
    while incomplete:
        complete, incomplete = yield from tulip.wait(
            incomplete, return_when=FIRST_COMPLETED)
        for f in complete:
            # Handle the completed operations
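
A runnable sketch of the same loop, again assuming asyncio.wait as a stand-in for the proposed tulip.wait and using a made-up op() helper. The delays are spaced widely enough that each pass through the loop should see one newly completed task.

```python
import asyncio


async def op(delay, label):
    # Hypothetical stand-in for a real asynchronous operation
    await asyncio.sleep(delay)
    return label


async def handle_in_completion_order():
    incomplete = {asyncio.ensure_future(op(delay, label))
                  for delay, label in [(0.3, "c"), (0.01, "a"), (0.15, "b")]}
    order = []
    while incomplete:
        # Feed the still-pending set back in on every iteration
        complete, incomplete = await asyncio.wait(
            incomplete, return_when=asyncio.FIRST_COMPLETED)
        for t in complete:
            order.append(t.result())  # Handle the completed operations
    return order


order = asyncio.run(handle_in_completion_order())
```

The set of completed operations may hold more than one entry if several finish in the same pass, which is why the inner for loop is needed.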

par() becomes easy to define as a coroutine:

    @coroutine
    def par(fs):
        complete, incomplete = yield from tulip.wait(
            fs, return_when=FIRST_EXCEPTION)
        for f in incomplete:
            f.cancel() # Something must have failed, so cancel the rest
        # If something failed, calling f.result() will raise that exception
        return [f.result() for f in complete]
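
The par() definition can be tried out with asyncio.wait standing in for the proposed tulip.wait (an assumption for this sketch, along with the ok(), boom() and slow() helpers). One deliberate deviation from the version above: results are collected by walking the input order rather than iterating over the "complete" set, so the returned list has a predictable order.

```python
import asyncio


async def ok(label):
    await asyncio.sleep(0.01)
    return label


async def boom():
    await asyncio.sleep(0.01)
    raise ValueError("failed")


async def slow():
    await asyncio.sleep(5)
    return "never"


async def par(fs):
    tasks = [asyncio.ensure_future(f) for f in fs]
    complete, incomplete = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_EXCEPTION)
    for t in incomplete:
        t.cancel()  # Something must have failed, so cancel the rest
    await asyncio.gather(*incomplete, return_exceptions=True)
    # If something failed, calling t.result() will raise that exception
    return [t.result() for t in tasks if t in complete]


async def main():
    results = await par([ok("x"), ok("y")])
    try:
        await par([ok("x"), boom(), slow()])
    except ValueError as exc:
        error = str(exc)
    else:
        error = None
    return results, error


results, error = asyncio.run(main())
```

In the failing call, slow() is cancelled as soon as boom() raises, so the whole thing finishes in well under five seconds.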

Defining wait_one() is also straightforward (although it isn't clearly
superior to just using the underlying API directly):

    @coroutine
    def wait_one(fs):
        complete, incomplete = yield from tulip.wait(
            fs, return_when=FIRST_COMPLETED)
        return complete.pop()

The async equivalent to "as_completed" under this scheme is far more
interesting, as it would be an iterator that produces coroutines:

    def as_completed(fs):
        # "complete" must be bound here so the nonlocal declaration
        # below has something to refer to
        complete = set()
        incomplete = fs
        while incomplete:
            # Phase 1 of the loop, we yield a coroutine that actually
            # starts operations running
            @coroutine
            def _wait_for_some():
                nonlocal complete, incomplete
                # Wait only on what is still pending, not on all of fs
                complete, incomplete = yield from tulip.wait(
                    incomplete, return_when=FIRST_COMPLETED)
                return complete.pop().result()
            yield _wait_for_some()
            # Phase 2 of the loop, we pass back the already complete operations
            while complete:
                # Note this use case for @coroutine *forcing* objects
                # to behave like a generator, as well as exploiting the
                # ability to avoid trips around the event loop
                @coroutine
                def _next_result():
                    return complete.pop().result()
                yield _next_result()

    # This is almost as easy to use as the synchronous equivalent, the
    # only difference is the use of "yield from f" instead of the
    # synchronous "f.result()"
    for f in as_completed(fs):
        next = yield from f
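
For comparison, the asyncio module in today's standard library ships an as_completed() with the same calling pattern: a plain for loop over awaitables, each of which is then awaited (the modern spelling of "yield from"). The sketch below uses it purely as an illustration of that pattern, not as the implementation proposed above; the op() helper and its delays are made up.

```python
import asyncio


async def op(delay, label):
    # Hypothetical stand-in for a real asynchronous operation
    await asyncio.sleep(delay)
    return label


async def main():
    tasks = [asyncio.ensure_future(op(delay, label))
             for delay, label in [(0.2, "late"), (0.01, "early")]]
    seen = []
    for f in asyncio.as_completed(tasks):
        # Each f is an awaitable that yields the next available result
        seen.append(await f)
    return seen


seen = asyncio.run(main())
```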

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia