[Python-ideas] Async API: some code to review

Wed Oct 31 22:18:28 CET 2012

On Wed, Oct 31, 2012 at 8:51 AM, Steve Dower <Steve.Dower at microsoft.com> wrote:
> Guido van Rossum wrote:
>> This is also one of the advantages of yield-from; you *never* go back to the end
>> of the ready queue just to invoke another layer of abstraction. (Steve tries to
>> approximate this by running the generator immediately until the first yield, but
>> the caller still ends up suspending to the scheduler, because they are using
>> yield which doesn't avoid the suspension, unlike yield-from.)
>
> This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)

I think you are missing the point. Even if you don't make a roundtrip
through the queue, *each* yield statement, if it is executed at all,
must transfer control to the scheduler. What you're proposing is just
making the scheduler immediately resume the generator.

So, if you have a trivial task, like this:

@async
def trivial(x):
    return x
    yield  # Unreachable, but makes it a generator

and a caller:

@async
def caller():
    foo = yield trivial(42)
    print(foo)

then the call to trivial(42) returns a Future that already has the
result 42 set in it. But caller() still suspends to the scheduler,
yielding that Future. The scheduler can resume caller() immediately
but the damage (overhead) is done.
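
To make that overhead concrete, here is a minimal sketch. The Future
class and the run() driver below are hypothetical stand-ins, not the
code in scheduler.py, and trivial() is written as a plain function that
imitates what the @async decorator would hand back. The sketch only
counts how often the driver sees the generator suspend.

```python
class Future:
    """Hypothetical stand-in for the Future type under discussion."""

    def __init__(self):
        self.done = False
        self.result = None

    def set_result(self, value):
        self.done = True
        self.result = value


def trivial(x):
    # Imitates an @async-decorated trivial task: the result is known
    # immediately, so the Future is already complete when returned.
    fut = Future()
    fut.set_result(x)
    return fut


def caller(out):
    foo = yield trivial(42)  # still a suspension point for the driver
    out.append(foo)


def run(gen):
    """Toy driver: resumes immediately, but counts every suspension."""
    suspensions = 0
    try:
        fut = next(gen)
        while True:
            suspensions += 1
            # This toy only handles Futures that are already complete.
            fut = gen.send(fut.result)
    except StopIteration:
        pass
    return suspensions


results = []
count = run(caller(results))
print(results, count)
```

Even though the Future is complete before caller() ever yields it, the
driver is entered once for that yield, so count comes out as 1.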

In contrast, in the yield-from world, we'd write this

def trivial(x):
    return x
    yield from ()  # Unreachable

def caller():
    foo = yield from trivial(42)
    print(foo)

where the latter expands roughly to the following, without reference
to the scheduler at all:

def caller():
    _gen = trivial(42)
    try:
        while True:
            _val = next(_gen)
            yield _val
    except StopIteration as _exc:
        foo = _exc.value
    print(foo)

The first next(_gen) call raises StopIteration so the yield is never
reached -- the scheduler doesn't know that any of this is going on.
And there's no need to do anything special to advance the generator to
the first yield manually either.
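
The same kind of check can be sketched for the yield-from version. The
run() driver and the out list are hypothetical helpers added only so
the number of suspensions can be observed; trivial() and caller() follow
the definitions above.

```python
def trivial(x):
    return x
    yield from ()  # Unreachable, but makes this a generator


def caller(out):
    foo = yield from trivial(42)
    out.append(foo)


def run(gen):
    """Toy driver: counts how many times the generator suspends."""
    suspensions = 0
    try:
        while True:
            next(gen)
            suspensions += 1
    except StopIteration:
        pass
    return suspensions


yf_results = []
yf_count = run(caller(yf_results))
print(yf_results, yf_count)
```

Here caller() runs to completion inside the first next() call, so the
driver never sees a suspension and yf_count stays at 0.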

(It's different of course when a generator is wrapped in a Task()
constructor. But that should be relatively rare.)

> The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour).

Just get with the program and use yield-from exclusively.

-- 
--Guido van Rossum (python.org/~guido)