[Python-ideas] The async API of the future: yield-from

Mon Oct 15 00:55:46 CEST 2012

On Sun, Oct 14, 2012 at 3:27 PM, Guido van Rossum <guido at python.org> wrote:
> On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell <ben at bendarnell.com> wrote:
>> On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum <guido at python.org> wrote:
>>>> So it would look something like
>>>>
>>>> Yield-from:
>>>>    task1 = subtask1(args1)
>>>>    task2 = subtask2(args2)
>>>>    res1, res2 = yield from par(task1, task2)
>>>>
>>>> where the implementation of par() is left as an exercise for
>>>> the reader.
>>>
>>> So, can par() be as simple as
>>>
>>> def par(*args):
>>>   results = []
>>>   for task in args:
>>>     result = yield from task
>>>     results.append(result)
>>>   return results
>>>
>>> ???
>>>
>>> Or does it need to interact with the scheduler to ensure fairness?
>>> (Not having built one of these, my intuition for how the primitives
>>> fit together is still lacking, so excuse me for asking naive
>>> questions.)
>>
>> It's not just fairness, it needs to interact with the scheduler to get
>> any parallelism at all if the sub-generators have more than one step.
>> Consider:
>>
>> def task1():
>>   print("1A")
>>   yield
>>   print("1B")
>>   yield
>>   print("1C")
>>   # and so on...
>>
>> def task2():
>>   print("2A")
>>   yield
>>   print("2B")
>>   yield
>>   print("2C")
>>
>> def outer():
>>   yield from par(task1(), task2())
>
> Hm, that's a little unrealistic -- in practice you'll rarely see code
> that yields unless it is also blocking for I/O. I presume that if both
> tasks immediately block for I/O, the one whose I/O completes first
> gets to run next; and if it then blocks again, it'll again depend on
> whose I/O finishes first.
>
> (Admittedly this has little to do with fairness now.)
>
>> Both tasks are started immediately, but can't progress further until
>> they are yielded from to advance the iterator.  So with this version
>> of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.
>
> Really? When you call a generator, it doesn't run until the first
> yield; it gets suspended before the first bytecode of the body. So if
> anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. (Which would prove
> your point just as much of course.)

Ah, OK.  I was mistaken about the "first yield" part, but the rest
stands.  The problem is that as soon as task1 blocks on IO, the entire
current task (which includes outer(), par(), and both children) gets
unscheduled.  No part of task2 gets scheduled until it gets yielded
from, because the scheduler can't see it until then.
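A minimal runnable sketch of the behavior described above. It assumes that a bare `yield` stands in for "block on I/O" and that run() is a hypothetical stand-in for a scheduler that, like a real one, can only see the single top-level generator. It also shows Guido's point that calling a generator function runs none of its body:

```python
trace = []

def task1():
    trace.append("1A")
    yield                # stands in for blocking on I/O (assumption)
    trace.append("1B")
    yield
    trace.append("1C")

def task2():
    trace.append("2A")
    yield
    trace.append("2B")
    yield
    trace.append("2C")

def par(*args):
    # The naive par() from the thread: drive each child to completion in turn.
    results = []
    for task in args:
        result = yield from task
        results.append(result)
    return results

def outer():
    return (yield from par(task1(), task2()))

def run(gen):
    # Hypothetical trivial "scheduler": it only ever sees one top-level generator.
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

untouched = task1()
print(trace)   # [] : creating the generator ran none of its body
result = run(outer())
print(trace)   # ['1A', '1B', '1C', '2A', '2B', '2C']
print(result)  # [None, None]
```

Because par() drives each child with yield from one after the other, task2's body does not start until task1 has finished.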

>
> Sadly I don't have a framework lying around where I can test this
> easily -- I'm pretty sure that the equivalent code in NDB interacts
> with the scheduler in a way that ensures round-robin scheduling.
>
>> To get parallelism I
>> think you have to schedule each sub-generator separately instead of
>> just yielding from them (which negates some of the benefits of yield
>> from like easy error handling).
>
> Honestly I don't mind if the scheduler has to be messy, as long as the
> mess is hidden from the caller.

Agreed.

>
>> Even if there is a clever version of par() that works more like yield
>> from, you'd need to go back to explicit scheduling if you wanted
>> parallel execution without forcing everything to finish at the same
>> time (which is simple with Futures).
>
> Why wouldn't all generators that aren't blocked for I/O just run until
> their next yield, in a round-robin fashion? That's fair enough for me.
>
> But as I said, my intuition for how things work in Greg's world is not
> very good.

The good and bad parts of this proposal both stem from the fact that
yield from is very similar to just inlining everything together.  This
gives you the exception handling semantics that you expect from
synchronous code, but it means that the scheduler can't distinguish
between subtasks; you have to explicitly schedule them as top-level
tasks.
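To get interleaving, the subtasks have to become top-level tasks that the scheduler can see. The following is a toy sketch under stated assumptions, not any framework's API: Task, Scheduler, spawn() and step() are hypothetical names, a bare `yield` again stands in for blocking on I/O, and par() simply polls until its children finish instead of being woken by the scheduler, as a real implementation would be.

```python
from collections import deque

class Task:
    """Hypothetical wrapper: one top-level unit the scheduler can see."""
    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None
        self.exception = None

class Scheduler:
    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        task = Task(gen)
        self.ready.append(task)
        return task

    def run(self):
        # Round-robin: advance each ready task to its next yield, then requeue it.
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task.gen)
            except StopIteration as exc:
                task.done, task.result = True, exc.value
            except Exception as exc:
                task.done, task.exception = True, exc
            else:
                self.ready.append(task)

sched = Scheduler()  # module-level scheduler, an assumption for brevity
trace = []

def step(name, n):
    # Hypothetical subtask: record a step, then "block" by yielding.
    for i in range(n):
        trace.append(f"{name}{'ABC'[i]}")
        yield
    return name

def par(*gens):
    # Spawn the children as top-level tasks so the scheduler can interleave them,
    # then poll (yield) until every child has finished.
    tasks = [sched.spawn(g) for g in gens]
    while not all(t.done for t in tasks):
        yield
    # All children ran to completion; re-raise the first exception, if any.
    for t in tasks:
        if t.exception is not None:
            raise t.exception
    return [t.result for t in tasks]

results = []

def outer():
    results.append((yield from par(step("1", 3), step("2", 3))))

sched.spawn(outer())
sched.run()
print(trace)    # ['1A', '2A', '1B', '2B', '1C', '2C']
print(results)  # [['1', '2']]
```

The trade-off described above shows up directly: the children are no longer inlined into outer() via yield from, so their exceptions must be captured by the scheduler and re-raised by par() explicitly.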

>
>>> Of course there's the question of what to do when one of the tasks
>>> raises an error -- I haven't quite figured that out in NDB either, it
>>> runs all the tasks to completion but the caller only sees the first
>>> exception. I briefly considered having a "multi-exception" but it
>>> felt too weird -- though I'm not married to that decision.
>>
>> In general for this kind of parallel operation I think it's fine to
>> say that one (unspecified) exception is raised in the outer function
>> and the rest are hidden.  With futures, "(r1, r2) = yield (f1, f2)" is
>> just shorthand for "r1 = yield f1; r2 = yield f2", so separating the
>> yields to have separate try/except blocks is no problem.  With yield
>> from it's not as good because the second operation can't proceed while
>> the outer function is waiting for the first.
>
> Hmmm, I think I see your point. This seems to follow if (as Greg
> insists) you don't have any decorators on the generators.
>
> OTOH I am okay with only getting one of the exceptions. But I think
> all of the remaining tasks should still be run to completion -- maybe
> the caller just cared about their side effects. Or maybe this should
> be an option to par().

That's probably a good idea.
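For comparison, and well after this thread, the standard library's asyncio.gather() (Python 3.4+; asyncio.run() needs 3.7+) exposes this choice as an option. By default the first exception propagates to the caller while the other awaitables are not cancelled; with return_exceptions=True, exceptions are returned alongside normal results instead. A small sketch, where worker() is a hypothetical coroutine and asyncio.sleep(0) stands in for blocking on I/O:

```python
import asyncio

log = []

async def worker(name, fail=False):
    await asyncio.sleep(0)  # stands in for blocking on I/O
    log.append(f"{name} ran")
    if fail:
        raise ValueError(name)
    return name

async def main():
    first = None
    # Default: the caller sees only the first exception; "b" is not cancelled.
    try:
        await asyncio.gather(worker("a", fail=True), worker("b"))
    except ValueError as exc:
        first = exc
    # Opt-in: every outcome, exceptions included, comes back as a result.
    outcomes = await asyncio.gather(worker("c", fail=True), worker("d"),
                                    return_exceptions=True)
    return first, outcomes

first, outcomes = asyncio.run(main())
print(repr(first))  # ValueError('a')
print(outcomes[1])  # d
```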

-Ben

>
> --
> --Guido van Rossum (python.org/~guido)