[Python-ideas] The async API of the future: yield-from

Guido van Rossum guido at python.org
Mon Oct 15 00:27:43 CEST 2012


On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell <ben at bendarnell.com> wrote:
> On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum <guido at python.org> wrote:
>>> So it would look something like
>>>
>>> Yield-from:
>>>    task1 = subtask1(args1)
>>>    task2 = subtask2(args2)
>>>    res1, res2 = yield from par(task1, task2)
>>>
>>> where the implementation of par() is left as an exercise for
>>> the reader.
>>
>> So, can par() be as simple as
>>
>> def par(*args):
>>   results = []
>>   for task in args:
>>     result = yield from task
>>     results.append(result)
>>   return results
>>
>> ???
>>
>> Or does it need to interact with the scheduler to ensure fairness?
>> (Not having built one of these, my intuition for how the primitives
>> fit together is still lacking, so excuse me for asking naive
>> questions.)
>
> It's not just fairness, it needs to interact with the scheduler to get
> any parallelism at all if the sub-generators have more than one step.
> Consider:
>
> def task1():
>   print("1A")
>   yield
>   print("1B")
>   yield
>   print("1C")
>   # and so on...
>
> def task2():
>   print("2A")
>   yield
>   print("2B")
>   yield
>   print("2C")
>
> def outer():
>   yield from par(task1(), task2())

Hm, that's a little unrealistic -- in practice you'll rarely see code
that yields unless it is also blocking for I/O. I presume that if both
tasks immediately block for I/O, the one whose I/O completes first
gets to run next; and if it then blocks again, it'll again depend on
whose I/O finishes first.

(Admittedly this has little to do with fairness now.)

> Both tasks are started immediately, but can't progress further until
> they are yielded from to advance the iterator.  So with this version
> of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.

Really? When you call a generator function, the body doesn't run up
to the first yield; the generator is suspended before the first
bytecode of the body. So if anything, you might get 1A, 1B, 1C, 2A,
2B, 2C. (Which would prove your point just as much, of course.)
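
The call-time behaviour can be checked without any framework. A
minimal sketch, assuming Python 3.3+ for yield from; the log list and
the bare for-loop driver are illustrative stand-ins (not part of the
original example) for print and for a real scheduler:

```python
def task1(log):
    log.append("1A")
    yield
    log.append("1B")
    yield
    log.append("1C")

def task2(log):
    log.append("2A")
    yield
    log.append("2B")
    yield
    log.append("2C")

def par(*args):
    # The naive par() from earlier in the thread: strictly sequential.
    results = []
    for task in args:
        result = yield from task
        results.append(result)
    return results

def outer(log):
    yield from par(task1(log), task2(log))

log = []
unstarted = task1(log)
assert log == []  # calling a generator function runs none of its body

for _ in outer(log):  # trivial driver standing in for a scheduler
    pass
print(log)  # ['1A', '1B', '1C', '2A', '2B', '2C']
```

With this driver the second task does not start until the first has
finished, so the naive par() gives no interleaving at all.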

Sadly I don't have a framework lying around where I can test this
easily -- I'm pretty sure that the equivalent code in NDB interacts
with the scheduler in a way that ensures round-robin scheduling.

> To get parallelism I
> think you have to schedule each sub-generator separately instead of
> just yielding from them (which negates some of the benefits of yield
> from like easy error handling).

Honestly I don't mind if the scheduler has to be messy, as long as the
mess is hidden from the caller.

> Even if there is a clever version of par() that works more like yield
> from, you'd need to go back to explicit scheduling if you wanted
> parallel execution without forcing everything to finish at the same
> time (which is simple with Futures).

Why wouldn't all generators that aren't blocked for I/O just run until
their next yield, in a round-robin fashion? That's fair enough for me.

But as I said, my intuition for how things work in Greg's world is not
very good.
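
To make the round-robin idea concrete, here is a toy sketch. It
assumes that no task ever blocks for I/O, so every task is always
ready; run_round_robin and task are hypothetical names invented for
this illustration, not part of NDB or any other framework:

```python
from collections import deque

def run_round_robin(*tasks):
    """Advance each generator to its next yield, in turn, until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next yield
        except StopIteration:
            continue               # task finished; do not requeue it
        ready.append(current)      # still alive: back of the queue

log = []

def task(name):
    log.append(name + "A")
    yield
    log.append(name + "B")
    yield
    log.append(name + "C")

run_round_robin(task("1"), task("2"))
print(log)  # ['1A', '2A', '1B', '2B', '1C', '2C']
```

A real scheduler would additionally park tasks that are waiting for
I/O and requeue them only when their I/O completes.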

>> Of course there's the question of what to do when one of the tasks
>> raises an error -- I haven't quite figured that out in NDB either, it
>> runs all the tasks to completion but the caller only sees the first
>> exception. I briefly considered having a "multi-exception" but it
>> felt too weird -- though I'm not married to that decision.
>
> In general for this kind of parallel operation I think it's fine to
> say that one (unspecified) exception is raised in the outer function
> and the rest are hidden.  With futures, "(r1, r2) = yield (f1, f2)" is
> just shorthand for "r1 = yield f1; r2 = yield f2", so separating the
> yields to have separate try/except blocks is no problem.  With yield
> from it's not as good because the second operation can't proceed while
> the outer function is waiting for the first.

Hmmm, I think I see your point. This seems to follow if (as Greg
insists) you don't have any decorators on the generators.

OTOH I am okay with only getting one of the exceptions. But I think
all of the remaining tasks should still be run to completion -- maybe
the caller just cared about their side effects. Or maybe this should
be an option to par().
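
One possible shape for such a par(), as a sketch only. It assumes
that par() itself steps the sub-generators round-robin and yields once
per round to whoever is driving it; the values the sub-tasks yield are
simply discarded here, whereas a real framework would hand them to the
scheduler. The run_all option name is hypothetical:

```python
def par(*tasks, run_all=True):
    """Interleave tasks; return their results in argument order.

    If any task raises, the first exception is re-raised in the caller.
    With run_all=True the remaining tasks still run to completion first.
    """
    results = [None] * len(tasks)
    pending = dict(enumerate(tasks))
    first_error = None
    while pending:
        for index, current in list(pending.items()):
            try:
                next(current)
            except StopIteration as stop:
                results[index] = stop.value   # the task's return value
                del pending[index]
            except Exception as exc:
                del pending[index]
                if first_error is None:
                    first_error = exc
                if not run_all:
                    for other in pending.values():
                        other.close()         # abandon the remaining tasks
                    raise
        if pending:
            yield                             # give the driver a turn
    if first_error is not None:
        raise first_error
    return results

log = []

def good(name):
    log.append(name + "A")
    yield
    log.append(name + "B")
    return name

def bad():
    log.append("xA")
    yield
    raise ValueError("boom")

def outer():
    try:
        yield from par(good("1"), bad(), good("2"))
    except ValueError as exc:
        log.append("caught " + str(exc))

for _ in outer():  # trivial driver standing in for a scheduler
    pass
print(log)  # ['1A', 'xA', '2A', '1B', '2B', 'caught boom']
```

Here the caller sees only the first exception, but both well-behaved
tasks still reach their final step before it is raised.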

-- 
--Guido van Rossum (python.org/~guido)


