Re: [Python-ideas] The async API of the future: yield-from

14 Oct 2012


On Sun, Oct 14, 2012 at 3:27 PM, Guido van Rossum wrote:
...
On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell wrote:
...
On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum wrote:
...
...
So it would look something like
Yield-from:
   task1 = subtask1(args1)
   task2 = subtask2(args2)
   res1, res2 = yield from par(task1, task2)
where the implementation of par() is left as an exercise for
the reader.
So, can par() be as simple as
def par(*args):
  results = []
  for task in args:
    result = yield from task
    results.append(result)
  return results
???
Or does it need to interact with the scheduler to ensure fairness?
(Not having built one of these, my intuition for how the primitives
fit together is still lacking, so excuse me for asking naive
questions.)
It's not just fairness, it needs to interact with the scheduler to get
any parallelism at all if the sub-generators have more than one step.
Consider:
def task1():
  print("1A")
  yield
  print("1B")
  yield
  print("1C")
  # and so on...
def task2():
  print("2A")
  yield
  print("2B")
  yield
  print("2C")
def outer():
  yield from par(task1(), task2())
Hm, that's a little unrealistic -- in practice you'll rarely see code
that yields unless it is also blocking for I/O. I presume that if both
tasks immediately block for I/O, the one whose I/O completes first
gets to run next; and if it then blocks again, it'll again depend on
whose I/O finishes first.
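The behaviour presumed here (whichever task's I/O completes first runs next) can be sketched with a toy scheduler. This is only an illustration under stated assumptions: run() and worker() are hypothetical names, no real I/O happens, and each task "blocks" by yielding a simulated I/O duration that the scheduler adds to a virtual clock.

```python
import heapq
import itertools

def run(gens):
    """Hypothetical scheduler: resume whichever task's simulated I/O finishes first."""
    seq = itertools.count()  # tie-breaker so heapq never compares generators
    heap = [(0, next(seq), g) for g in gens]
    heapq.heapify(heap)
    while heap:
        now, _, gen = heapq.heappop(heap)  # earliest completion time wins
        try:
            delay = gen.send(None)  # run the task until its next "I/O"
        except StopIteration:
            continue  # task finished; drop it
        heapq.heappush(heap, (now + delay, next(seq), gen))

def worker(name, delays, log):
    # Hypothetical task: each yielded number is a pretend I/O wait, in virtual seconds.
    for i, d in enumerate(delays):
        log.append(f"{name}{i}")
        yield d
    log.append(f"{name}-done")

log = []
run([worker("A", [3, 1], log), worker("B", [1, 1], log)])
print(log)  # ['A0', 'B0', 'B1', 'B-done', 'A1', 'A-done']
```

Task B's short waits let it run twice and finish before A's first, longer wait completes.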
(Admittedly this has little to do with fairness now.)
...
Both tasks are started immediately, but can't progress further until
they are yielded from to advance the iterator.  So with this version
of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.
Really? When you call a generator, it doesn't run until the first
yield; it gets suspended before the first bytecode of the body. So if
anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. (Which would prove
your point just as much of course.)
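To check the ordering claim, here is a minimal sketch that runs the naive par() from above, unchanged apart from indentation, under a trivial driver that just resumes the outer generator until it returns. The driver drive() and the three-step generator three_steps() are hypothetical stand-ins for a real scheduler and real I/O-bound tasks; each task yields a bare None where real code would block.

```python
def par(*args):
    # The naive version from the thread: drive each task to completion in turn.
    results = []
    for task in args:
        result = yield from task
        results.append(result)
    return results

def three_steps(name, log):
    # Hypothetical task: three steps, yielding (a stand-in for blocking I/O) after each.
    for step in "ABC":
        log.append(name + step)
        yield
    return name

def drive(gen):
    # Hypothetical driver: resume gen until it returns, then hand back its value.
    try:
        while True:
            gen.send(None)
    except StopIteration as stop:
        return stop.value

log = []

def outer():
    return (yield from par(three_steps("1", log), three_steps("2", log)))

print(drive(outer()))  # ['1', '2']
print(log)             # ['1A', '1B', '1C', '2A', '2B', '2C']
```

Creating the two generators inside outer() runs none of their bodies, and the resulting order is fully sequential.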
Ah, OK.  I was mistaken about the "first yield" part, but the rest
stands.  The problem is that as soon as task1 blocks on IO, the entire
current task (which includes outer(), par(), and both children) gets
unscheduled.  No part of task2 gets scheduled until it gets yielded
from, because the scheduler can't see it until then.
...
Sadly I don't have a framework lying around where I can test this
easily -- I'm pretty sure that the equivalent code in NDB interacts
with the scheduler in a way that ensures round-robin scheduling.
...
To get parallelism I
think you have to schedule each sub-generator separately instead of
just yielding from them (which negates some of the benefits of yield
from like easy error handling).
Honestly I don't mind if the scheduler has to be messy, as long as the
mess is hidden from the caller.
Agreed.
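One way to keep the mess inside par() is for it to register each child with the scheduler as a separate top-level task and then wait for all of them. The sketch below assumes a toy round-robin scheduler; Scheduler, spawn() and the polling loop in par() are hypothetical, not NDB's or any framework's actual API. It also shows the trade-off mentioned above: an exception in a child propagates out of Scheduler.run() rather than into the caller of par().

```python
from collections import deque

class Scheduler:
    """Hypothetical round-robin scheduler: every task runs until its next bare yield."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        # Run gen as a separate top-level task; the returned dict receives its result.
        box = {"done": False, "value": None}

        def wrapper():
            box["value"] = yield from gen
            box["done"] = True

        self.ready.append(wrapper())
        return box

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                task.send(None)
            except StopIteration:
                continue  # task finished; don't reschedule it
            self.ready.append(task)  # back of the queue: round robin


def par(sched, *gens):
    # Hypothetical scheduler-aware par(): spawn the children, then poll until all finish.
    boxes = [sched.spawn(g) for g in gens]
    while not all(box["done"] for box in boxes):
        yield  # let the children (and everyone else) take a step
    return [box["value"] for box in boxes]


def three_steps(name, log):
    # Hypothetical child task with three steps.
    for step in "ABC":
        log.append(name + step)
        yield
    return name


sched = Scheduler()
log = []

def outer():
    results = yield from par(sched, three_steps("1", log), three_steps("2", log))
    log.append(results)

sched.spawn(outer())
sched.run()
print(log)  # ['1A', '2A', '1B', '2B', '1C', '2C', ['1', '2']]
```

Polling with a bare yield is simply the shortest thing that works here; a real scheduler would more likely park the parent until its last child finishes.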
...
...
Even if there is a clever version of par() that works more like yield
from, you'd need to go back to explicit scheduling if you wanted
parallel execution without forcing everything to finish at the same
time (which is simple with Futures).
Why wouldn't all generators that aren't blocked for I/O just run until
their next yield, in a round-robin fashion? That's fair enough for me.
But as I said, my intuition for how things work in Greg's world is not
very good.
The good and bad parts of this proposal both stem from the fact that
yield from is very similar to just inlining everything together.  This
gives you the exception handling semantics that you expect from
synchronous code, but it means that the scheduler can't distinguish
between subtasks; you have to explicitly schedule them as top-level
tasks.
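A small sketch of the "inlining" point: an exception raised inside a sub-generator surfaces in the caller's try/except exactly as it would in synchronous code, while the driver only ever holds the single outermost generator. fetch(), caller() and drive() are hypothetical names, and the bare yield stands in for blocking I/O.

```python
def fetch(ok):
    yield  # stand-in for blocking on I/O
    if not ok:
        raise OSError("fetch failed")
    return "data"

def caller():
    try:
        result = yield from fetch(ok=False)
    except OSError as exc:
        # Caught here, as if fetch() had been an ordinary function call.
        result = f"recovered: {exc}"
    return result

def drive(gen):
    # The driver only holds caller(); fetch()'s yield looks like caller's own.
    steps = 0
    try:
        while True:
            gen.send(None)
            steps += 1
    except StopIteration as stop:
        return stop.value, steps

print(drive(caller()))  # ('recovered: fetch failed', 1)
```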
...
...
...
Of course there's the question of what to do when one of the tasks
raises an error -- I haven't quite figured that out in NDB either, it
runs all the tasks to completion but the caller only sees the first
exception. I briefly considered having a "multi-exception" but it
felt too weird -- though I'm not married to that decision.
In general for this kind of parallel operation I think it's fine to
say that one (unspecified) exception is raised in the outer function
and the rest are hidden.  With futures, "(r1, r2) = yield (f1, f2)" is
just shorthand for "r1 = yield f1; r2 = yield f2", so separating the
yields to have separate try/except blocks is no problem.  With yield
from it's not as good because the second operation can't proceed while
the outer function is waiting for the first.
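For comparison, a sketch of the futures style, using concurrent.futures for the futures themselves. The trampoline drive() and the generator work() are hypothetical. Because both operations are submitted before the first yield, the second keeps running while the generator waits on the first, so splitting "(r1, r2) = yield (f1, f2)" into two yields with their own try/except costs no concurrency. In the tuple form, only one exception (from the first failing future in tuple order) reaches the generator.

```python
from concurrent.futures import ThreadPoolExecutor

def drive(gen):
    # Hypothetical trampoline: gen yields a Future or a tuple of Futures.
    resume, arg = gen.send, None
    while True:
        try:
            yielded = resume(arg)
        except StopIteration as stop:
            return stop.value
        try:
            if isinstance(yielded, tuple):
                arg = tuple(f.result() for f in yielded)
            else:
                arg = yielded.result()
            resume = gen.send
        except Exception as exc:
            # Deliver the failure at the yield point so the generator can catch it.
            resume, arg = gen.throw, exc

def work(pool):
    f1 = pool.submit(pow, 2, 10)
    f2 = pool.submit(int, "not a number")  # fails with ValueError
    try:
        r1 = yield f1
    except ValueError:
        r1 = None
    try:
        r2 = yield f2
    except ValueError:
        r2 = "bad input"
    return r1, r2

with ThreadPoolExecutor() as pool:
    print(drive(work(pool)))  # (1024, 'bad input')
```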
Hmmm, I think I see your point. This seems to follow if (as Greg
insists) you don't have any decorators on the generators.
OTOH I am okay with only getting one of the exceptions. But I think
all of the remaining tasks should still be run to completion -- maybe
the caller just cared about their side effects. Or maybe this should
be an option to par().
That's probably a good idea.
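A sketch of par() with that option. It is a hypothetical variant that assumes children only yield bare None (so there are no I/O requests to forward to a scheduler) and steps them round-robin itself. Every child runs to completion even after one fails; afterwards the first exception is re-raised by default, or, with a hypothetical return_exceptions=True flag (named after the similar parameter of asyncio.gather()), exceptions are returned in place of results. steps(), drive() and outer() are likewise hypothetical.

```python
def par(*gens, return_exceptions=False):
    """Hypothetical par(): step all children round-robin until every one finishes.

    Assumes children only yield bare None. A failing child does not stop the
    others; the first exception is raised at the end unless return_exceptions
    is true, in which case exceptions appear in the result list instead.
    """
    results = [None] * len(gens)
    first_error = None
    pending = list(enumerate(gens))
    while pending:
        still_running = []
        for i, gen in pending:
            try:
                next(gen)
            except StopIteration as stop:
                results[i] = stop.value
            except Exception as exc:
                results[i] = exc
                if first_error is None:
                    first_error = exc
            else:
                still_running.append((i, gen))
        pending = still_running
        if pending:
            yield  # hand control back to whoever drives us between rounds
    if first_error is not None and not return_exceptions:
        raise first_error
    return results


def steps(name, n, log, fail=False):
    # Hypothetical child: n steps, then optionally fail.
    for i in range(n):
        log.append(f"{name}{i}")
        yield
    if fail:
        raise ValueError(name)
    return name


def drive(gen):
    try:
        while True:
            gen.send(None)
    except StopIteration as stop:
        return stop.value


def outer(log, **options):
    return (yield from par(steps("a", 1, log, fail=True), steps("b", 3, log), **options))


log = []
try:
    drive(outer(log))
except ValueError as exc:
    print("raised:", exc)  # raised: a
# "b" still ran to completion after "a" failed:
print(log)  # ['a0', 'b0', 'b1', 'b2']
print(drive(outer([], return_exceptions=True)))  # [ValueError('a'), 'b']
```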

-Ben
...
--
--Guido van Rossum (python.org/~guido)