On Sun, Oct 14, 2012 at 3:27 PM, Guido van Rossum firstname.lastname@example.org wrote:
On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell email@example.com wrote:
On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum firstname.lastname@example.org wrote:
So it would look something like
Yield-from:

    task1 = subtask1(args1)
    task2 = subtask2(args2)
    res1, res2 = yield from par(task1, task2)
where the implementation of par() is left as an exercise for the reader.
So, can par() be as simple as
    def par(*args):
        results = []
        for task in args:
            result = yield from task
            results.append(result)
        return results
Or does it need to interact with the scheduler to ensure fairness? (Not having built one of these, my intuition for how the primitives fit together is still lacking, so excuse me for asking naive questions.)
It's not just fairness, it needs to interact with the scheduler to get any parallelism at all if the sub-generators have more than one step. Consider:
    def task1():
        print("1A")
        yield
        print("1B")
        yield
        print("1C")
        # and so on...

    def task2():
        print("2A")
        yield
        print("2B")
        yield
        print("2C")

    def outer():
        yield from par(task1(), task2())
Hm, that's a little unrealistic -- in practice you'll rarely see code that yields unless it is also blocking for I/O. I presume that if both tasks immediately block for I/O, the one whose I/O completes first gets to run next; and if it then blocks again, it will again depend on whose I/O finishes first.
(Admittedly this has little to do with fairness now.)
Both tasks are started immediately, but can't progress further until they are yielded from to advance the iterator. So with this version of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.
Really? When you call a generator, it doesn't run until the first yield; it gets suspended before the first bytecode of the body. So if anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. (Which would prove your point just as much of course.)
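This ordering is easy to check with a self-contained sketch (the driver loop here is illustrative, not part of any framework under discussion):

```python
def task1():
    print("1A")
    yield
    print("1B")
    yield
    print("1C")

def task2():
    print("2A")
    yield
    print("2B")
    yield
    print("2C")

def par(*args):
    # the naive par() from earlier in the thread
    results = []
    for task in args:
        results.append((yield from task))
    return results

def outer():
    yield from par(task1(), task2())

# Drive outer() by hand: each next() resumes at the next bare yield.
for _ in outer():
    pass
# prints 1A, 1B, 1C, 2A, 2B, 2C -- task2 does not start until task1 finishes,
# confirming that generators run no code until first resumed
```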
Ah, OK. I was mistaken about the "first yield" part, but the rest stands. The problem is that as soon as task1 blocks on I/O, the entire current task (which includes outer(), par(), and both children) gets unscheduled. No part of task2 gets scheduled until it is yielded from, because the scheduler can't see it until then.
Sadly I don't have a framework lying around where I can test this easily -- I'm pretty sure that the equivalent code in NDB interacts with the scheduler in a way that ensures round-robin scheduling.
To get parallelism I think you have to schedule each sub-generator separately instead of just yielding from them (which negates some of the benefits of yield from like easy error handling).
Honestly I don't mind if the scheduler has to be messy, as long as the mess is hidden from the caller.
Even if there is a clever version of par() that works more like yield from, you'd need to go back to explicit scheduling if you wanted parallel execution without forcing everything to finish at the same time (which is simple with Futures).
Why wouldn't all generators that aren't blocked for I/O just run until their next yield, in a round-robin fashion? That's fair enough for me.
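Ignoring I/O and the surrounding scheduler entirely, that round-robin interleaving can be sketched synchronously; `par_roundrobin` and `counter` are hypothetical names for illustration, not proposed API:

```python
from collections import deque

def par_roundrobin(*tasks):
    """Advance each unfinished generator one step per pass (round-robin)."""
    queue = deque(enumerate(tasks))
    results = [None] * len(tasks)
    while queue:
        i, task = queue.popleft()
        try:
            next(task)                   # run until the task's next yield
            queue.append((i, task))      # still running: rotate to the back
        except StopIteration as stop:
            results[i] = stop.value      # finished: capture its return value
    return results

def counter(name, n):
    # toy task: print a step label, then yield control
    for step in range(n):
        print("%s%d" % (name, step + 1))
        yield
    return name

par_roundrobin(counter("A", 2), counter("B", 2))
# prints A1, B1, A2, B2 -- steps interleave rather than running A to completion
```

A real scheduler would only rotate tasks that aren't blocked on I/O, but the queue-rotation shape is the same.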
But as I said, my intuition for how things work in Greg's world is not very good.
The good and bad parts of this proposal both stem from the fact that yield from is very similar to just inlining everything together. This gives you the exception handling semantics that you expect from synchronous code, but it means that the scheduler can't distinguish between subtasks; you have to explicitly schedule them as top-level tasks.
Of course there's the question of what to do when one of the tasks raises an error -- I haven't quite figured that out in NDB either; it runs all the tasks to completion, but the caller only sees the first exception. I briefly considered having a "multi-exception" but it felt too weird -- though I'm not married to that decision.
In general for this kind of parallel operation I think it's fine to say that one (unspecified) exception is raised in the outer function and the rest are hidden. With futures, "(r1, r2) = yield (f1, f2)" is just shorthand for "r1 = yield f1; r2 = yield f2", so separating the yields to have separate try/except blocks is no problem. With yield from it's not as good because the second operation can't proceed while the outer function is waiting for the first.
Hmmm, I think I see your point. This seems to follow if (as Greg insists) you don't have any decorators on the generators.
OTOH I am okay with only getting one of the exceptions. But I think all of the remaining tasks should still be run to completion -- maybe the caller just cared about their side effects. Or maybe this should be an option to par().
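That option can be sketched as a variant of the sequential par() from earlier in the thread; `par_seq` and its `run_all` flag are hypothetical names, not proposed API:

```python
def par_seq(*tasks, run_all=True):
    # Sketch of the option discussed: keep driving the remaining tasks even
    # after one fails (they may matter for their side effects), then
    # re-raise only the first exception seen.
    results, first_exc = [], None
    for task in tasks:
        try:
            results.append((yield from task))
        except Exception as exc:
            results.append(exc)
            if first_exc is None:
                first_exc = exc
            if not run_all:
                break
    if first_exc is not None:
        raise first_exc
    return results
```

Like the original par(), this runs its children one after another; the point is only the error policy, not the scheduling.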
That's probably a good idea.
-- --Guido van Rossum (python.org/~guido)