On Mon, Oct 15, 2012 at 11:53 AM, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 10:58 PM, Greg Ewing wrote:
Guido van Rossum wrote:
Why wouldn't all generators that aren't blocked for I/O just run until their next yield, in a round-robin fashion? That's fair enough for me.
But as I said, my intuition for how things work in Greg's world is not very good.
That's exactly how my scheduler behaves.
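The round-robin behaviour being described can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not Greg's actual scheduler; the names (`run_round_robin`, `task`) are invented here:

```python
# A minimal round-robin scheduler over plain generators: give each task a
# turn until its next yield, drop it when it finishes.
from collections import deque

def run_round_robin(*gens):
    order = []                    # record of which task ran on each step
    ready = deque(gens)
    while ready:
        gen = ready.popleft()
        try:
            label = next(gen)     # run the task until its next yield
        except StopIteration:
            continue              # task finished; don't reschedule it
        order.append(label)
        ready.append(gen)         # back of the queue: fair round-robin
    return order

def task(name, steps):
    for i in range(steps):
        yield (name, i)           # pretend to do one unit of work

schedule = run_round_robin(task("a", 2), task("b", 2))
# the two tasks alternate: a, b, a, b
```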
OTOH I am okay with only getting one of the exceptions. But I think all of the remaining tasks should still be run to completion -- maybe the caller just cared about their side effects. Or maybe this should be an option to par().
This is hard to answer without considering real use cases, but my feeling is that if I care enough about the results of the subtasks to wait until they've all completed before continuing, then if anything goes wrong in any of them, I might as well abandon the whole computation.
If that's not the case, I'd be happy to wrap each one in a try-except that doesn't propagate the exception to the main task, but just records the information that the subtask failed somewhere, for the main task to check afterwards.
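That fallback pattern can be made concrete with a small wrapper: run each subtask, and on failure record the exception somewhere for the main task to check afterwards instead of propagating it. `guarded`, `run`, and the sample tasks are invented names for illustration only:

```python
def guarded(task, failures):
    """Run `task` to completion; on error, record it and return None."""
    try:
        return (yield from task)
    except Exception as exc:
        failures.append(exc)
        return None

def run(gen):
    """Drive a generator to completion and return its value."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

def ok():
    return 42
    yield  # unreachable; marks this function as a generator

def boom():
    raise ValueError("subtask failed")
    yield  # unreachable; marks this function as a generator

failures = []
r1 = run(guarded(ok(), failures))    # 42, nothing recorded
r2 = run(guarded(boom(), failures))  # None, error recorded for later
```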
Another direction to approach this is to consider that par() ought to be just an optimisation -- the result should be the same as if you'd written sequential code to perform the subtasks one after another. And in that case, an exception in one would prevent any of the following ones from executing, so it's fine if par() behaves like that, too.
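A deliberately naive, sequential par() makes that baseline concrete: an exception in one subtask prevents the later ones from ever starting, exactly as sequential code would. The name `par_sequential` and the helpers are hypothetical, for illustration only:

```python
def par_sequential(*tasks):
    results = []
    for t in tasks:
        results.append((yield from t))  # a failure here aborts the rest
    return results

def run(gen):
    """Drive a generator to completion and return its value."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

started = []

def subtask(name, fail=False):
    started.append(name)
    if fail:
        raise RuntimeError(name)
    return name
    yield  # unreachable; marks this function as a generator

results = run(par_sequential(subtask("a"), subtask("b")))

started.clear()
try:
    run(par_sequential(subtask("a"), subtask("boom", fail=True), subtask("c")))
except RuntimeError:
    pass  # "c" never started, matching the sequential semantics
```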
I'd think of such a par() more as something that saves me typing than as an optimization. Anyway, the key functionality I cannot live without here is to start multiple tasks concurrently. It seems that without par() or some other scheduling primitive, you cannot do that: if I write
a = foo_task()  # Search google
b = bar_task()  # Search bing
ra = yield from a
rb = yield from b
# now compare search results
the tasks run sequentially. A good par() should run them concurrently. But there needs to be another way to get a task running immediately and concurrently; I believe that would be
a = spawn(foo_task())
right? One could then at any later point use
ra = yield from a
One could also combine these and do e.g.
a = spawn(foo_task())
b = spawn(bar_task())
<do more work locally>
ra, rb = yield from par(a, b)
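The point about bare yield from running tasks one after another is easy to demonstrate: drive two generator tasks with yield from in sequence and the second one does not even start until the first has finished. The task names and `run` helper below are invented for the demonstration:

```python
log = []

def search(engine):
    log.append(engine + ":start")
    yield                # a suspension point; a real task would do I/O here
    log.append(engine + ":done")
    return engine

def main():
    ra = yield from search("google")
    rb = yield from search("bing")
    return ra, rb

def run(gen):
    """Drive a generator to completion and return its value."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

results = run(main())
# log shows strictly sequential execution:
# ['google:start', 'google:done', 'bing:start', 'bing:done']
```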
Have I got the spelling for spawn() right? In many other systems (e.g. threads, greenlets) this kind of operation takes a callable, not the result of calling a function (albeit a generator). If it takes a generator, would it return the same generator or a different one to wait for?
I think "start this other async task, but let me continue now" (spawn) is so common and basic an operation that it needs to be first-class. What if we allow both yield and yield from of a task? If we allow spawn(task()) then we're not getting nice tracebacks anyway, so I think we should allow

result1 = yield from task1()  # wait for this other task
result2 = yield from task2()  # wait for this next one

and

future1 = yield task1()  # spawn task
future2 = yield task2()  # spawn other task
results = yield future1, future2

I was wrong to say we shouldn't do yield-from task scheduling; I see the benefits now. But I don't think it has to be either-or. I think it makes sense to allow both, and that the behavior differences between the two ways to invoke another task would be sensible. Both are primitives we need to support as first-class operations, that is, without some wrapper like spawn().
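One plausible way to give the two invocation styles distinct semantics is a trampoline-style scheduler: yield from runs a subtask inline in the caller's frame, while yielding a generator hands it to the scheduler and gets a future back, and yielding a future waits for it. This is only a sketch under those assumptions; `Future`, `run`, `worker`, and `main` are all invented names, not a proposed API:

```python
from collections import deque

class Future:
    """Toy future, filled in by the scheduler when a spawned task finishes."""
    def __init__(self):
        self.done = False
        self.result = None
        self.waiters = []          # (task, task_future) pairs blocked on us

def run(main):
    """Trampoline: `yield gen` spawns it, `yield future` waits for it,
    bare `yield` just gives up the current turn."""
    main_fut = Future()
    ready = deque([(main, None, main_fut)])  # (task, value_to_send, future)
    while ready:
        task, value, fut = ready.popleft()
        try:
            yielded = task.send(value)
        except StopIteration as exc:
            fut.done, fut.result = True, exc.value
            # wake anything blocked waiting on this task's result
            ready.extend((t, exc.value, f) for t, f in fut.waiters)
            fut.waiters.clear()
            continue
        if yielded is None:                   # bare yield: reschedule
            ready.append((task, None, fut))
        elif isinstance(yielded, Future):     # wait for a spawned task
            if yielded.done:
                ready.append((task, yielded.result, fut))
            else:
                yielded.waiters.append((task, fut))
        else:                                 # a generator: spawn it
            child = Future()
            ready.append((yielded, None, child))
            ready.append((task, child, fut))  # caller resumes with the future
    return main_fut.result

log = []

def worker(tag):
    log.append(tag + ":start")
    yield                        # suspension point; real code would do I/O
    log.append(tag + ":end")
    return tag.upper()

def main():
    f1 = yield worker("a")       # spawn: get a future back, keep running
    f2 = yield worker("b")
    log.append("main:spawned")
    r1 = yield f1                # now wait for the results
    r2 = yield f2
    return [r1, r2]

result = run(main())
```

The `results = yield future1, future2` form from the message would need one more branch for tuples of futures; it is omitted here to keep the sketch short.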
--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy