On Mon, Oct 15, 2012 at 11:53 AM, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 10:58 PM, Greg Ewing wrote:
Guido van Rossum wrote:
Why wouldn't all generators that aren't blocked for I/O just run until their next yield, in a round-robin fashion? That's fair enough for me.
But as I said, my intuition for how things work in Greg's world is not very good.
That's exactly how my scheduler behaves.
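The round-robin behaviour being described can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not Greg's actual scheduler; the names (`run_round_robin`, `task`) are invented here:

```python
# A minimal round-robin scheduler over plain generators: give each task a
# turn until its next yield, drop it when it finishes.
from collections import deque

def run_round_robin(*gens):
    order = []                    # record of which task ran on each step
    ready = deque(gens)
    while ready:
        gen = ready.popleft()
        try:
            label = next(gen)     # run the task until its next yield
        except StopIteration:
            continue              # task finished; don't reschedule it
        order.append(label)
        ready.append(gen)         # back of the queue: fair round-robin
    return order

def task(name, steps):
    for i in range(steps):
        yield (name, i)           # pretend to do one unit of work

schedule = run_round_robin(task("a", 2), task("b", 2))
# the two tasks alternate: a, b, a, b
```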
OTOH I am okay with only getting one of the exceptions. But I think all of the remaining tasks should still be run to completion -- maybe the caller just cared about their side effects. Or maybe this should be an option to par().
This is hard to answer without considering real use cases, but my feeling is that if I care enough about the results of the subtasks to wait until they've all completed before continuing, then if anything goes wrong in any of them, I might as well abandon the whole computation.
If that's not the case, I'd be happy to wrap each one in a try-except that doesn't propagate the exception to the main task, but just records the information that the subtask failed somewhere, for the main task to check afterwards.
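That fallback pattern can be made concrete with a small wrapper: run each subtask, and on failure record the exception somewhere for the main task to check afterwards instead of propagating it. `guarded`, `run`, and the sample tasks are invented names for illustration only:

```python
def guarded(task, failures):
    """Run `task` to completion; on error, record it and return None."""
    try:
        return (yield from task)
    except Exception as exc:
        failures.append(exc)
        return None

def run(gen):
    """Drive a generator to completion and return its value."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

def ok():
    return 42
    yield  # unreachable; marks this function as a generator

def boom():
    raise ValueError("subtask failed")
    yield  # unreachable; marks this function as a generator

failures = []
r1 = run(guarded(ok(), failures))    # 42, nothing recorded
r2 = run(guarded(boom(), failures))  # None, error recorded for later
```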
Another direction to approach this is to consider that par() ought to be just an optimisation -- the result should be the same as if you'd written sequential code to perform the subtasks one after another. And in that case, an exception in one would prevent any of the following ones from executing, so it's fine if par() behaves like that, too.
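A deliberately naive, sequential par() makes that baseline concrete: an exception in one subtask prevents the later ones from ever starting, exactly as sequential code would. The name `par_sequential` and the helpers are hypothetical, for illustration only:

```python
def par_sequential(*tasks):
    results = []
    for t in tasks:
        results.append((yield from t))  # a failure here aborts the rest
    return results

def run(gen):
    """Drive a generator to completion and return its value."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

started = []

def subtask(name, fail=False):
    started.append(name)
    if fail:
        raise RuntimeError(name)
    return name
    yield  # unreachable; marks this function as a generator

results = run(par_sequential(subtask("a"), subtask("b")))

started.clear()
try:
    run(par_sequential(subtask("a"), subtask("boom", fail=True), subtask("c")))
except RuntimeError:
    pass  # "c" never started, matching the sequential semantics
```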
I'd think of such a par() more as something that saves me typing than as an optimization. Anyway, the key functionality I cannot live without here is to start multiple tasks concurrently. It seems that without par() or some other scheduling primitive, you cannot do that: if I write
a = foo_task()  # Search google
b = bar_task()  # Search bing
ra = yield from a
rb = yield from b
# now compare search results
the tasks run sequentially. A good par() should run them concurrently. But there needs to be another way to get a task running immediately and concurrently; I believe that would be
a = spawn(foo_task())
right? One could then at any later point use
ra = yield from a
One could also combine these and do e.g.
a = spawn(foo_task())
b = spawn(bar_task())
<do more work locally>
ra, rb = yield from par(a, b)
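The point about bare yield from running tasks one after another is easy to demonstrate: drive two generator tasks with yield from in sequence and the second one does not even start until the first has finished. The task names and `run` helper below are invented for the demonstration:

```python
log = []

def search(engine):
    log.append(engine + ":start")
    yield                # a suspension point; a real task would do I/O here
    log.append(engine + ":done")
    return engine

def main():
    ra = yield from search("google")
    rb = yield from search("bing")
    return ra, rb

def run(gen):
    """Drive a generator to completion and return its value."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

results = run(main())
# log shows strictly sequential execution:
# ['google:start', 'google:done', 'bing:start', 'bing:done']
```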
Have I got the spelling for spawn() right? In many other systems (e.g. threads, greenlets) this kind of operation takes a callable, not the result of calling a function (albeit a generator). If it takes a generator, would it return the same generator or a different one to wait for?
I think "start this other async task, but let me continue now" (spawn) is so common and basic an operation that it needs to be first-class. What if we allow both yield and yield from of a task? If we allow spawn(task()) then we're not getting nice tracebacks anyway, so I think we should allow

result1 = yield from task1()  # wait for this other task
result2 = yield from task2()  # wait for this next one

and

future1 = yield task1()  # spawn task
future2 = yield task2()  # spawn other task
results = yield future1, future2

I was wrong to say we shouldn't do yield-from task scheduling; I see the benefits now. But I don't think it has to be either-or. I think it makes sense to allow both, and that the behavior differences between the two ways to invoke another task would be sensible. Both are primitives we need to support as first-class operations, that is, without some wrapper like spawn().
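One plausible way to give the two invocation styles distinct semantics is a trampoline-style scheduler: yield from runs a subtask inline in the caller's frame, while yielding a generator hands it to the scheduler and gets a future back, and yielding a future waits for it. This is only a sketch under those assumptions; `Future`, `run`, `worker`, and `main` are all invented names, not a proposed API:

```python
from collections import deque

class Future:
    """Toy future, filled in by the scheduler when a spawned task finishes."""
    def __init__(self):
        self.done = False
        self.result = None
        self.waiters = []          # (task, task_future) pairs blocked on us

def run(main):
    """Trampoline: `yield gen` spawns it, `yield future` waits for it,
    bare `yield` just gives up the current turn."""
    main_fut = Future()
    ready = deque([(main, None, main_fut)])  # (task, value_to_send, future)
    while ready:
        task, value, fut = ready.popleft()
        try:
            yielded = task.send(value)
        except StopIteration as exc:
            fut.done, fut.result = True, exc.value
            # wake anything blocked waiting on this task's result
            ready.extend((t, exc.value, f) for t, f in fut.waiters)
            fut.waiters.clear()
            continue
        if yielded is None:                   # bare yield: reschedule
            ready.append((task, None, fut))
        elif isinstance(yielded, Future):     # wait for a spawned task
            if yielded.done:
                ready.append((task, yielded.result, fut))
            else:
                yielded.waiters.append((task, fut))
        else:                                 # a generator: spawn it
            child = Future()
            ready.append((yielded, None, child))
            ready.append((task, child, fut))  # caller resumes with the future
    return main_fut.result

log = []

def worker(tag):
    log.append(tag + ":start")
    yield                        # suspension point; real code would do I/O
    log.append(tag + ":end")
    return tag.upper()

def main():
    f1 = yield worker("a")       # spawn: get a future back, keep running
    f2 = yield worker("b")
    log.append("main:spawned")
    r1 = yield f1                # now wait for the results
    r2 = yield f2
    return [r1, r2]

result = run(main())
```

The `results = yield future1, future2` form from the message would need one more branch for tuples of futures; it is omitted here to keep the sketch short.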
--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy