[Python-ideas] The async API of the future: yield-from

Fri Oct 12 21:18:34 CEST 2012

[This is the second spin-off thread from "asyncore: included batteries
don't fit"]

On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>> It does bother me somehow that you're not using .send() and yield
>> arguments at all. I notice that you have a lot of three-line code
>> blocks like this:
>>
>>       block_for_reading(sock)
>>       yield
>>       data = sock.recv(1024)

> I wouldn't say I have a "lot". In the spamserver, there are really
> only three -- one for accepting a connection, one for reading from
> a socket, and one for writing to a socket. These are primitive
> operations that would be provided by an async socket library.

Hm. In such a small sample program, three near-identical blocks is a lot!

> Generally, all the yields would be hidden inside primitives like
> this. Normally, user code would never need to use 'yield', only
> 'yield from'.
>
> This probably didn't come through as clearly as it might have in my
> tutorial. Part of the reason is that at the time I wrote it, I was
> having to manually expand yield-froms into for-loops, so I was
> reluctant to use any more of them than I needed to. Also, yield-from
> was a new and unfamiliar concept, and I didn't want to scare people
> by overusing it. These considerations led me to push some of the
> yields slightly further up the layer stack than they could be.

But the fact remains that you can't completely hide these yields --
the best you can do is replace them with a single yield-from.

>> The general form seems to be:
>>
>>       arrange for a callback when some operation can be done without blocking
>>       yield
>>       do the operation
>>
>> This seems to be begging to be collapsed into a single line, e.g.
>>
>>       data = yield sock.recv_async(1024)

> I'm not sure how you're imagining that would work, but whatever
> it is, it's wrong -- that just doesn't make sense.

That's a strong statement! It makes a lot of sense in a world using
Futures and a Future-aware trampoline/scheduler, instead of yield-from
and bare generators. I can see however that you don't like it in the
yield-from world you're envisioning, and how it would be confusing
there. I'll get back to this in a bit.

> What *would* make sense is
>
>    data = yield from sock.recv_async(1024)
>
> with sock.recv_async() being a primitive that encapsulates the
> block/yield/process triplet.

Right, that's how you would spell it.
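
A minimal sketch of such a primitive, assuming a toy scheduler that
drives a single coroutine with select(). Only block_for_reading() and
recv_async appear in the discussion above; waiting_for_read, current,
and run() are names made up for illustration, and recv_async is written
as a plain function rather than a socket method:

```python
import select
import socket

# Hypothetical scheduler state: socket -> coroutine waiting to read from it.
waiting_for_read = {}
current = None  # the coroutine the toy scheduler is currently running


def block_for_reading(sock):
    """Arrange for the current coroutine to be resumed when sock is readable."""
    waiting_for_read[sock] = current


def recv_async(sock, nbytes):
    """Primitive that hides the block/yield/process triplet."""
    block_for_reading(sock)
    yield                       # suspend until the scheduler resumes us
    return sock.recv(nbytes)    # sock is readable now, so this won't block


def reader(sock, results):
    # User code only ever writes 'yield from', never a bare 'yield'.
    data = yield from recv_async(sock, 1024)
    results.append(data)


def run(coro):
    """Toy scheduler: drives one coroutine until it finishes."""
    global current
    current = coro
    try:
        next(coro)
        while True:
            ready, _, _ = select.select(list(waiting_for_read), [], [])
            for sock in ready:
                current = waiting_for_read.pop(sock)
                next(current)
    except StopIteration:
        pass  # the coroutine ran to completion


a, b = socket.socketpair()
results = []
b.sendall(b"hello")
run(reader(a, results))
a.close()
b.close()
```

Note that the bare yield has not disappeared; it has only moved into
the primitive, which is the point made above.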

>> (I would also prefer to see the socket wrapped in an object that makes
>> it hard to accidentally block.)

> It would be straightforward to make the primitives be methods of a
> socket wrapper object. I only used functions in the tutorial in the
> interests of keeping the amount of machinery to a bare minimum.

Understood.

>> But surely there's still a place for send() and other PEP 342 features?

> In the wider world of generator usage, yes. If you have a
> generator that it makes sense to send() things into, for
> example, and you want to factor part of it out into another
> function, the fact that yield-from passes through sent values
> is useful.

But the only use for send() on a generator is when using it as a
coroutine in a concurrent task system -- send() really makes no
sense for generators used as iterators. And you're claiming, it seems,
that you prefer yield-from for concurrent tasks.

> But we're talking about a very specialised use of generators
> here, and so far I haven't thought of a use for sent or yielded
> values in this context that can't be done in a more straightforward
> way by other means.
>
> Keep in mind that a value yielded by a generator being used as
> part of a coroutine is *not* seen by code calling it with
> yield-from. Rather, it comes out in the inner loop of the
> scheduler, from the next() call being used to resume the
> coroutine. Likewise, any send() call would have to be made
> by the scheduler, not the yield-from caller.

I'm very much aware of that. There is a *huge* difference between
yield-from and yield.

However, now that I've implemented a substantial library (NDB, which
has thousands of users in the App Engine world, if not hundreds of
thousands), I feel that "value = yield <something that returns a
Future>" is quite a good paradigm, and the only part of PEP 380 I'm
really looking forward to embracing (once App Engine supports Python
3.3) is the option to return a value from a generator -- which my
users currently have to spell as "raise ndb.Return(<value>)".
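
To illustrate the two spellings side by side, here is a small sketch.
The Return class is a hypothetical stand-in written for this example,
not NDB's actual implementation, and final_value() is likewise an
illustrative helper:

```python
class Return(Exception):
    """Hypothetical stand-in for the pre-3.3 'raise Return(value)' spelling."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def old_style():
    yield
    raise Return(42)    # the only way to "return" a value before PEP 380


def new_style():
    yield
    return 42           # PEP 380: surfaces as StopIteration with .value == 42


def final_value(gen):
    """Drive a generator to completion and extract its result either way."""
    try:
        while True:
            next(gen)
    except StopIteration as e:
        return e.value
    except Return as e:
        return e.value


old_result = final_value(old_style())
new_result = final_value(new_style())
```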

> So, the send/yield channel is exclusively for communication
> with the *scheduler* and nothing else. Under the old way of
> doing generator-based coroutines, this channel was used to
> simulate a call stack by yielding 'call' and 'return'
> instructions that the scheduler interpreted. But all that
> is now taken care of by the yield-from mechanism, and there
> is nothing left for the send/yield channel to do.

I understand that's the state of the world that you're looking forward
to. However I'm slightly worried that in practice there are some
issues to be resolved. One is what to do with operations directly
implemented in C. It would be horrible to require C to create a fake
generator. It would be mildly nasty to have to wrap these all in
Python code just so you can use them with yield-from. Fortunately an
iterator whose final __next__() raises StopIteration(<value>) works in
the latest Python 3.3 (it didn't work in some of the betas IIRC).
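
A small sketch of that last point, assuming a hand-written iterator
class (OneShot is a made-up name) standing in for an operation
implemented in C. It suspends once and then finishes by raising
StopIteration(<value>), which yield-from turns into the result:

```python
class OneShot:
    """Plain iterator, not a generator, usable as a yield-from target."""

    def __init__(self, value):
        self._value = value
        self._suspended = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self._suspended:
            self._suspended = True
            return None                     # acts like a bare 'yield'
        raise StopIteration(self._value)    # becomes the yield-from result


def caller():
    data = yield from OneShot('dummy')
    return data


gen = caller()
first = next(gen)           # the suspension passes through to the driver
try:
    next(gen)
except StopIteration as e:
    result = e.value
```

This only covers resumption via next(); a driver that calls send()
with a non-None value would need the iterator to provide send() too.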

>> my users sometimes want to
>> treat something as a coroutine but they don't have any yields in it
>>
>> def caller():
>>     data = yield from reader()
>>
>> def reader():
>>     return 'dummy'
>>     yield
>>
>> works, but if you drop the yield it doesn't work. With a decorator I
>> know how to make it work either way.

> If you're talking about a decorator that turns a function
> into a generator, I can't see anything particularly headachish
> about that. If you mean something else, you'll have to elaborate.

Well, I'm talking about a decorator that you *always* apply, and which
does nothing (or very little) when wrapping a generator, but adds
generator behavior when wrapping a non-generator function.
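
One way such a decorator might look, as a sketch only. The name
coroutine is an assumption made for this example; the check relies on
inspect.isgeneratorfunction() from the standard library:

```python
import functools
import inspect


def coroutine(func):
    """Leave generator functions alone; give plain functions generator behavior."""
    if inspect.isgeneratorfunction(func):
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # never reached, but makes wrapper a generator function

    return wrapper


@coroutine
def reader():
    return 'dummy'      # no dummy yield needed here any more


@coroutine
def caller():
    data = yield from reader()
    return data


try:
    next(caller())
except StopIteration as e:
    result = e.value
```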

Anyway, I am trying to come up with a table comparing Futures and your
yield-from-using generators. I'm basing this on a subset of the PEP
3148 API, and I'm not presuming threads -- I'm just looking at the
functionality around getting and setting callbacks, results, and
exceptions. My reference is actually based on NDB, but the API there
differs from PEP 3148 in uninteresting ways, so I'll use the PEP 3148
method names.

(1) Calling an async operation and waiting for its result, using yield

Futures:
  result = yield some_async_op(args)

Yield-from:
  result = yield from some_async_op(args)

(2) Setting the result of an async operation

Futures:
  f.set_result(value)  # From any callback

Yield-from:
  return value  # From the outermost generator

(3) Handling an exception

Futures:
  try:
    result = yield some_async_op(args)
  except MyException:
    <handle exception>

Yield-from:
  try:
    result = yield from some_async_op(args)
  except MyException:
    <handle exception>

Note: with yield-from, the tracebacks for unhandled exceptions are
possibly prettier.

(4) Raising an exception as the outcome of an async operation

Futures:
  f.set_exception(<Exception instance>)

Yield-from:
  raise <Exception instance or class>  # From any of the generators

Note: with Futures, the traceback also needs to be stored; in Python 3
it is stored on the Exception instance's __traceback__ attribute. But
when letting exceptions bubble through multiple levels of nested
calls, you must do something special to ensure the traceback looks
right to the end user.
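
A brief sketch of the Futures side, using concurrent.futures.Future
from PEP 3148 without any threads. failing_op() is a made-up operation;
the point is only that the original frames travel with the exception
instance through set_exception() and result():

```python
import traceback
from concurrent.futures import Future


def failing_op():
    raise ValueError('boom')


f = Future()
try:
    failing_op()
except ValueError as exc:
    f.set_exception(exc)    # the traceback rides along on exc.__traceback__

try:
    f.result()              # re-raises the stored exception
except ValueError as exc:
    # Names of the functions recorded in the traceback, outermost first.
    frames = [fs.name for fs in traceback.extract_tb(exc.__traceback__)]
```

The recorded frames still include failing_op, but they also pick up
the Future's own internals, which is part of why extra work is needed
to make the traceback look right.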

(5) Having one async operation invoke another async operation

Futures:
  @task
  def outer(args):
    res = yield inner(args)
    return res

Yield-from:
  def outer(args):
    res = yield from inner(args)
    return res

Note: I'm including this because in the Futures case, each level of
yield requires the creation of a separate Future. In practice this
requires decorating all async functions. It also serves as a lead-in
to the next item.
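
To make the Futures half concrete, here is a minimal sketch of what a
@task decorator could look like. It is an assumption for illustration,
not NDB's implementation; it borrows concurrent.futures.Future purely
as a result holder, with no threads involved, and pending is a
hypothetical stand-in for a low-level async operation:

```python
import functools
from concurrent.futures import Future


def task(func):
    """Run a generator that yields Futures; return a Future for its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        outer_future = Future()     # one new Future per decorated call
        gen = func(*args, **kwargs)

        def step(done_future=None):
            try:
                if done_future is None:
                    yielded = next(gen)                 # first activation
                elif done_future.exception() is not None:
                    yielded = gen.throw(done_future.exception())
                else:
                    yielded = gen.send(done_future.result())
            except StopIteration as e:
                outer_future.set_result(e.value)        # 'return res'
            except Exception as e:
                outer_future.set_exception(e)
            else:
                yielded.add_done_callback(step)         # resume when ready

        step()
        return outer_future
    return wrapper


pending = Future()  # hypothetical low-level operation, completed below


@task
def inner(x):
    value = yield pending
    return value + x


@task
def outer(x):
    res = yield inner(x)
    return res


f = outer(1)
done_before = f.done()      # nothing has completed yet
pending.set_result(41)      # completing the low-level op resumes both levels
final = f.result()
```

Each decorated call allocates its own Future, which is the per-level
cost mentioned in the note above.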

(6) Spawning off multiple async subtasks

Futures:
  f1 = subtask1(args1)  # Note: no yield!!!
  f2 = subtask2(args2)
  res1, res2 = yield f1, f2

Yield-from:
  ??????????

*** Greg, can you come up with a good idiom to spell concurrency at
this level? Your example only has concurrency in the philosophers
example, but it appears to interact directly with the scheduler, and
the philosophers don't return values. ***

(7) Checking whether an operation is already complete

Futures:
  if f.done(): ...

Yield-from:
  ?????????????

(8) Getting the result of an operation multiple times

Futures:

  f = async_op(args)
  # squirrel away a reference to f somewhere else
  r = yield f
  # ... later, elsewhere
  r = f.result()

Yield-from:
  ???????????????

(9) Canceling an operation

Futures:
  f.cancel()

Yield-from:
  ???????????????

Note: I haven't needed canceling yet, and I believe Devin said that
Twisted just got rid of it. However some of the JS Deferred
implementations seem to support it.

(10) Registering additional callbacks

Futures:
  f.add_done_callback(callback)

Yield-from:
  ???????

Note: this is used in NDB to trigger "hooks" that should run e.g. when
a database write completes. The user's code just writes yield
ent.put_async(); the trigger is automatically called by the Future's
machinery. This also uses (8).
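
For reference, the Futures column of items (2), (7), (8), and (10) can
be exercised with concurrent.futures.Future alone, without threads. The
log list and the 'written' value are made up for this sketch:

```python
from concurrent.futures import Future

log = []
f = Future()

# (10) Register an additional callback; it receives the Future itself.
f.add_done_callback(lambda fut: log.append(fut.result()))

# (7) Check whether the operation is already complete.
done_before = f.done()

# (2) Set the result; this also runs the registered callback.
f.set_result('written')

# (8) The result can be fetched any number of times.
first = f.result()
second = f.result()
```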

-- 
--Guido van Rossum (python.org/~guido)