The async API of the future: yield-from
[This is the second spin-off thread from "asyncore: included batteries
don't fit"]
On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing wrote:
Guido van Rossum wrote:
It does bother me somehow that you're not using .send() and yield arguments at all. I notice that you have a lot of three-line code blocks like this:
    block_for_reading(sock)
    yield
    data = sock.recv(1024)
I wouldn't say I have a "lot". In the spamserver, there are really only three -- one for accepting a connection, one for reading from a socket, and one for writing to a socket. These are primitive operations that would be provided by an async socket library.
Hm. In such a small sample program, three near-identical blocks is a lot!
Generally, all the yields would be hidden inside primitives like this. Normally, user code would never need to use 'yield', only 'yield from'.
This probably didn't come through as clearly as it might have in my tutorial. Part of the reason is that at the time I wrote it, I was having to manually expand yield-froms into for-loops, so I was reluctant to use any more of them than I needed to. Also, yield-from was a new and unfamiliar concept, and I didn't want to scare people by overusing it. These considerations led me to push some of the yields slightly further up the layer stack than they could be.
But the fact remains that you can't completely hide these yields -- the best you can do is replace them with a single yield-from.
The general form seems to be:
    arrange for a callback when some operation can be done without blocking
    yield
    do the operation
This seems to be begging to be collapsed into a single line, e.g.
data = yield sock.recv_async(1024)
I'm not sure how you're imagining that would work, but whatever it is, it's wrong -- that just doesn't make sense.
That's a strong statement! It makes a lot of sense in a world using Futures and a Future-aware trampoline/scheduler, instead of yield-from and bare generators. I can see however that you don't like it in the yield-from world you're envisioning, and how it would be confusing there. I'll get back to this in a bit.
What *would* make sense is
data = yield from sock.recv_async(1024)
with sock.recv_async() being a primitive that encapsulates the block/yield/process triplet.
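To make the block/yield/process triplet concrete, here is a minimal sketch of how such a primitive might be written and driven. Everything here is an assumption made for illustration: the `Scheduler` class, its `block_for_reading` method, and the free function `recv_async` are hypothetical and are not the code from the tutorial under discussion.

```python
import select
import socket


class Scheduler:
    """Hypothetical minimal scheduler, written only for this illustration."""

    def __init__(self):
        self.ready = []      # generators that can be resumed right away
        self.readers = {}    # socket -> generator waiting to read from it
        self.current = None  # generator currently being run

    def block_for_reading(self, sock):
        # Park the current task until sock becomes readable.
        self.readers[sock] = self.current

    def run(self):
        while self.ready or self.readers:
            if not self.ready:
                readable, _, _ = select.select(list(self.readers), [], [])
                for sock in readable:
                    self.ready.append(self.readers.pop(sock))
            self.current = self.ready.pop(0)
            try:
                next(self.current)
            except StopIteration:
                pass


sched = Scheduler()


def recv_async(sock, nbytes):
    # The primitive hides the three steps: block, yield, do the operation.
    sched.block_for_reading(sock)
    yield
    return sock.recv(nbytes)


def reader(sock, out):
    # User code contains only 'yield from', never a bare 'yield'.
    data = yield from recv_async(sock, 1024)
    out.append(data)


a, b = socket.socketpair()
b.sendall(b"hello")
received = []
sched.ready.append(reader(a, received))
sched.run()
a.close()
b.close()
```

The bare `yield` inside `recv_async` carries no value; it only hands control back to the scheduler, which resumes the task once `select` reports the socket as readable.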
Right, that's how you would spell it.
(I would also prefer to see the socket wrapped in an object that makes it hard to accidentally block.)
It would be straightforward to make the primitives be methods of a socket wrapper object. I only used functions in the tutorial in the interests of keeping the amount of machinery to a bare minimum.
Understood.
But surely there's still a place for send() and other PEP 342 features?
In the wider world of generator usage, yes. If you have a generator that it makes sense to send() things into, for example, and you want to factor part of it out into another function, the fact that yield-from passes through sent values is useful.
But the only use for send() on a generator is when using it as a coroutine for a concurrent tasks system -- send() really makes no sense for generators used as iterators. And you're claiming, it seems, that you prefer yield-from for concurrent tasks.
But we're talking about a very specialised use of generators here, and so far I haven't thought of a use for sent or yielded values in this context that can't be done in a more straightforward way by other means.
Keep in mind that a value yielded by a generator being used as part of a coroutine is *not* seen by code calling it with yield-from. Rather, it comes out in the inner loop of the scheduler, from the next() call being used to resume the coroutine. Likewise, any send() call would have to be made by the scheduler, not the yield-from caller.
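A small sketch may help show where yielded and sent values actually travel. The names `primitive`, `task`, and the string values are made up for this illustration; the code standing in for the scheduler is just a `next()` call followed by a `send()` call.

```python
def primitive():
    # The yielded value goes to whoever calls next()/send() on the
    # outermost generator, i.e. the scheduler.
    reply = yield "wake me on read"
    return reply.upper()


def task():
    # The yield-from caller sees only the return value of primitive(),
    # never the value that primitive() yielded.
    result = yield from primitive()
    return result


coro = task()
seen_by_scheduler = next(coro)   # the value yielded deep inside primitive()
try:
    coro.send("data")            # the scheduler sends a value back in
except StopIteration as stop:
    final = stop.value           # return value of task()
```

Here `task()` never touches the yielded string or the sent string directly; both pass straight through the yield-from.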
I'm very much aware of that. There is a *huge* difference between yield-from and yield. However, now that I've implemented a substantial library (NDB, which has thousands of users in the App Engine world, if not hundreds of thousands), I feel that "value = yield <something that returns a Future>" is quite a good paradigm, and the only part of PEP 380 I'm really looking forward to embracing (once App Engine supports Python 3.3) is the option to return a value from a generator -- which my users currently have to spell as "raise ndb.Return(<value>)".
So, the send/yield channel is exclusively for communication with the *scheduler* and nothing else. Under the old way of doing generator-based coroutines, this channel was used to simulate a call stack by yielding 'call' and 'return' instructions that the scheduler interpreted. But all that is now taken care of by the yield-from mechanism, and there is nothing left for the send/yield channel to do.
I understand that's the state of the world that you're looking forward to. However I'm slightly worried that in practice there are some issues to be resolved. One is what to do with operations directly implemented in C. It would be horrible to require C to create a fake generator. It would be mildly nasty to have to wrap these all in Python code just so you can use them with yield-from. Fortunately an iterator whose final __next__() raises StopIteration(<value>) works in the latest Python 3.3 (it didn't work in some of the betas IIRC).
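As a sketch of the last point, a plain iterator (standing in here for something implemented in C) can be used with yield-from without being a generator, as long as its final `__next__()` raises `StopIteration` carrying the value. The class name `ImmediateResult` is hypothetical.

```python
class ImmediateResult:
    """Hypothetical non-generator iterator that completes at once."""

    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        # Finishing immediately: the value travels on the StopIteration.
        raise StopIteration(self.value)


def caller():
    data = yield from ImmediateResult(42)
    return data


try:
    next(caller())
except StopIteration as stop:
    result = stop.value
```

In this sketch `caller()` never suspends, because the sub-iterator finishes on its first `__next__()` call.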
My users sometimes want to treat something as a coroutine even though it has no yields in it. This:

    def caller():
        data = yield from reader()

    def reader():
        return 'dummy'
        yield

works, but if you drop the yield it doesn't work. With a decorator I know how to make it work either way.
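One possible shape for such a decorator is sketched below. The name `coroutine` and the approach (an unreachable `yield` in a wrapper) are assumptions made for illustration, not NDB's actual implementation.

```python
import functools
import inspect


def coroutine(func):
    """Hypothetical decorator: make func usable with 'yield from' either way."""
    if inspect.isgeneratorfunction(func):
        return func  # already a generator function: leave it alone

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # never reached; its presence makes wrapper a generator function

    return wrapper


@coroutine
def reader():
    return 'dummy'  # no yield anywhere in the body


@coroutine
def caller():
    data = yield from reader()
    return data


try:
    next(caller())
except StopIteration as stop:
    result = stop.value
```

Because the decorator is applied everywhere, callers can always write `yield from` without knowing whether the callee happens to contain a yield.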
If you're talking about a decorator that turns a function into a generator, I can't see anything particularly headachish about that. If you mean something else, you'll have to elaborate.
Well, I'm talking about a decorator that you *always* apply, and which does nothing (or very little) when wrapping a generator, but adds generator behavior when wrapping a non-generator function.

Anyway, I am trying to come up with a table comparing Futures and your yield-from-using generators. I'm basing this on a subset of the PEP 3148 API, and I'm not presuming threads -- I'm just looking at the functionality around getting and setting callbacks, results, and exceptions. My reference is actually based on NDB, but the API there differs from PEP 3148 in uninteresting ways, so I'll use the PEP 3148 method names.

(1) Calling an async operation and waiting for its result, using yield

Futures:
    result = yield some_async_op(args)

Yield-from:
    result = yield from some_async_op(args)

(2) Setting the result of an async operation

Futures:
    f.set_result(value)  # From any callback

Yield-from:
    return value  # From the outermost generator

(3) Handling an exception

Futures:
    try:
        result = yield some_async_op(args)
    except MyException:
        <handle exception>

Yield-from:
    try:
        result = yield from some_async_op(args)
    except MyException:
        <handle exception>

Note: with yield-from, the tracebacks for unhandled exceptions are possibly prettier.

(4) Raising an exception as the outcome of an async operation

Futures:
    f.set_exception(<Exception instance>)

Yield-from:
    raise <Exception instance or class>  # From any of the generators

Note: with Futures, the traceback also needs to be stored; in Python 3 it is stored on the Exception instance's __traceback__ attribute. But when letting exceptions bubble through multiple levels of nested calls, you must do something special to ensure the traceback looks right to the end user.

(5) Having one async operation invoke another async operation

Futures:
    @task
    def outer(args):
        res = yield inner(args)
        return res

Yield-from:
    def outer(args):
        res = yield from inner(args)
        return res

Note: I'm including this because in the Futures case, each level of yield requires the creation of a separate Future. In practice this requires decorating all async functions. And also as a lead-in to the next item.

(6) Spawning off multiple async subtasks

Futures:
    f1 = subtask1(args1)  # Note: no yield!!!
    f2 = subtask2(args2)
    res1, res2 = yield f1, f2

Yield-from:
    ??????????

*** Greg, can you come up with a good idiom to spell concurrency at this level? Your example only has concurrency in the philosophers example, but it appears to interact directly with the scheduler, and the philosophers don't return values. ***

(7) Checking whether an operation is already complete

Futures:
    if f.done(): ...

Yield-from:
    ?????????????

(8) Getting the result of an operation multiple times

Futures:
    f = async_op(args)
    # squirrel away a reference to f somewhere else
    r = yield f
    # ... later, elsewhere
    r = f.result()

Yield-from:
    ???????????????

(9) Canceling an operation

Futures:
    f.cancel()

Yield-from:
    ???????????????

Note: I haven't needed canceling yet, and I believe Devin said that Twisted just got rid of it. However some of the JS Deferred implementations seem to support it.

(10) Registering additional callbacks

Futures:
    f.add_done_callback(callback)

Yield-from:
    ???????

Note: this is used in NDB to trigger "hooks" that should run e.g. when a database write completes. The user's code just writes yield ent.put_async(); the trigger is automatically called by the Future's machinery. This also uses (8).

--
--Guido van Rossum (python.org/~guido)
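For readers unfamiliar with the Futures column, the following sketch shows how items (1), (2), (5), (6) and (10) might fit together. It is only an illustration under stated assumptions: the `Future` class is a hypothetical, minimal subset with PEP 3148-style method names, the `task` decorator and the `pending` list (a stand-in for an event loop) are invented here, and none of it is NDB's code. For simplicity the two subtasks are awaited with two separate yields rather than by yielding a tuple.

```python
class Future:
    """Hypothetical minimal Future using PEP 3148-style method names."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def result(self):
        return self._result

    def set_result(self, value):
        self._result = value
        self._done = True
        for callback in self._callbacks:
            callback(self)

    def add_done_callback(self, callback):
        if self._done:
            callback(self)
        else:
            self._callbacks.append(callback)


def task(func):
    """Hypothetical decorator: drive a Future-yielding generator."""
    def wrapper(*args):
        result_future = Future()
        gen = func(*args)

        def step(value=None):
            try:
                fut = gen.send(value)       # run until the next 'yield <Future>'
            except StopIteration as stop:
                result_future.set_result(stop.value)
                return
            fut.add_done_callback(lambda f: step(f.result()))

        step()
        return result_future
    return wrapper


pending = []  # hypothetical stand-in for an event loop's work queue


def async_op(value):
    f = Future()
    pending.append((f, value))  # completed later by the loop below
    return f


@task
def inner(x):
    r = yield async_op(x)
    return r * 2


@task
def outer(a, b):
    f1 = inner(a)  # no yield: both subtasks are started first
    f2 = inner(b)
    r1 = yield f1
    r2 = yield f2
    return r1 + r2


top = outer(1, 2)
while pending:
    fut, value = pending.pop(0)
    fut.set_result(value)  # "from any callback", as in item (2)
```

Each decorated call creates its own Future, which is the cost noted under item (5); in exchange, the caller gets an object it can hold on to, poll with `done()`, or attach callbacks to.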