[Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)
Steve.Dower at microsoft.com
Sat Oct 20 21:31:12 CEST 2012
> - Nit: I don't like calling the event loop context; there are too many
> things called context (e.g. context managers in Python), so I prefer
> to call it what it is -- event loop or I/O loop.
The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.
> - You mention a query interface a few times but there are no details
> in your example code; can you elaborate? (Or was that a typo for
I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any given type. The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage()-based loop, then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)
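To make the query interface concrete, here is a minimal sketch of how two loops might answer 'get_future_for'. The class names and the isinstance check are assumptions for illustration, not the proposal's actual code; the select() registration itself is elided.

```python
import socket
from concurrent.futures import Future

class SelectEventLoop:
    """Hypothetical loop that knows how to wait on select()-able objects."""

    def get_future_for(self, obj):
        # Only claim support for objects this loop can actually wait on.
        if isinstance(obj, socket.socket):
            f = Future()
            # ...a real loop would register obj with select() here and
            # complete f when the socket becomes ready...
            return f
        return None  # no waitable/pollable object for this type

class GetMessageEventLoop:
    """Hypothetical Windows GetMessage()-based loop: it simply declines."""

    def get_future_for(self, obj):
        return None
```

The point is that returning None is a legitimate answer, so loops that cannot poll a given object are not forced to emulate support for it.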
> - This is almost completely isomorphic with NDB's tasklets, except
> that you borrow the Future class implementation from
> concurrent.futures -- I think that's the wrong building block to start
> with, because it is linked too closely to threads.
As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it can represent any asynchronous operation. Some uses map directly (e.g., polling a future that wraps pollable I/O) while others require emulation (e.g., adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.
> - There is a big speed difference between yield from <generator> and
> yield <future>. With yield <future>, the scheduler has to do
> significant work for each yield at an intermediate level, whereas with
> yield from, the schedule is only involved when actual blocking needs
> to be performed. In my experience, real code has lots of intermediate
> levels. Therefore I would like to use yield from. You can already do
> most things with yield from that you can do with Futures; there are a
> few operations that need a helper (in particular spawning truly
> concurrent tasks), but the helper code can be much simpler than the
> Future object, and isn't needed as often, so it's still a bare win.
I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created. The first step (up to the first yield) of any @async method always runs immediately - if there is no yield, then the returned future is already completed and holds the result. The event loop as implemented could be optimised slightly for this case, but since a Future invokes newly added callbacks immediately once it has completed, we never 'unschedule' the task.
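The eager-first-step behaviour can be sketched as follows. This is an assumption-laden illustration, not the proposal's implementation: the decorator is spelled `async_` (since `async` is now a keyword), and the scheduling of later steps is omitted.

```python
from concurrent.futures import Future

def async_(fn):
    """Sketch of an @async-style decorator: the generator's first step
    runs immediately; if it finishes without yielding, the returned
    Future is already completed when the caller receives it."""
    def wrapper(*args, **kwargs):
        future = Future()
        gen = fn(*args, **kwargs)
        try:
            pending = next(gen)        # run up to the first yield
        except StopIteration as stop:  # no yield: finished synchronously
            future.set_result(stop.value)
            return future
        # A real event loop would schedule 'gen' to resume when the
        # yielded future 'pending' completes; omitted in this sketch.
        return future
    return wrapper

@async_
def no_yield():
    return 42   # no yield at all: the future is completed immediately
    yield       # unreachable; makes this a generator function
```

Calling `no_yield()` returns a Future that is already done, so a caller polling or waiting on it never touches the scheduler.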
'yield from' can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future-compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)
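The layering described above can be sketched like this. All names here are hypothetical, the "I/O" future is pre-completed as a stand-in, and the chain is driven by hand the way a trivial scheduler would:

```python
from concurrent.futures import Future

def make_read_future():
    # Stand-in for an I/O operation; a real event loop would complete
    # this future when data actually arrives.
    f = Future()
    f.set_result(b'data')
    return f

def read_chunk():
    # Bottom level: yields a Future for the scheduler to wait on.
    data = yield make_read_future()
    return data

def read_all():
    # Intermediate level: plain 'yield from', scheduler not involved.
    chunk = yield from read_chunk()
    return chunk * 2

# Drive the chain the way a minimal scheduler would:
gen = read_all()
fut = next(gen)              # the Future yielded from the bottom level
try:
    gen.send(fut.result())   # resume the generator with the I/O result
except StopIteration as stop:
    result = stop.value      # b'datadata'
```

Only the bottom-level yield reaches the scheduler; the intermediate 'yield from' level is pure generator delegation. Wrapping `read_all` with the @async decorator at the top would then hand callers a plain Future.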