[Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)

Steve Dower Steve.Dower at microsoft.com
Mon Oct 22 17:55:27 CEST 2012


> Personally, I'm interested in designing a system, including an event loop, 
> where you can rely on the properties of cooperative scheduling to avoid
> ever touching (OS) threading locks. I think such a system should be "pure"
> and all interaction with threads should be mediated by the event loop.
> (It's okay if this means that the implementation of the event loop must at
> some point acquire a threading lock.) The Futures used by the tasks to
> coordinate amongst themselves should not require locking -- they should
> themselves be able to rely on the guarantees of the event loop not to
> invoke multiple callbacks in parallel.

Unfortunately, a "pure" system means that no async operation can ever have an OS-provided callback (or any callback that originates outside the world of the scheduler). The purity becomes infectious and limits which operations can be continued from (waited on, blocked on, yielded, and so on). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.

That's not to say that I want big, heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code queuing to the same loop, you need to synchronise access to the data structure. Likewise, as soon as you read a Future's state and result non-atomically, you need synchronisation. I don't doubt there are ways around this (CAS goes a long way, and the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (though also suitable for much more arbitrary uses), but I really believe that trying to avoid all locks is a bad idea.
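To make the point concrete, here is a minimal sketch (class and method names are my own, not from any proposal) of the one place such a loop cannot avoid a lock: the ready queue that interrupt- or thread-scheduled callers append to. The enqueue and the dequeue race unless they are serialised.

```python
import threading
from collections import deque

class Loop:
    """Sketch of an event loop's ready queue. Any thread may enqueue a
    callback, so access to the shared deque is guarded by a lock."""

    def __init__(self):
        self._ready = deque()
        self._lock = threading.Lock()

    def call_soon_threadsafe(self, callback, *args):
        # May be called from any thread; the lock keeps the append atomic
        # with respect to the loop thread's pop below.
        with self._lock:
            self._ready.append((callback, args))

    def run_once(self):
        # Runs only on the loop's own thread.
        with self._lock:
            if not self._ready:
                return False
            callback, args = self._ready.popleft()
        callback(*args)  # run the callback outside the lock
        return True
```

The lock is held only around the queue operations themselves, so the cost is small, but it cannot be dropped entirely once more than one thread can reach `call_soon_threadsafe`.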

(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or non-deterministically interleaved) tasks, whether these are CPU threads or hardware-controlled I/O.)

> IIUC you can do this on Windows with IOCP too, simply by only having a 
> single thread reading events.

Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.


[My claim was that using "yield from" exclusively is less portable and composable than predominantly using "yield".]
> To me, the PEP 380 style is perfectly portable and composable. If you think
> it isn't, please elaborate.

I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the user's entire call stack must be generators. Suddenly, wanting to use one async function has affected every single function.
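To illustrate the infectiousness, here is a toy sketch (all names are hypothetical, and the yielded string stands in for a real Future or I/O request): the moment the innermost function suspends with YF, every frame between it and the event loop must itself be a generator delegating with YF.

```python
def open_async(path):
    # Innermost "async" operation: suspends once, then produces a result.
    yield "io-request"            # stand-in for yielding a real Future
    return "<handle for %s>" % path

def read_config():
    # Merely *calling* open_async forces this function to be a generator too.
    handle = yield from open_async("app.cfg")
    return handle

def startup():
    # ...and the infection propagates to its caller, and so on upward.
    cfg = yield from read_config()
    return cfg

def run(gen):
    # The "event loop" at the very top is the one enumerating the stack.
    try:
        while True:
            gen.send(None)        # resume until the generator returns
    except StopIteration as stop:
        return stop.value
```

Nothing in `read_config` or `startup` is asynchronous in itself; they became generators purely because something beneath them did.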

By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread). Here, the user still gets the benefits like:

def not_an_async_func(list_of_urls):
    # Each call returns a Future immediately; the downloads start right away.
    ops = list(map(get_url_async, list_of_urls))
    # All URLs are now downloading in parallel; do other synchronous work here.
    results = list(map(Future.result, ops))
    return results

Here, multiple tasks run simultaneously even though the code eventually uses a blocking wait (or a wait_all or as_completed). Doing this with YF-based tasks requires the user to create the scheduler explicitly (unlike the implicit one with @async) and prevents any other asynchronous tasks from running in the meantime.

(And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)
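A rough sketch of what such an @async decorator might look like (my own reconstruction of the idea, not Steve's actual code; spelled `async_` because `async` is now a keyword): the decorator steps the generator, and whenever it yields a Future, the next step is scheduled from that Future's completion callback - possibly on another thread - rather than by unwinding back to an event loop.

```python
from concurrent.futures import Future, ThreadPoolExecutor

def async_(func):
    """Sketch of @async: drive a generator of Futures via done-callbacks.
    Each yielded Future resumes the generator when it completes, and the
    generator's return value becomes the result of the returned Future."""
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        result = Future()

        def step(value=None):
            try:
                fut = gen.send(value)        # run until the next yield
            except StopIteration as stop:
                result.set_result(stop.value)
                return
            # Resume from wherever the Future completes - no event loop needed.
            fut.add_done_callback(lambda f: step(f.result()))

        step()
        return result
    return wrapper

# Usage sketch: the yielded Futures here come from a thread pool.
_pool = ThreadPoolExecutor(max_workers=2)

@async_
def add_async(x, y):
    a = yield _pool.submit(lambda: x)
    b = yield _pool.submit(lambda: y)
    return a + b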

However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But that is because of the high level of cooperation required between the tasklets involved. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (substitute any library you like for OpenCV). Coroutines are great within a program, but they don't extend well into libraries, and they certainly provide no compatibility with existing ones - whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async and any existing library (except perhaps a threading library; don't take that "any" too literally).
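That escape hatch can be shown with the standard library directly (using `ThreadPoolExecutor.submit`, the stdlib spelling of the hypothetical `queue` above, and a stand-in for the OpenCV call): the blocking call runs on a worker thread, and the caller gets back a Future that a scheduler - or plain code - can wait on, poll, or attach a callback to.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def do_something_blocking(x):
    # Stand-in for a long-running library call (e.g. an OpenCV operation)
    # that never cooperatively yields.
    return x * 2

# Offload it to a worker thread; the cooperative loop is never stalled.
future = pool.submit(do_something_blocking, 21)
```

This is why the @async style composes with arbitrary existing libraries: the library does not need to know anything about the scheduler.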


Cheers,
Steve



More information about the Python-ideas mailing list