[Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)

Mon Oct 22 19:34:38 CEST 2012

On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve.Dower at microsoft.com> wrote:
>> Personally, I'm interested in designing a system, including an event loop,
>> where you can rely on the properties of cooperative scheduling to avoid
>> ever touching (OS) threading locks. I think such a system should be "pure"
>> and all interaction with threads should be mediated by the event loop.
>> (It's okay if this means that the implementation of the event loop must at
>> some point acquire a threading lock.) The Futures used by the tasks to
>> coordinate amongst themselves should not require locking -- they should
>> themselves be able to rely on the guarantees of the event loop not to
>> invoke multiple callbacks in parallel.
>
> Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.

I'm curious what the Twisted folks have to say about this. Or the
folks using gevent.

I think your world view is colored by Windows; that's fine, we need
input from experienced Windows users. But I can certainly imagine
other ways of dealing with this.

For example, in CPython, at least, a callback that is called directly
by the OS cannot call straight into Python anyway -- you have to
acquire the GIL first. In practice this means that such a callback
has to put something into a queue, and the bytecode interpreter will
eventually call it (possibly in another thread). This is how signal
handlers are invoked too.
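
A minimal sketch of that queue hand-off, assuming plain threads stand in
for OS-invoked callbacks. All names here (pending, os_callback, run_loop)
are hypothetical and not taken from any real event loop:

```python
import queue
import threading

# queue.Queue is the only object here that touches a threading lock.
pending = queue.Queue()
handled = []

def os_callback(value):
    """Stands in for code invoked from outside the loop (here: another thread)."""
    # Do not run handler logic here; only enqueue the work for the loop.
    pending.put(lambda: handled.append(value))

def run_loop(expected):
    """Single-threaded loop: callbacks run one at a time, in this thread only."""
    while len(handled) < expected:
        callback = pending.get()  # blocks until something is handed over
        callback()

workers = [threading.Thread(target=os_callback, args=(i,)) for i in range(3)]
for w in workers:
    w.start()
run_loop(expected=3)
for w in workers:
    w.join()
print(sorted(handled))
```

The handlers themselves never need a lock, because only run_loop() ever
executes them.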

> That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code

If interrupt-scheduled means what I think it means, this can only be C
code. For the Python callback, see above.

> queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.

I don't actually believe we should avoid all locks. I do believe that
there should be a separate mechanism, likely OS-specific, whereby the
"pure" async world and the "messy" threading world can hand off data
to each other. It is probably unavoidable that the implementation of
this mechanism touches a threading lock. But this does not mean that
the rest of the "pure" world should need to use a Future class that
touches threading locks.
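
As a rough illustration of that split, here is a sketch of a Future with
no locking at all, plus a single thread-safe inbox as the hand-off point.
The class and the inbox are assumptions made for this example, not an
existing API:

```python
import queue
import threading

class Future:
    """Lock-free future: must only ever be touched from the loop thread."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def add_done_callback(self, fn):
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, result):
        self._result, self._done = result, True
        for fn in self._callbacks:
            fn(self)

    def result(self):
        if not self._done:
            raise RuntimeError("result is not ready")
        return self._result

# The one place where the threading world meets the "pure" world.
inbox = queue.Queue()

def worker(fut, value):
    # A thread never calls fut.set_result() itself; it only posts to the inbox.
    inbox.put((fut, value))

seen = []
fut = Future()
fut.add_done_callback(lambda f: seen.append(f.result()))

t = threading.Thread(target=worker, args=(fut, 42))
t.start()
done_fut, value = inbox.get()   # loop thread picks up the hand-off
done_fut.set_result(value)      # callbacks run here, in the loop thread
t.join()
```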

> (Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)

This sounds like a potentially fatal clash in terminology. In the way
I use 'async', Twisted, Tornado and gevent certainly qualify, and all
those have huge parts of their API where there is no non-deterministic
switching in sight -- in fact, they all carefully fence off the part
that does interact with threads. For example, the Twisted folks have
argued that one of the big advantages of using Twisted's Deferred
class is that while a callback is running, the state of the world
remains constant (except for actions made by the callback itself,
obviously).

What other term should we use to encompass this world view (which IMO
is a perfectly valid abstraction for a lot of I/O-related
concurrency)?

>> IIUC you can do this on Windows with IOCP too, simply by only having a
>> single thread reading events.
>
> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.

It does sound like this may be unique to Windows, or at least not
shared with most of the UNIX world (UNIX ports of IOCP
notwithstanding).

> [ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
>> To me, the PEP 380 style is perfectly portable and composable. If you think
>> it isn't, please elaborate.
>
> I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.

And that is by design -- Greg *wants* it to be that way, and so far I
haven't found a reason to disagree with him. It seems you just
fundamentally disagree with the design, but your arguments come from a
fundamentally different world view.

> By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread).

Note that in the YF world, there are also ways to stop the yield from
bubbling all the way to the top. You simply call the generator function,
which gives you a generator object, and the scheduler module or class
can offer a variety of APIs to do things with it -- e.g. run it
without waiting for it (yet), run several of these in parallel until
one of them (or all of them) completes, etc.
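
A toy sketch of the idea, assuming a round-robin scheduler whose spawn()
and run() methods are invented for this example. Calling fetch() only
creates a generator object, and spawn() starts it without waiting for it:

```python
from collections import deque

class Scheduler:
    """Toy round-robin scheduler for generator-based tasks (hypothetical API)."""

    def __init__(self):
        self.ready = deque()
        self.results = {}

    def spawn(self, gen):
        """Queue a generator object to run, without waiting for it."""
        self.ready.append(gen)
        return gen

    def run(self):
        """Step every task in turn until all of them have finished."""
        while self.ready:
            gen = self.ready.popleft()
            try:
                next(gen)  # run the task up to its next bare yield
            except StopIteration as stop:
                self.results[gen] = stop.value
            else:
                self.ready.append(gen)

def sleep_ticks(n):
    """Give up control n times; stands in for waiting on I/O."""
    for _ in range(n):
        yield

def fetch(name, ticks):
    yield from sleep_ticks(ticks)  # delegate to a subgenerator
    return name.upper()

sched = Scheduler()
a = sched.spawn(fetch("a", 3))
b = sched.spawn(fetch("b", 1))
sched.run()
print(sched.results[a], sched.results[b])
```

Both tasks make progress in interleaved steps, and the task with fewer
ticks finishes first.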

> Here, the user still gets the benefits like:
>
> def not_an_async_func():
>     ops = list(map(get_url_async, list_of_urls))
>     # all URLs are now downloading in parallel, let's do some other synchronous stuff
>     results = list(map(Future.result, ops))

And in the YF world you can do that too.

> Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.

I don't see that. The user just has to be able to get a reference to
the scheduler, which should be part of the scheduler's API (e.g. a
function in its module that returns the current scheduler instance).
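
A sketch of how that could look, assuming a module-level get_scheduler()
function and a run_until_all() method. Both names are made up for this
example. The non-generator function starts several tasks and then blocks
until they finish, while an unrelated task keeps running:

```python
from collections import deque

class Scheduler:
    """Toy scheduler; the method names are assumptions for illustration."""

    def __init__(self):
        self.ready = deque()
        self.done = {}

    def spawn(self, gen):
        self.ready.append(gen)
        return gen

    def run_until_all(self, tasks):
        """Step all ready tasks until every task in `tasks` has finished."""
        while not all(t in self.done for t in tasks):
            gen = self.ready.popleft()
            try:
                next(gen)
            except StopIteration as stop:
                self.done[gen] = stop.value
            else:
                self.ready.append(gen)
        return [self.done[t] for t in tasks]

_current = Scheduler()

def get_scheduler():
    """Return the current scheduler instance."""
    return _current

def get_url(url):
    yield  # pretend to wait for the network
    return "body of " + url

def not_a_generator(urls):
    sched = get_scheduler()
    tasks = [sched.spawn(get_url(u)) for u in urls]
    # All downloads are now scheduled; other synchronous work could go here.
    return sched.run_until_all(tasks)

ticks = []

def ticker():
    """An unrelated task that was already scheduled."""
    for i in range(2):
        ticks.append(i)
        yield

get_scheduler().spawn(ticker())
pages = not_a_generator(["http://a.example", "http://b.example"])
```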

> (And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)

With YF it doesn't have to be the base of the stack. It just usually is.

I feel we are going around in circles.

> However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).

I don't know what OpenCV is, but assuming it is something that doesn't
know about YF, then it needs to run in a thread of its own (or a
threadpool). It is perfectly possible to add a primitive operation to
the YF scheduler that says "run this in a threadpool and wake me up
when it produces a result". The public API for that primitive can
certainly use YF itself -- the messy interface with threads can be
completely hidden from view. IMO any YF scheduler worth using for real
work must provide such a primitive (it was one of the first things I
had to do in my own prototype, to be able to call
socket.getaddrinfo()).
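
A minimal sketch of such a primitive, built on
concurrent.futures.ThreadPoolExecutor. The names run_in_thread() and
run() are hypothetical, the driver handles a single task only, and
slow_square() stands in for a blocking call such as
socket.getaddrinfo():

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)

def run_in_thread(func, *args):
    """Generator API: `result = yield from run_in_thread(f, x)`."""
    future = _pool.submit(func, *args)
    result = yield future  # hand the future to the driver, get the result back
    return result

def run(task):
    """Drive one generator task to completion and return its value."""
    wakeups = queue.Queue()  # the only place where threads meet the loop
    value = None
    while True:
        try:
            future = task.send(value)
        except StopIteration as stop:
            return stop.value
        # The callback may run in a worker thread; it only enqueues the future.
        future.add_done_callback(wakeups.put)
        value = wakeups.get().result()  # sleep until the pool wakes us up

def slow_square(x):
    """Stands in for a blocking library call that knows nothing about YF."""
    return x * x

def main():
    a = yield from run_in_thread(slow_square, 7)
    b = yield from run_in_thread(slow_square, a)
    return b

result = run(main())
print(result)
```

The task code in main() never sees a thread or a lock; only the driver
and the pool do.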

-- 
--Guido van Rossum (python.org/~guido)