[Python-ideas] async: feedback on EventLoop API

Guido van Rossum guido at python.org
Fri Dec 21 16:45:46 CET 2012


On Fri, Dec 21, 2012 at 3:38 AM, Jasper St. Pierre <jstpierre at mecheye.net>
wrote:
> I read over the wait_one() proposal again, and I still don't understand it,
> so it would need more explanation to me.
>
> But I don't see the point of avoiding callbacks. In this case, we have two
> or more in-flight requests that can be finished at any time. This does not
> have a synchronous code equivalent -- callbacks are pretty much the only
> mechanism we can use to be notified when something is done.

Perhaps you haven't quite gotten used to coroutines? There are callbacks
underneath making it all work, but the user code rarely sees those. Let's
start with the following *synchronous* code as an example.

def indexer(urls):
    # urls is a set of strings
    done = {}  # dict mapping url to (data, links)
    while urls:
        url = urls.pop()
        data = urlfetch(url)
        links = parse(data)
        done[url] = (data, links)
        for link in links:
            if link not in urls and link not in done:
                urls.add(link)
    return done

(Let's hope this is indexing a small static site and not the entire
internet. :-)

Now suppose we make urlfetch() a coroutine and we want to run all the
urlfetches in parallel. The toplevel indexer() function becomes a coroutine
too. We use the convention that coroutines' names end in _async, to remind
us that they return Futures. The phrase "x = yield from foo_async()" is
equivalent to the synchronous call "x = foo()".
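
(urlfetch_async() itself isn't shown here. Just for illustration, a
hypothetical sketch following the same convention might delegate to some
lower-level non-blocking HTTP coroutine; the http_get_async() helper below
is made up and not part of PEP 3156.)

@coroutine
def urlfetch_async(url):
    # Delegate to a lower-level non-blocking HTTP helper; http_get_async()
    # is assumed here purely for illustration.
    data = yield from http_get_async(url)
    return data

With that convention in place, the indexer looks like this: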

@coroutine
def indexer_async(urls):
    done = {}
    # A dict mapping tasks to urls:
    running = {Task(urlfetch_async(url)): url for url in urls}
    while running:
        # The yield from will return a Future
        tsk = yield from wait_one_async(running)  # <-- the marked line
        url = running.pop(tsk)
        data = tsk.result()  # May raise
        links = parse(data)
        done[url] = (data, links)
        for link in links:
            if link not in urls and link not in done:
                urls.add(link)
                tsk = Task(urlfetch_async(link))
                running[tsk] = link
    return done

This creates len(urls) initial tasks to fetch the urls, and creates new
tasks as new links are parsed. The assumption here is that the only blocking
I/O is done in the urlfetch_async() tasks. The indexer blocks at the *yield
from* in the marked line, at which point any or all of the urlfetch tasks
get to run for a bit, and once one of them completes, wait_one_async()
returns that task. (A task is a Future that wraps a coroutine, by the way.
wait_one_async() works with Futures too.) We then inspect the completed
task with .result(), which gives us the data, which we parse as usual. The
data structures are a little more elaborate because we have to keep track
of the mapping from task to url. We add new tasks to the running dict as
soon as we have parsed the links that point to them, so they can all get
started.
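
(If "a task is a Future that wraps a coroutine" still sounds abstract,
here is a deliberately minimal sketch of the idea, not the PEP 3156
implementation: the task drives the wrapped generator and, when it
finishes, records its return value as the Future's result.)

import concurrent.futures

class SketchTask(concurrent.futures.Future):
    def __init__(self, coro):
        super().__init__()
        self._coro = coro  # the wrapped coroutine (a generator)

    def step(self, value=None):
        # Resume the coroutine.  A real Task does this from the event
        # loop whenever the Future the coroutine is waiting on completes.
        try:
            self._coro.send(value)
        except StopIteration as exc:
            # The coroutine returned; its value becomes this Future's result.
            self.set_result(exc.value)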

Note that in PEP 3156, I don't use the _async convention, but everything in
this example will work there once wait_one() is added.

Also note that the trick is that wait_one_async() must return a Future
whose result is another Future. The first Future is used (and thrown away)
by *yield from*; that Future's result is one of the original Futures
representing a completed task.
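
One way to get that behavior, purely as a sketch: assume the framework's
Future can be constructed directly and supports add_done_callback() the
way concurrent.futures.Future does; then wait_one_async() just creates a
fresh Future and fulfills it with whichever input Future finishes first.

def wait_one_async(fs):
    # Return a Future whose result will be the first of the given
    # Futures/Tasks to complete.  Sketch only, not the PEP 3156 API.
    waiter = Future()
    def _on_done(f):
        if not waiter.done():
            waiter.set_result(f)  # the outer Future's result is the inner one
    for f in fs:
        f.add_done_callback(_on_done)
    return waiter

Passing the running dict works because iterating over a dict yields its
keys, i.e. the tasks.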

I hope this is clearer. I'm not saying this is the best or only way of
writing an async indexer using yield from (and I left out error handling)
but hopefully it is an illustrative example.

-- 
--Guido van Rossum (python.org/~guido)