[stdlib-sig] futures - a new package for asynchronous execution

Brian Quinlan brian at sweetapp.com
Fri Nov 13 03:38:33 CET 2009


Hey all,

I compiled a summary of people's feedback (about technical issues - I  
agree that the docs could be better but agreeing on the API seems like  
the first step) and have some API change proposals.

Here is a summary of the feedback:
- Use Twisted Deferreds rather than Futures
- The API is too complex
- Make Future a callable and drop the .result()/.exception() methods
- Remove .wait() from Executor
- Make it easy to process results in the order of completion rather  
than in the order that the futures were generated
- Executor context managers should wait until their workers complete  
before exiting
- Extract Executor.map, etc. into separate functions/modules
- FutureList has too many methods or is not necessary
- Executor should have an easy way to produce a single future
- Should be able to wait on an arbitrary list of futures
- Should have a way of avoiding deadlock (will follow-up on this  
separately)

Here is what I suggest as far as API changes (the docs suck, I'll  
polish them when we reach consensus):

FutureList is eliminated completely.

Future remains unchanged - I disagree that Deferreds would be better,  
that .exception() is not useful, and that .result() should be  
renamed .get() or .__call__(). But I am easily persuadable :-)

The Executor ABC is simplified to only contain a single method:

def Executor.submit(self, fn, *args, **kwargs)

Submits a call for execution and returns a Future representing the
pending result of fn(*args, **kwargs).
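
For illustration, here is a rough sketch of what that simplified ABC
could look like (the abc-module spelling is my assumption; only the
submit() signature comes from the proposal):

from abc import ABCMeta, abstractmethod

class Executor(metaclass=ABCMeta):
    """Abstract base class; concrete executors only implement submit()."""

    @abstractmethod
    def submit(self, fn, *args, **kwargs):
        """Schedule fn(*args, **kwargs) for execution and return a
        Future representing its pending result."""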

map becomes a utility function:

def map(executor, func, *iterables, timeout=None)

Equivalent to map(func, *iterables) but executed asynchronously and  
possibly out-of-order. The returned iterator raises a TimeoutError if  
__next__() is called and the result isn’t available after timeout  
seconds from the original call to map(). If timeout is not
specified or None then there is no limit to the wait time. If a call  
raises an exception then that exception will be raised when its value  
is retrieved from the iterator.
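
As a sketch of the intended layering (not the real implementation),
such a map() can be written on top of Executor.submit(); this assumes
that Future.result() accepts a timeout argument as in the current
package, and it does not deduct already-elapsed time from the timeout:

def map(executor, func, *iterables, timeout=None):
    # Submit every call up front so they can run concurrently, then
    # yield each result in submission order as it is requested.
    fs = [executor.submit(func, *args) for args in zip(*iterables)]
    for future in fs:
        yield future.result(timeout=timeout)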

wait becomes a utility function that can wait on any iterable of  
Futures:

def wait(futures, timeout=None, return_when=ALL_COMPLETED)

Wait until the given condition is met for the given futures. The
optional arguments should always be passed as keywords:

timeout can be used to control the maximum number of seconds to wait  
before returning. If timeout is not specified or None then there is no  
limit to the wait time.

return_when indicates when the function should return. It must be one
of the following constants:

     NEXT_COMPLETED
     NEXT_EXCEPTION
     ALL_COMPLETED
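
A hypothetical usage example (it assumes the constants are exposed at
module level and uses the ThreadPoolExecutor shown in the example
further down):

import futures

def square(x):
    return x * x

with futures.ThreadPoolExecutor(4) as executor:
    fs = [executor.submit(square, i) for i in range(10)]

    # Block until at least one of the futures has completed.
    futures.wait(fs, timeout=60, return_when=futures.NEXT_COMPLETED)

    # Block until all of them have completed (no time limit).
    futures.wait(fs, return_when=futures.ALL_COMPLETED)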

A new utility function is added that iterates over the given Futures
and returns them as they are completed:

def itercompleted(futures, timeout=None):

Returns an iterator that returns a completed Future from the given  
list when __next__() is called. If no Futures are completed when
__next__() is called then __next__() waits until one does complete.
Raises a TimeoutError if __next__() is called and no completed future  
is available after timeout seconds from the original call.
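
One possible sketch of itercompleted() in terms of the wait() primitive
above; the Future.done() predicate and the TimeoutError class from the
futures package are assumptions on my part:

import time

def itercompleted(fs, timeout=None):
    deadline = None if timeout is None else time.time() + timeout
    pending = set(fs)
    while pending:
        remaining = None if deadline is None else deadline - time.time()
        if remaining is not None and remaining <= 0:
            raise TimeoutError()
        # Block until at least one pending future finishes (or we time out).
        wait(pending, timeout=remaining, return_when=NEXT_COMPLETED)
        done = set(f for f in pending if f.done())
        pending -= done
        for future in done:
            yield future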

The URL loading example becomes:

import urllib.request
import futures

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

with futures.ThreadPoolExecutor(50) as executor:
    # Keep a mapping from each Future back to its URL so the reporting
    # loop below can say which page each result belongs to.
    future_to_url = dict((executor.submit(load_url, url, timeout=30), url)
                         for url in URLS)

for future in futures.itercompleted(future_to_url):
    url = future_to_url[future]
    if future.exception() is not None:
        print('%r generated an exception: %s' % (url, future.exception()))
    else:
        print('%r page is %d bytes' % (url, len(future.result())))

What do you think? Are we moving in the right direction?

Cheers,
Brian


