[stdlib-sig] futures - a new package for asynchronous execution
Jeffrey Yasskin
jyasskin at gmail.com
Sat Nov 7 20:37:09 CET 2009
On Sat, Nov 7, 2009 at 7:32 AM, Jesse Noller <jnoller at gmail.com> wrote:
> On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>>> Which API? My comment wasn't aimed at the API of the package - in the
>>> time I got to scan it last night nothing jumped out at me as overly
>>> offensive API-wise.
>>
>> Not offensive, but probably too complicated if it's meant to be a simple
>> helper. Anyway, let's wait for the PEP.
>
>
> The PEP is right here:
>
> http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt
>
> I'm interested in hearing specific complaints about the API in the
> context of what it's trying to *do*. The only thing which jumped out
> at me was the number of methods on FutureList; but then again, each
> one of those makes conceptual sense, even if they are verbose -
> they're explicit on what's being done.
Overall, I like the idea of having futures in the standard library,
and I like the idea of pulling common bits of multiprocessing and
threading into a concurrent.* package. Here's my
stream-of-consciousness review of the PEP. I'll try to ** things that
really affect the API.
The "Interface" section should start with a conceptual description of
what Executor, Future, and FutureList are. Something like "An Executor
is an object you can hand tasks to, which will run them for you,
usually in another thread or process. A Future represents a task that
may or may not have completed yet, and which can be waited for and its
value or exception queries. A FutureList is ... <haven't read that
far>."
** The Executor interface is pretty redundant, and it's missing the
most basic call. Fundamentally, all you need is an
Executor.execute(callable) method returning None, and all the
future-oriented methods can be built on top of that. I'd support using
Executor.submit(callable) for the simplest method instead, which
returns a Future, but there should be some way for implementers to
only implement execute() and get submit either for free or with a
1-line definition. (I'm using method names from
http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
in case I'm unclear about the semantics here.) run_to_futures,
run_to_results, and map should be implementable on top of the Future
interface, and shouldn't need to be methods on Executor. I'd recommend
they be demoted to helper functions in the concurrent.futures module
unless there's a reason they need to be methods, and that reason
should be documented in the PEP.
** run_to_futures() shouldn't take a return_when argument. It should
be possible to wait for those conditions on any list of Futures.
(_not_ just a FutureList)
The code sample looks like Executor is a context manager. What does
its __exit__ do? shutdown()? shutdown&awaitTermination? I prefer
waiting in Executor.__exit__, since that makes it easier for users to
avoid having tasks run after they've cleaned up data those tasks
depend on. But that could be my C++ bias, where we have to be sure to
free memory in the right places. Frank, does Java run into any
problems with people cleaning things up that an Executor's tasks
depend on without awaiting for the Executor first?
shutdown should explain why it's important. Specifically, since the
Executor controls threads, and those threads hold a reference to the
Executor, nothing will get garbage collected without the explicit
call.
** What happens when FutureList.wait(FIRST_COMPLETED) is called twice?
Does it return immediately the second time? Does it wait for the
second task to finish? I'm inclined to think that FutureList should go
away and be replaced by functions that just take lists of Futures.
In general, I think the has_done_futures(), exception_futures(), etc.
are fine even though their results may be out of date by the time you
inspect them. That's because any individual Future goes monotonically
from not-started->running->(exception|value), so users can take
advantage of even an out-of-date done_futures() result. However, it's
dangerous to have several query functions, since users may think that
running_futures() `union` done_futures() `union` cancelled_futures()
covers the whole FutureList, but instead a Future can move between two
of the sets between two of those calls. Instead, perhaps an atomic
partition() function would be better, which returns a collection of
sub-lists that cover the whole original set.
I would rename result() to get() (or maybe Antoine's suggestion of
__call__) to match Java. I'm not sure exception() needs to exist.
--- More general points ---
** Java's Futures made a mistake in not supporting work stealing, and
this has caused deadlocks at Google. Specifically, in a bounded-size
thread or process pool, when a task in the pool can wait for work
running in the same pool, you can fill up the pool with tasks that are
waiting for tasks that haven't started running yet. To avoid this,
Future.get() should be able to steal the task it's waiting on out of
the pool's queue and run it immediately.
** I think both the Future-oriented blocking model and the
callback-based model Deferreds support are important for different
situations. Futures tend to be easier to program with, while Deferreds
use fewer threads and can have more predictable latencies. It should
be possible to create a Future from a Deferred or a Deferred from a
Future without taking up a thread.
More information about the stdlib-sig
mailing list