[stdlib-sig] futures - a new package for asynchronous execution

Brian Quinlan brian at sweetapp.com
Thu Feb 25 10:33:09 CET 2010


The PEP officially lives at:
http://python.org/dev/peps/pep-3148

but this version is the most up-to-date:
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt


On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:

> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian at sweetapp.com>  
> wrote:
>
> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>
>> Where's the current version of the PEP?
>
> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
>
>> On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian at sweetapp.com>  
>> wrote:
>>>
>>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>>>
>
>>>> * I'd like users to be able to write Executors besides the simple
>>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To  
>>>> enable
>>>> that, could you document what the subclassing interface for  
>>>> Executor
>>>> looks like? that is, what code do user-written Executors need to
>>>> include?
>>>
>>> I can do that.
>>>
>>>> I don't think it should include direct access to
>>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>>
>>> Would it be reasonable to make Future an ABC, make a _Future that  
>>> subclasses it for internal usage, and let other Executor subclasses  
>>> define their own Futures?
>>
>> What interface are you proposing for the Future ABC? It'll need to
>> support wait() and as_completed() from non-library Futures. I  
>> wouldn't
>> mind making the type just a duck-type (it probably wouldn't even need
>> an ABC), although I'd like to give people trying to implement their
>> own Executors as much help as possible. I'd assumed that giving  
>> Future
>> some public hooks would be easier than fixing the wait() interface,
>> but I could be wrong.
>
> See below.
>
>>>> * Could you specify in what circumstances a pure computational
>>>> Future-based program may deadlock? (Ideally, that would be  
>>>> "never".)
>>>> Your current implementation includes two such deadlocks, for which
>>>> I've attached a test.
>>>
>>>> * Do you want to make calling Executor.shutdown(wait=True) from  
>>>> within
>>>> the same Executor 1) detect the problem and raise an exception, 2)
>>>> deadlock, 3) unspecified behavior, or 4) wait for all other threads
>>>> and then let the current one continue?
>>>
>>> What about a note saying that using any futures functions or  
>>> methods from
>>> inside a scheduled call is likely to lead to deadlock unless care  
>>> is taken?
>>
>> Jesse pointed out that one of the first things people try to do when
>> using concurrency libraries is to try to use them inside themselves.
>> I've also tried to use a futures library that forbade nested use
>> ('cause I wrote it), and it was a real pain.
>
> You can use the API from within Executor-invoked functions - you  
> just have to be careful.
>
> It's the job of the PEP (and later the docs) to explain exactly what  
> care is needed. Or were you asking if I was ok with adding that  
> explanation to the PEP? I think that explanation is the minimum  
> requirement (that's what I meant by "Could you specify in what  
> circumstances a pure computational
> Future-based program may deadlock?"), but it would be better if it  
> could never deadlock, which is achievable by stealing work.

I don't think so, see below.

>> It should be easy enough to detect that the caller of
>> Executor.shutdown is one of the Executor's threads or processes,  
>> but I
>> wouldn't mind making the obviously incorrect "wait for my own
>> completion" deadlock or throw an exception, and it would make sense  
>> to
>> give Executor implementors their choice of which to do.
>>
>>>> * This is a nit, but I think that the parameter names for
>>>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>>>> people can parametrize their code on those constructors. Right now
>>>> they're "max_threads" and "max_processes", respectively. I might
>>>> suggest "max_workers".
>>>
>>> I'm not sure that I like that. In general consolidating the  
>>> constructors for
>>> executors is not going to be possible.
>>
>> In general, yes, but in this case they're the same, and we should try
>> to avoid gratuitous differences.
>
> num_threads and num_processes are more explicit than num_workers, but  
> I don't really care, so I changed it.
>
> Thanks.
>
>>>> * I'd like users to be able to write Executors besides the simple
>>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To  
>>>> enable
>>>> that, could you document what the subclassing interface for  
>>>> Executor
>>>> looks like? that is, what code do user-written Executors need to
>>>> include? I don't think it should include direct access to
>>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>>
>>> One of the difficulties here is:
>>> 1. i don't want to commit to the internal implementation of Futures
>>
>> Yep, that's why to avoid requiring them to have direct access to the
>> internal variables.
>>
>>> 2. it might be hard to make it clear which methods are public to  
>>> users and
>>> which methods are public to executor implementors
>>
>> One way to do it would be to create another type for implementors and
>> pass it to the Future constructor.
>
> If we change the future interface like so:
>
> class Future(object):
>   # Existing public methods
>   ...
>   # For executors only
>   def set_result(self, result):
>     ...
>   def set_exception(self, exception):
>     ...
>   def check_cancel_and_notify(self):
>     # Returns True if the Future was cancelled and
>     # notifies anyone who cares, i.e. waiters for
>     # wait() and as_completed()
>
>
> Then an executor implementor need only implement:
>
> def submit(self, fn, *args, **kwargs):
>
> With the logic to actually execute fn(*args, **kwargs) and update the  
> returned future, of course.
>
> Thoughts?
>
> Could you write up the submit() implementations you're thinking of?  
> That kind of interface extension seems right.

I mean that submit will implement all of the application-specific  
logic and call the above methods as it processes the future. I added a  
note (but not much in the way of details) about that.
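To make that concrete, here is a minimal sketch of such an executor. The class name and its run-synchronously strategy are invented for illustration, and the hook names follow what eventually shipped in concurrent.futures (set_result / set_exception, with set_running_or_notify_cancel playing the role of check_cancel_and_notify above):

```python
# Hypothetical sketch: an executor implementor only needs submit(),
# which creates a Future, runs the call, and fills the Future in via
# the executor-facing hooks.
from concurrent.futures import Executor, Future

class ImmediateExecutor(Executor):
    """Toy executor that runs each call synchronously in submit()."""

    def submit(self, fn, *args, **kwargs):
        future = Future()
        # Refuse to run if the future was cancelled first, and notify
        # any waiters (the check_cancel_and_notify role).
        if not future.set_running_or_notify_cancel():
            return future
        try:
            future.set_result(fn(*args, **kwargs))
        except BaseException as exc:
            future.set_exception(exc)
        return future

f = ImmediateExecutor().submit(pow, 2, 10)
print(f.result())  # 1024
```

A real executor would do the same bookkeeping, just from its worker threads or processes rather than inline in submit().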

>
>>>> * Could you specify in what circumstances a pure computational
>>>> Future-based program may deadlock? (Ideally, that would be  
>>>> "never".)
>>>> Your current implementation includes two such deadlocks, for which
>>>> I've attached a test.
>>>
>>> Thanks for the tests but I wasn't planning on changing this  
>>> behavior. I
>>> don't really like the idea of using the calling thread to perform  
>>> the wait
>>> because:
>>> 1. not all executors will be able to implement that behavior
>>
>> Why not?
>
> What if my executor sends the data to a remote cluster for execution  
> and running it locally isn't feasible?
>
> If the executor can't send itself across the network, you're fine  
> since it'll be impossible to create cycles. If the executor can add  
> threads dynamically when it notices that it's not using enough of  
> the CPU, it's also fine since you remove the limited resource. If  
> the executor can be copied and cannot add threads, then it sent the  
> data one way somehow, so it should be able to send the data the  
> other way to execute locally. It _is_ possible to run out of memory  
> or stack space. Is that what you're worried about?
>
>> Thread pools can implement it,
>
> Do you have a strategy in mind that would let you detect arbitrary  
> deadlocks in threaded futures?
>
> Yes, AFAIK work stealing suffices for systems made up only of  
> futures and executors.  Non-future blocking objects can reintroduce  
> deadlocks, but I believe futures alone can't.

How would work stealing help with this sort of deadlock?

from concurrent.futures import ThreadPoolExecutor
import time

def wait_on_b():
    time.sleep(5)
    print(b.result())
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())
    return 6


f = ThreadPoolExecutor(max_workers=2)
a = f.submit(wait_on_b)
b = f.submit(wait_on_a)
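For reference, here is a self-contained variant of the cycle above with timeouts added so it terminates instead of hanging; the short sleeps just ensure both futures exist before the tasks dereference them. Since each task waits on the other's future, no worker count (and no work stealing) can make progress:

```python
# Each task blocks on the other's result: a true dependency cycle.
# Timeouts stand in for the indefinite block so the script exits.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=2)

def wait_on(other):
    time.sleep(0.1)                  # let both submissions complete first
    return other().result(timeout=1)  # would block forever without a timeout

a = pool.submit(lambda: wait_on(lambda: b))
b = pool.submit(lambda: wait_on(lambda: a))

try:
    a.result()
    deadlocked = False
except TimeoutError:
    deadlocked = True

print("deadlocked" if deadlocked else "completed")
```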

In any case, I've updated the docs and PEP to indicate that deadlocks  
are possible.

Cheers,
Brian


>
> Cheers,
> Brian
>
>> and process pools make it
>> impossible to create cycles, so they also can't deadlock.
>>
>>> 2. it can only be made to work if no wait time is specified
>>
>> With a wait time, you have to avoid stealing work, but it's also
>> guaranteed not to deadlock, so it's fine.
>
>
