[stdlib-sig] futures - a new package for asynchronous execution
Brian Quinlan
brian at sweetapp.com
Thu Feb 25 10:33:09 CET 2010
The PEP officially lives at:
http://python.org/dev/peps/pep-3148
but this version is the most up-to-date:
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt
On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:
> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian at sweetapp.com>
> wrote:
>
> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>
>> Where's the current version of the PEP?
>
> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
>
>> On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian at sweetapp.com>
>> wrote:
>>>
>>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>>>
>
>>>> * I'd like users to be able to write Executors besides the simple
>>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To
>>>> enable
>>>> that, could you document what the subclassing interface for
>>>> Executor
>>>> looks like? that is, what code do user-written Executors need to
>>>> include?
>>>
>>> I can do that.
>>>
>>>> I don't think it should include direct access to
>>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>>
>>> Would it be reasonable to make Future an ABC, make a _Future that
>>> subclasses it for internal use, and let other Executor subclasses
>>> define their own Futures?
>>
>> What interface are you proposing for the Future ABC? It'll need to
>> support wait() and as_completed() from non-library Futures. I
>> wouldn't
>> mind making the type just a duck-type (it probably wouldn't even need
>> an ABC), although I'd like to give people trying to implement their
>> own Executors as much help as possible. I'd assumed that giving
>> Future
>> some public hooks would be easier than fixing the wait() interface,
>> but I could be wrong.
>
> See below.
>
>>>> * Could you specify in what circumstances a pure computational
>>>> Future-based program may deadlock? (Ideally, that would be
>>>> "never".)
>>>> Your current implementation includes two such deadlocks, for which
>>>> I've attached a test.
>>>
>>>> * Do you want to make calling Executor.shutdown(wait=True) from
>>>> within
>>>> the same Executor 1) detect the problem and raise an exception, 2)
>>>> deadlock, 3) unspecified behavior, or 4) wait for all other threads
>>>> and then let the current one continue?
>>>
>>> What about a note saying that using any futures functions or
>>> methods from
>>> inside a scheduled call is likely to lead to deadlock unless care
>>> is taken?
>>
>> Jesse pointed out that one of the first things people try to do when
>> using concurrency libraries is to try to use them inside themselves.
>> I've also tried to use a futures library that forbade nested use
>> ('cause I wrote it), and it was a real pain.
>
> You can use the API from within Executor-invoked functions - you
> just have to be careful.
>
> It's the job of the PEP (and later the docs) to explain exactly what
> care is needed. Or were you asking if I was ok with adding that
> explanation to the PEP? I think that explanation is the minimum
> requirement (that's what I meant by "Could you specify in what
> circumstances a pure computational
> Future-based program may deadlock?"), but it would be better if it
> could never deadlock, which is achievable by stealing work.
I don't think so, see below.
>> It should be easy enough to detect that the caller of
>> Executor.shutdown is one of the Executor's threads or processes,
>> but I
>> wouldn't mind making the obviously incorrect "wait for my own
>> completion" deadlock or throw an exception, and it would make sense
>> to
>> give Executor implementors their choice of which to do.
>>
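A check like that is only a few lines. Here is a sketch against a toy one-thread executor (all names here are illustrative, not the PEP API): shutdown(wait=True) compares the calling thread against the executor's own worker and raises instead of waiting for itself.

```python
import queue
import threading

class MiniExecutor:
    """Toy one-thread executor; names are illustrative, not the PEP API."""
    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._work.get()
            if item is None:          # shutdown sentinel
                return
            fn, result = item
            try:
                result['value'] = fn()
            except BaseException as e:
                result['error'] = e
            result['done'].set()

    def submit(self, fn):
        result = {'done': threading.Event()}
        self._work.put((fn, result))
        return result

    def shutdown(self, wait=True):
        # The check under discussion: a worker thread asking to wait for
        # its own executor's completion can never finish, so raise instead.
        if wait and threading.current_thread() is self._thread:
            raise RuntimeError("cannot shutdown(wait=True) from a worker")
        self._work.put(None)
        if wait:
            self._thread.join()

ex = MiniExecutor()
r = ex.submit(lambda: ex.shutdown(wait=True))
r['done'].wait()
print(type(r['error']).__name__)  # RuntimeError, raised inside the worker
ex.shutdown()                     # fine from the main thread
```

The same test works for a pool: keep a set of worker threads and check membership.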
>>>> * This is a nit, but I think that the parameter names for
>>>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>>>> people can parametrize their code on those constructors. Right now
>>>> they're "max_threads" and "max_processes", respectively. I might
>>>> suggest "max_workers".
>>>
>>> I'm not sure that I like that. In general consolidating the
>>> constructors for
>>> executors is not going to be possible.
>>
>> In general, yes, but in this case they're the same, and we should try
>> to avoid gratuitous differences.
>
> max_threads and max_processes are more explicit than max_workers,
> but I don't really care, so I changed it.
>
> Thanks.
>
>>>> * I'd like users to be able to write Executors besides the simple
>>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To
>>>> enable
>>>> that, could you document what the subclassing interface for
>>>> Executor
>>>> looks like? that is, what code do user-written Executors need to
>>>> include? I don't think it should include direct access to
>>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>>
>>> One of the difficulties here is:
>>> 1. i don't want to commit to the internal implementation of Futures
>>
>> Yep, that's why to avoid requiring them to have direct access to the
>> internal variables.
>>
>>> 2. it might be hard to make it clear which methods are public to
>>> users and
>>> which methods are public to executor implementors
>>
>> One way to do it would be to create another type for implementors and
>> pass it to the Future constructor.
>
> If we change the future interface like so:
>
> class Future(object):
>     # Existing public methods
>     ...
>
>     # For executors only
>     def set_result(self, result):
>         ...
>
>     def set_exception(self, exception):
>         ...
>
>     def check_cancel_and_notify(self):
>         # Returns True if the Future was cancelled and
>         # notifies anyone who cares, i.e. waiters for
>         # wait() and as_completed()
>         ...
>
>
> Then an executor implementor need only implement:
>
> def submit(self, fn, *args, **kwargs):
>
> With the logic to actually execute fn(*args, **kwargs) and update the
> returned future, of course.
>
> Thoughts?
>
> Could you write up the submit() implementations you're thinking of?
> That kind of interface extension seems right.
I mean that submit will implement all of the application-specific
logic and call the above methods as it processes the future. I added a
note (but not much in the way of details) about that.
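To make that concrete, here is one sketch of what an executor author might write under the proposed split: submit() builds a Future, arranges for fn(*args, **kwargs) to run, and drives the future through the executor-only hooks. set_result/set_exception/check_cancel_and_notify are the names proposed above; everything else (the thread-per-call "executor", the Future internals) is illustrative only.

```python
import threading

class Future:
    # Minimal stand-in for the proposed ABC: just enough state to
    # support the executor-facing hooks plus result() for callers.
    def __init__(self):
        self._done = threading.Event()
        self._result = None
        self._exception = None
        self._cancelled = False   # no cancel() here; kept for the hook

    # --- executor-only hooks from the proposal ---
    def set_result(self, result):
        self._result = result
        self._done.set()

    def set_exception(self, exception):
        self._exception = exception
        self._done.set()

    def check_cancel_and_notify(self):
        # Returns True if cancelled, notifying wait()/as_completed() waiters.
        if self._cancelled:
            self._done.set()
        return self._cancelled

    # --- public API ---
    def result(self, timeout=None):
        self._done.wait(timeout)
        if self._exception is not None:
            raise self._exception
        return self._result

class OneThreadExecutor:
    # The "application-specific logic" is simply a fresh thread per call.
    def submit(self, fn, *args, **kwargs):
        future = Future()
        def run():
            if future.check_cancel_and_notify():
                return                        # cancelled before starting
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as e:
                future.set_exception(e)
        threading.Thread(target=run).start()
        return future

f = OneThreadExecutor().submit(pow, 2, 10)
print(f.result())  # 1024
```

A remote-cluster executor would replace the thread with a network round trip, but would call exactly the same three hooks on the future it returned.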
>
>>>> * Could you specify in what circumstances a pure computational
>>>> Future-based program may deadlock? (Ideally, that would be
>>>> "never".)
>>>> Your current implementation includes two such deadlocks, for which
>>>> I've attached a test.
>>>
>>> Thanks for the tests but I wasn't planning on changing this
>>> behavior. I
>>> don't really like the idea of using the calling thread to perform
>>> the wait
>>> because:
>>> 1. not all executors will be able to implement that behavior
>>
>> Why not?
>
> What if my executor sends the data to a remote cluster for execution
> and running it locally isn't feasible?
>
> If the executor can't send itself across the network, you're fine
> since it'll be impossible to create cycles. If the executor can add
> threads dynamically when it notices that it's not using enough of
> the CPU, it's also fine since you remove the limited resource. If
> the executor can be copied and cannot add threads, then it sent the
> data one way somehow, so it should be able to send the data the
> other way to execute locally. It _is_ possible to run out of memory
> or stack space. Is that what you're worried about?
>
>> Thread pools can implement it,
>
> Do you have a strategy in mind that would let you detect arbitrary
> deadlocks in threaded futures?
>
> Yes, AFAIK work stealing suffices for systems made up only of
> futures and executors. Non-future blocking objects can reintroduce
> deadlocks, but I believe futures alone can't.
How would work stealing help with this sort of deadlock?
import time
from futures import ThreadPoolExecutor

def wait_on_b():
    time.sleep(5)
    print(b.result())   # b will never complete because it is waiting on a
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())   # a will never complete because it is waiting on b
    return 6

f = ThreadPoolExecutor(max_workers=2)
a = f.submit(wait_on_b)
b = f.submit(wait_on_a)
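As far as I can see, it wouldn't: stealing only helps when the task being waited on is still sitting in the queue, and in the example above both tasks already occupy the pool's two workers and wait on each other, so there is nothing left to steal. For contrast, here is a toy sketch (the API and names are illustrative, not part of the proposal) of the case where stealing does rescue a pool: a single worker runs a task whose dependency is still queued, and result() runs that queued work in the blocked thread itself.

```python
import queue
import threading

class _Future:
    def __init__(self, executor):
        self._executor = executor
        self._done = threading.Event()
        self._value = None

    def set_result(self, value):
        self._value = value
        self._done.set()

    def result(self):
        # Work stealing: while this future is unfinished, pull other
        # pending work items off the executor's queue and run them in
        # the calling thread instead of blocking on the event.
        while not self._done.is_set():
            try:
                fn, fut = self._executor._work.get_nowait()
            except queue.Empty:
                self._done.wait(0.01)
            else:
                fut.set_result(fn())      # no error handling in this sketch
        return self._value

class StealingExecutor:
    def __init__(self, max_workers=1):
        self._work = queue.Queue()
        for _ in range(max_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            fn, fut = self._work.get()
            fut.set_result(fn())

    def submit(self, fn):
        fut = _Future(self)
        self._work.put((fn, fut))
        return fut

# One worker, and the first task waits on the second: without stealing
# this deadlocks; with stealing, the blocked thread runs 'b' itself.
ex = StealingExecutor(max_workers=1)
futures = {}
submitted = threading.Event()

def task_a():
    submitted.wait()                  # ensure 'b' exists before waiting on it
    return futures['b'].result() + 1

futures['a'] = ex.submit(task_a)
futures['b'] = ex.submit(lambda: 41)
submitted.set()
print(futures['a'].result())  # 42
```

A true cycle like the wait_on_a/wait_on_b example defeats this, because neither task is ever back in the queue to be stolen.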
In any case, I've updated the docs and PEP to indicate that deadlocks
are possible.
Cheers,
Brian
>
> Cheers,
> Brian
>
>> and process pools make it
>> impossible to create cycles, so they also can't deadlock.
>>
>>> 2. it can only be made to work if no wait time is specified
>>
>> With a wait time, you have to avoid stealing work, but it's also
>> guaranteed not to deadlock, so it's fine.
>
>