[stdlib-sig] futures - a new package for asynchronous execution

Jeffrey Yasskin jyasskin at gmail.com
Thu Feb 25 18:27:14 CET 2010


On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan <brian at sweetapp.com> wrote:

> The PEP officially lives at:
> http://python.org/dev/peps/pep-3148
>
> but this version is the most up-to-date:
>
> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt
>
>
> On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:
>
> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian at sweetapp.com> wrote:
>
>>
>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>>
>> * Could you specify in what circumstances a pure computational
>> Future-based program may deadlock? (Ideally, that would be "never".)
>> Your current implementation includes two such deadlocks, for which
>> I've attached a test.
>>
>> * Do you want to make calling Executor.shutdown(wait=True) from within
>> the same Executor 1) detect the problem and raise an exception, 2)
>> deadlock, 3) unspecified behavior, or 4) wait for all other threads
>> and then let the current one continue?
>>
>> What about a note saying that using any futures functions or methods from
>> inside a scheduled call is likely to lead to deadlock unless care is
>> taken?
>>
>> Jesse pointed out that one of the first things people try to do when
>> using concurrency libraries is to try to use them inside themselves.
>> I've also tried to use a futures library that forbade nested use
>> ('cause I wrote it), and it was a real pain.
>>
>> You can use the API from within Executor-invoked functions - you just have
>> to be careful.
>>
>
> It's the job of the PEP (and later the docs) to explain exactly what care
> is needed. Or were you asking if I was ok with adding that explanation to
> the PEP? I think that explanation is the minimum requirement (that's what I
> meant by "Could you specify in what circumstances a pure computational
> Future-based program may deadlock?"), but it would be better if it could
> never deadlock, which is achievable by stealing work.
>
> I don't think so, see below.
>
>> It should be easy enough to detect that the caller of
>> Executor.shutdown is one of the Executor's threads or processes, but I
>> wouldn't mind making the obviously incorrect "wait for my own
>> completion" deadlock or throw an exception, and it would make sense to
>> give Executor implementors their choice of which to do.
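Concretely, detection along the lines of option 1 might look like the sketch below. This is only an illustration against CPython's current thread pool: `_threads` is a private implementation detail (the set of worker threads owned by the pool), and `SafeShutdownExecutor` is a name invented here, not part of the proposal.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SafeShutdownExecutor(ThreadPoolExecutor):
    """Raise instead of deadlocking when shutdown(wait=True) is
    called from one of the pool's own worker threads."""
    def shutdown(self, wait=True):
        # _threads is a private CPython detail: the set of worker
        # threads owned by this pool.
        if wait and threading.current_thread() in self._threads:
            raise RuntimeError("shutdown(wait=True) called from a worker")
        super().shutdown(wait=wait)

executor = SafeShutdownExecutor(max_workers=1)

def try_shutdown_from_worker():
    try:
        executor.shutdown(wait=True)
        return "no error"
    except RuntimeError:
        return "raised"

outcome = executor.submit(try_shutdown_from_worker).result()
executor.shutdown(wait=True)  # fine from the main thread
print(outcome)  # raised
```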
>>
>> * This is a nit, but I think that the parameter names for
>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>> people can parametrize their code on those constructors. Right now
>> they're "max_threads" and "max_processes", respectively. I might
>> suggest "max_workers".
>>
>>
>> I'm not sure that I like that. In general consolidating the constructors
>> for executors is not going to be possible.
>>
>> In general, yes, but in this case they're the same, and we should try
>> to avoid gratuitous differences.
>>
>> num_threads and num_processes is more explicit than num_workers but I
>> don't really care so I changed it.
>>
>> Thanks.
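For what it's worth, the point of a shared parameter name is that code can be written against either executor class. A small sketch, assuming the shared `max_workers` name suggested above (as in the eventual stdlib API); `run_all` is an invented helper:

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(executor_cls, fn, inputs, max_workers=2):
    # Any executor class whose constructor takes max_workers
    # (ThreadPoolExecutor or ProcessPoolExecutor) can be plugged in.
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(fn, inputs))

def square(x):
    return x * x

print(run_all(ThreadPoolExecutor, square, range(5)))  # [0, 1, 4, 9, 16]
```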
>
>> * I'd like users to be able to write Executors besides the simple
>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
>> that, could you document what the subclassing interface for Executor
>> looks like? That is, what code do user-written Executors need to
>> include? I don't think it should include direct access to
>> future._state like ThreadPoolExecutor uses, if at all possible.
>>
>> One of the difficulties here is:
>>
>> 1. I don't want to commit to the internal implementation of Futures
>>
>> Yep, that's why I'd like to avoid requiring them to have direct access
>> to the internal variables.
>>
>> 2. It might be hard to make it clear which methods are public to users
>> and which methods are public to executor implementors
>>
>> One way to do it would be to create another type for implementors and
>> pass it to the Future constructor.
>>
>>
>> If we change the future interface like so:
>>
>> class Future(object):
>>   # Existing public methods
>>   ...
>>   # For executors only
>>   def set_result(self, result):
>>     ...
>>   def set_exception(self, exception):
>>     ...
>>   def check_cancel_and_notify(self):
>>     # returns True if the Future was cancelled and
>>     # notifies anyone who cares i.e. waiters for
>>     # wait() and as_completed
>>
>> Then an executor implementor need only implement:
>>
>> def submit(self, fn, *args, **kwargs):
>>
>> With the logic to actually execute fn(*args, **kwargs) and update the
>> returned future, of course.
>>
>> Thoughts?
>>
> Could you write up the submit() implementations you're thinking of? That
> kind of interface extension seems right.
>
> I mean that submit will implement all of the application-specific logic and
> call the above methods as it processes the future. I added a note (but not
> much in the way of details) about that.
>
Your process pool still relies on future._condition, but I think you can
just delete that line and everything will still work. This seems fine to me.
Thanks!
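To make the intended division of labor concrete, here is a toy submit() along the lines discussed. It touches the Future only through the implementor-facing methods; set_running_or_notify_cancel() is the name the API eventually settled on for the check_cancel_and_notify() role sketched above, and DirectExecutor is invented for this sketch:

```python
from concurrent.futures import Future

class DirectExecutor:
    """Toy executor: submit() runs the call immediately, using only
    the Future methods intended for executor implementors."""
    def submit(self, fn, *args, **kwargs):
        future = Future()
        # Plays the role of check_cancel_and_notify(): returns False
        # if the future was cancelled before it could run.
        if not future.set_running_or_notify_cancel():
            return future
        try:
            future.set_result(fn(*args, **kwargs))
        except BaseException as e:
            future.set_exception(e)
        return future

f = DirectExecutor().submit(pow, 2, 10)
print(f.result())  # 1024
```

A real executor's submit() would hand the (future, fn, args) triple to its workers instead of running it inline, but the Future bookkeeping is the same.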

>
>> * Could you specify in what circumstances a pure computational
>> Future-based program may deadlock? (Ideally, that would be "never".)
>> Your current implementation includes two such deadlocks, for which
>> I've attached a test.
>>
>> Thanks for the tests but I wasn't planning on changing this behavior. I
>> don't really like the idea of using the calling thread to perform the wait
>> because:
>>
>> 1. not all executors will be able to implement that behavior
>>
>> Why not?
>>
>> What if my executor sends the data to a remote cluster for execution and
>> running it locally isn't feasible?
>>
>
> If the executor can't send itself across the network, you're fine since
> it'll be impossible to create cycles. If the executor can add threads
> dynamically when it notices that it's not using enough of the CPU, it's also
> fine since you remove the limited resource. If the executor can be copied
> and cannot add threads, then it sent the data one way somehow, so it should
> be able to send the data the other way to execute locally. It _is_ possible
> to run out of memory or stack space. Is that what you're worried about?
>
>> Thread pools can implement it,
>>
>> Do you have a strategy in mind that would let you detect arbitrary
>> deadlocks in threaded futures?
>>
>
> Yes, AFAIK work stealing suffices for systems made up only of futures and
> executors.  Non-future blocking objects can reintroduce deadlocks, but I
> believe futures alone can't.
>
> How would work stealing help with this sort of deadlock?
>
> from concurrent.futures import ThreadPoolExecutor
> import time
> def wait_on_b():
>   time.sleep(5)
>   print(b.result())
>   return 5
>
> def wait_on_a():
>   time.sleep(5)
>   print(a.result())
>   return 6
>
>
> f = ThreadPoolExecutor(max_workers=2)
> a = f.submit(wait_on_b)
> b = f.submit(wait_on_a)
>
>
Heh. If you're going to put that in the PEP, at least make it correct
(sleeping is not synchronization):

import threading
from concurrent.futures import ThreadPoolExecutor
condition = threading.Condition(threading.Lock())
a = None
b = None

def wait_on_b():
  with condition:
    while b is None:
      condition.wait()
  print(b.result())
  return 5

def wait_on_a():
  with condition:
    while a is None:
      condition.wait()
  print(a.result())
  return 6

f = ThreadPoolExecutor(max_workers=2)
with condition:
  a = f.submit(wait_on_b)
  b = f.submit(wait_on_a)
  condition.notify_all()


> In any case, I've updated the docs and PEP to indicate that deadlocks are
> possible.
>

Thanks. I still disagree, and think users are much more likely to be
surprised by occasional deadlocks due to cycles of executors than they are
about guaranteed deadlocks from cycles of futures, but I don't want to be
the only one holding up the PEP by insisting on this.
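For concreteness: work stealing addresses deadlocks that come from worker exhaustion, not from a genuine cycle like the example above (a waiting on b while b waits on a is unrecoverable either way). Below is a toy, single-threaded illustration of the stealing idea; all names here (StealingPool, StealingFuture) are invented for the sketch and are not part of the PEP:

```python
import threading
from collections import deque

class StealingFuture:
    def __init__(self, pool):
        self._pool = pool
        self._done = threading.Event()
        self._value = None

    def result(self):
        # Instead of blocking, a waiter "steals" queued tasks and runs
        # them inline until this future has completed.
        while not self._done.is_set():
            if not self._pool._run_one():
                self._done.wait()  # nothing left to steal: really wait
        return self._value

class StealingPool:
    """Toy pool whose futures steal pending work while waiting."""
    def __init__(self):
        self._queue = deque()
        self._lock = threading.Lock()

    def submit(self, fn, *args):
        future = StealingFuture(self)
        self._queue.append((future, fn, args))
        return future

    def _run_one(self):
        with self._lock:
            if not self._queue:
                return False
            future, fn, args = self._queue.popleft()
        future._value = fn(*args)
        future._done.set()
        return True

    def run_until_empty(self):
        while self._run_one():
            pass

pool = StealingPool()

def outer():
    inner = pool.submit(lambda: 21)
    # With a single worker and no stealing, this wait would deadlock;
    # stealing lets result() run `inner` inline instead.
    return inner.result() * 2

f = pool.submit(outer)
pool.run_until_empty()
print(f.result())  # 42
```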

I think there are places the names could be improved, and Jesse probably has
an opinion on exactly where this should go in the package hierarchy, but I
think it will make a good addition to the standard library. Thanks for
working on it!

Jeffrey