Re: [stdlib-sig] futures - a new package for asynchronous execution
Where's the current version of the PEP?

On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
Several comments:
* I see you using the Executors as context managers, but no mention in the specification about what that does.
I can't see such documentation for built-in Python objects. To be symmetrical with the built-in file object, I've documented the context manager behavior as part of the Executor.shutdown method.
For locks, it has its own section: http://docs.python.org/library/threading.html#using-locks-conditions-and-sem... But I don't care too much about the formatting as long as the PEP specifies it clearly.
You need to specify it. (Your current implementation doesn't wait in __exit__, which I think is the opposite of what you agreed with Antoine, but you can fix that after we get general agreement on the interface.)
Fixed.
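For concreteness, the behavior being agreed on here can be sketched like this (an illustrative toy, not the PEP's implementation; the recording attribute is invented just to observe what happens):

```python
class Executor:
    """Toy sketch of the Executor context-manager protocol under discussion."""

    def __init__(self):
        self.shutdown_calls = []  # invented, only to observe behavior

    def shutdown(self, wait=True):
        # A real executor would wait for pending futures when wait=True.
        self.shutdown_calls.append(wait)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # The agreed behavior: __exit__ waits, i.e. shutdown(wait=True).
        self.shutdown(wait=True)
        return False

with Executor() as ex:
    pass
print(ex.shutdown_calls)  # [True]
```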
* I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include?
I can do that.
I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible.
Would it be reasonable to make Future an ABC, make a _Future that subclasses it for internal usage, and let other Executor subclasses define their own Futures?
What interface are you proposing for the Future ABC? It'll need to support wait() and as_completed() from non-library Futures. I wouldn't mind making the type just a duck-type (it probably wouldn't even need an ABC), although I'd like to give people trying to implement their own Executors as much help as possible. I'd assumed that giving Future some public hooks would be easier than fixing the wait() interface, but I could be wrong.
* Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test.
* Do you want to make calling Executor.shutdown(wait=True) from within the same Executor 1) detect the problem and raise an exception, 2) deadlock, 3) unspecified behavior, or 4) wait for all other threads and then let the current one continue?
What about a note saying that using any futures functions or methods from inside a scheduled call is likely to lead to deadlock unless care is taken?
Jesse pointed out that one of the first things people try to do when using concurrency libraries is to try to use them inside themselves. I've also tried to use a futures library that forbade nested use ('cause I wrote it), and it was a real pain. It should be easy enough to detect that the caller of Executor.shutdown is one of the Executor's threads or processes, but I wouldn't mind making the obviously incorrect "wait for my own completion" deadlock or throw an exception, and it would make sense to give Executor implementors their choice of which to do.
* This is a nit, but I think that the parameter names for ThreadPoolExecutor and ProcessPoolExecutor should be the same so people can parametrize their code on those constructors. Right now they're "max_threads" and "max_processes", respectively. I might suggest "max_workers".
I'm not sure that I like that. In general consolidating the constructors for executors is not going to be possible.
In general, yes, but in this case they're the same, and we should try to avoid gratuitous differences.
* You should document the exception that happens when you try to pass a ProcessPoolExecutor as an argument to a task executing inside another ProcessPoolExecutor, or make it not throw an exception and document that.
The ProcessPoolExecutor limitations are the same as the multiprocessing limitations. I can provide a note about that and a link to that module's documentation.
And multiprocessing doesn't document that its Pool requires picklability and isn't picklable itself. Saying that the ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be enough for your PEP.
* If it's intentional, you should probably document that if one element of a map() times out, there's no way to come back and wait longer to retrieve it or later elements.
That's not obvious?
Maybe.
* You still mention run_to_futures, run_to_results, and FutureList, even though they're no longer proposed.
Done.
* wait() should probably return a namedtuple or an object so we don't have people writing the unreadable "wait(fs)[0]".
Done.
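The shape being agreed to here might look like the following (names are illustrative; FakeFuture is a stand-in so the sketch is self-contained, and this simplified wait() only partitions futures rather than blocking):

```python
from collections import namedtuple

DoneAndNotDoneFutures = namedtuple('DoneAndNotDoneFutures', ['done', 'not_done'])

class FakeFuture:
    """Hypothetical stand-in exposing only done()."""
    def __init__(self, is_done):
        self._done = is_done
    def done(self):
        return self._done

def wait(fs):
    # Simplified: partition by done(); the real wait() also blocks.
    done = {f for f in fs if f.done()}
    return DoneAndNotDoneFutures(done, set(fs) - done)

a, b = FakeFuture(True), FakeFuture(False)
result = wait([a, b])
# Readable attribute access instead of the opaque wait(fs)[0]:
print(a in result.done, b in result.not_done)  # True True
```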
* Instead of "call finishes" in the description of the return_when parameter, you might describe the behavior in terms of futures becoming done since that's the accessor function you're using.
Done.
* Is RETURN_IMMEDIATELY just a way to categorize futures into done and not? Is that useful over [f for f in fs if f.done()]?
That was an artifact of the previous implementation; removed.
* After shutdown, is RuntimeError the right exception, or should there be a more specific exception?
RuntimeError is what is raised in similar situations by threading, e.g. when starting an already started thread.
Ok, works for me.

On Sun, Feb 21, 2010 at 5:49 AM, Brian Quinlan <brian@sweetapp.com> wrote:
A few extra points.
On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
* I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include? I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible.
One of the difficulties here is:
1. I don't want to commit to the internal implementation of Futures
Yep, that's why I want to avoid requiring them to have direct access to the internal variables.
2. it might be hard to make it clear which methods are public to users and which methods are public to executor implementors
One way to do it would be to create another type for implementors and pass it to the Future constructor.
* Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test.
Thanks for the tests but I wasn't planning on changing this behavior. I don't really like the idea of using the calling thread to perform the wait because:
1. not all executors will be able to implement that behavior
Why not? Thread pools can implement it, and process pools make it impossible to create cycles, so they also can't deadlock.
2. it can only be made to work if no wait time is specified
With a wait time, you have to avoid stealing work, but it's also guaranteed not to deadlock, so it's fine.
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
Several comments:
* I see you using the Executors as context managers, but no mention in the specification about what that does.
I can't see such documentation for built-in Python objects. To be symmetrical with the built-in file object, I've documented the context manager behavior as part of the Executor.shutdown method.
For locks, it has its own section: http://docs.python.org/library/threading.html#using-locks-conditions-and-sem... But I don't care too much about the formatting as long as the PEP specifies it clearly.
Added.
You need to specify it. (Your current implementation doesn't wait in __exit__, which I think is the opposite of what you agreed with Antoine, but you can fix that after we get general agreement on the interface.)
Fixed.
* I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include?
I can do that.
I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible.
Would it be reasonable to make Future an ABC, make a _Future that subclasses it for internal usage, and let other Executor subclasses define their own Futures?
What interface are you proposing for the Future ABC? It'll need to support wait() and as_completed() from non-library Futures. I wouldn't mind making the type just a duck-type (it probably wouldn't even need an ABC), although I'd like to give people trying to implement their own Executors as much help as possible. I'd assumed that giving Future some public hooks would be easier than fixing the wait() interface, but I could be wrong.
See below.
* Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test.
* Do you want to make calling Executor.shutdown(wait=True) from within the same Executor 1) detect the problem and raise an exception, 2) deadlock, 3) unspecified behavior, or 4) wait for all other threads and then let the current one continue?
What about a note saying that using any futures functions or methods from inside a scheduled call is likely to lead to deadlock unless care is taken?
Jesse pointed out that one of the first things people try to do when using concurrency libraries is to try to use them inside themselves. I've also tried to use a futures library that forbade nested use ('cause I wrote it), and it was a real pain.
You can use the API from within Executor-invoked functions - you just have to be careful.
It should be easy enough to detect that the caller of Executor.shutdown is one of the Executor's threads or processes, but I wouldn't mind making the obviously incorrect "wait for my own completion" deadlock or throw an exception, and it would make sense to give Executor implementors their choice of which to do.
* This is a nit, but I think that the parameter names for ThreadPoolExecutor and ProcessPoolExecutor should be the same so people can parametrize their code on those constructors. Right now they're "max_threads" and "max_processes", respectively. I might suggest "max_workers".
I'm not sure that I like that. In general consolidating the constructors for executors is not going to be possible.
In general, yes, but in this case they're the same, and we should try to avoid gratuitous differences.
num_threads and num_processes are more explicit than num_workers but I don't really care so I changed it.
* You should document the exception that happens when you try to pass a ProcessPoolExecutor as an argument to a task executing inside another ProcessPoolExecutor, or make it not throw an exception and document that.
The ProcessPoolExecutor limitations are the same as the multiprocessing limitations. I can provide a note about that and a link to that module's documentation.
And multiprocessing doesn't document that its Pool requires picklability and isn't picklable itself. Saying that the ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be enough for your PEP.
Done.
* If it's intentional, you should probably document that if one element of a map() times out, there's no way to come back and wait longer to retrieve it or later elements.
That's not obvious?
Maybe.
* You still mention run_to_futures, run_to_results, and FutureList, even though they're no longer proposed.
Done.
* wait() should probably return a namedtuple or an object so we don't have people writing the unreadable "wait(fs)[0]".
Done.
* Instead of "call finishes" in the description of the return_when parameter, you might describe the behavior in terms of futures becoming done since that's the accessor function you're using.
Done.
* Is RETURN_IMMEDIATELY just a way to categorize futures into done and not? Is that useful over [f for f in fs if f.done()]?
That was an artifact of the previous implementation; removed.
* After shutdown, is RuntimeError the right exception, or should there be a more specific exception?
RuntimeError is what is raised in similar situations by threading, e.g. when starting an already started thread.
Ok, works for me.
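For reference, this is the threading behavior being cited: starting an already started thread raises RuntimeError.

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()

err = None
try:
    t.start()  # starting an already started thread
except RuntimeError as exc:
    err = exc
print(type(err).__name__)  # RuntimeError
```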
On Sun, Feb 21, 2010 at 5:49 AM, Brian Quinlan <brian@sweetapp.com> wrote:
A few extra points.
On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
* I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include? I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible.
One of the difficulties here is:
1. I don't want to commit to the internal implementation of Futures
Yep, that's why I want to avoid requiring them to have direct access to the internal variables.
2. it might be hard to make it clear which methods are public to users and which methods are public to executor implementors
One way to do it would be to create another type for implementors and pass it to the Future constructor.
If we change the future interface like so:

    class Future(object):
        # Existing public methods
        ...

        # For executors only
        def set_result(self):
            ...

        def set_exception(self):
            ...

        def check_cancel_and_notify(self):
            # Returns True if the Future was cancelled and
            # notifies anyone who cares, i.e. waiters for
            # wait() and as_completed()
            ...

then an executor implementor need only implement:

    def submit(self, fn, *args, **kwargs):
        ...

with the logic to actually execute fn(*args, **kwargs) and update the returned future, of course.

Thoughts?
* Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test.
Thanks for the tests but I wasn't planning on changing this behavior. I don't really like the idea of using the calling thread to perform the wait because: 1. not all executors will be able to implement that behavior
Why not?
What if my executor sends the data to a remote cluster for execution and running it locally isn't feasible?
Thread pools can implement it,
Do you have a strategy in mind that would let you detect arbitrary deadlocks in threaded futures?

Cheers,
Brian
and process pools make it impossible to create cycles, so they also can't deadlock.
2. it can only be made to work if no wait time is specified
With a wait time, you have to avoid stealing work, but it's also guaranteed not to deadlock, so it's fine.
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include?
I can do that.
I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.
Would it be reasonable to make Future an ABC, make a _Future that subclasses it for internal usage, and let other Executor subclasses define their own Futures?
What interface are you proposing for the Future ABC? It'll need to support wait() and as_completed() from non-library Futures. I wouldn't mind making the type just a duck-type (it probably wouldn't even need an ABC), although I'd like to give people trying to implement their own Executors as much help as possible. I'd assumed that giving Future some public hooks would be easier than fixing the wait() interface, but I could be wrong.
See below.
* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.
* Do you want to make calling Executor.shutdown(wait=True) from within
the same Executor 1) detect the problem and raise an exception, 2)
deadlock, 3) unspecified behavior, or 4) wait for all other threads
and then let the current one continue?
What about a note saying that using any futures functions or methods from
inside a scheduled call is likely to lead to deadlock unless care is taken?
Jesse pointed out that one of the first things people try to do when using concurrency libraries is to try to use them inside themselves. I've also tried to use a futures library that forbade nested use ('cause I wrote it), and it was a real pain.
You can use the API from within Executor-invoked functions - you just have to be careful.
It's the job of the PEP (and later the docs) to explain exactly what care is needed. Or were you asking if I was ok with adding that explanation to the PEP? I think that explanation is the minimum requirement (that's what I meant by "Could you specify in what circumstances a pure computational Future-based program may deadlock?"), but it would be better if it could never deadlock, which is achievable by stealing work.
It should be easy enough to detect that the caller of Executor.shutdown is one of the Executor's threads or processes, but I wouldn't mind making the obviously incorrect "wait for my own completion" deadlock or throw an exception, and it would make sense to give Executor implementors their choice of which to do.
* This is a nit, but I think that the parameter names for
ThreadPoolExecutor and ProcessPoolExecutor should be the same so
people can parametrize their code on those constructors. Right now
they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".
I'm not sure that I like that. In general consolidating the constructors for
executors is not going to be possible.
In general, yes, but in this case they're the same, and we should try to avoid gratuitous differences.
num_threads and num_processes are more explicit than num_workers but I don't really care so I changed it.
Thanks.
* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include? I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.
One of the difficulties here is:
1. I don't want to commit to the internal implementation of Futures
Yep, that's why I want to avoid requiring them to have direct access to the internal variables.
2. it might be hard to make it clear which methods are public to users and
which methods are public to executor implementors
One way to do it would be to create another type for implementors and pass it to the Future constructor.
If we change the future interface like so:
    class Future(object):
        # Existing public methods
        ...

        # For executors only
        def set_result(self):
            ...

        def set_exception(self):
            ...

        def check_cancel_and_notify(self):
            # Returns True if the Future was cancelled and
            # notifies anyone who cares, i.e. waiters for
            # wait() and as_completed()
            ...
Then an executor implementor need only implement:
def submit(self, fn, *args, **kwargs):
With the logic to actually execute fn(*args, **kwargs) and update the returned future, of course.
Thoughts?
Could you write up the submit() implementations you're thinking of? That
kind of interface extension seems right.
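One possible answer, as a hedged sketch: a trivial executor whose submit() runs the call inline and uses the proposed hooks. The hook signatures (set_result(result), set_exception(exc)) and the Future internals are guesses for illustration only; check_cancel_and_notify is omitted.

```python
class Future:
    """Minimal stand-in exposing the proposed executor-only hooks."""
    def __init__(self):
        self._result = None
        self._exception = None
        self._done = False

    # For executors only (signatures assumed for this sketch)
    def set_result(self, result):
        self._result = result
        self._done = True

    def set_exception(self, exc):
        self._exception = exc
        self._done = True

    # Existing public methods
    def done(self):
        return self._done

    def result(self):
        if self._exception is not None:
            raise self._exception
        return self._result

class SynchronousExecutor:
    """Hypothetical executor: submit() runs the call in the caller's thread."""
    def submit(self, fn, *args, **kwargs):
        f = Future()
        try:
            f.set_result(fn(*args, **kwargs))
        except BaseException as e:
            f.set_exception(e)
        return f

fut = SynchronousExecutor().submit(pow, 2, 5)
print(fut.done(), fut.result())  # True 32
```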
* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.
Thanks for the tests but I wasn't planning on changing this behavior. I
don't really like the idea of using the calling thread to perform the wait
because:
1. not all executors will be able to implement that behavior
Why not?
What if my executor sends the data to a remote cluster for execution and running it locally isn't feasible?
If the executor can't send itself across the network, you're fine since it'll be impossible to create cycles. If the executor can add threads dynamically when it notices that it's not using enough of the CPU, it's also fine since you remove the limited resource. If the executor can be copied and cannot add threads, then it sent the data one way somehow, so it should be able to send the data the other way to execute locally. It _is_ possible to run out of memory or stack space. Is that what you're worried about?

Thread pools can implement it,
Do you have a strategy in mind that would let you detect arbitrary deadlocks in threaded futures?
Yes, AFAIK work stealing suffices for systems made up only of futures and executors. Non-future blocking objects can reintroduce deadlocks, but I believe futures alone can't.
Cheers, Brian
and process pools make it impossible to create cycles, so they also can't deadlock.
2. it can only be made to work if no wait time is specified
With a wait time, you have to avoid stealing work, but it's also guaranteed not to deadlock, so it's fine.
On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/

--
--Guido van Rossum (python.org/~guido)
On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote:
On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/
I get a 404 on that URL. S
On Tue, Feb 23, 2010 at 14:13, ssteinerX@gmail.com <ssteinerx@gmail.com> wrote:
On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote:
On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/
I get a 404 on that URL.
It's because one of the PEPs has become improperly encoded; you can run 'make' in a PEPs checkout to trigger the error. -Brett
S
On Tue, Feb 23, 2010 at 2:36 PM, Brett Cannon <brett@python.org> wrote:
On Tue, Feb 23, 2010 at 14:13, ssteinerX@gmail.com <ssteinerx@gmail.com> wrote:
On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote:
On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/
I get a 404 on that URL.
It's because one of the PEPs has become improperly encoded; you can run 'make' in a PEPs checkout to trigger the error.
Eh, sorry! Fixed now.

--
--Guido van Rossum (python.org/~guido)
The PEP officially lives at: http://python.org/dev/peps/pep-3148
but this version is the most up-to-date:
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3...

On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.t...
On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
* I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include?
I can do that.
I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible.
Would it be reasonable to make Future an ABC, make a _Future that subclasses it for internal usage, and let other Executor subclasses define their own Futures?
What interface are you proposing for the Future ABC? It'll need to support wait() and as_completed() from non-library Futures. I wouldn't mind making the type just a duck-type (it probably wouldn't even need an ABC), although I'd like to give people trying to implement their own Executors as much help as possible. I'd assumed that giving Future some public hooks would be easier than fixing the wait() interface, but I could be wrong.
See below.
* Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test.
* Do you want to make calling Executor.shutdown(wait=True) from within the same Executor 1) detect the problem and raise an exception, 2) deadlock, 3) unspecified behavior, or 4) wait for all other threads and then let the current one continue?
What about a note saying that using any futures functions or methods from inside a scheduled call is likely to lead to deadlock unless care is taken?
Jesse pointed out that one of the first things people try to do when using concurrency libraries is to try to use them inside themselves. I've also tried to use a futures library that forbade nested use ('cause I wrote it), and it was a real pain.
You can use the API from within Executor-invoked functions - you just have to be careful.
It's the job of the PEP (and later the docs) to explain exactly what care is needed. Or were you asking if I was ok with adding that explanation to the PEP? I think that explanation is the minimum requirement (that's what I meant by "Could you specify in what circumstances a pure computational Future-based program may deadlock?"), but it would be better if it could never deadlock, which is achievable by stealing work.
I don't think so, see below.
It should be easy enough to detect that the caller of Executor.shutdown is one of the Executor's threads or processes, but I wouldn't mind making the obviously incorrect "wait for my own completion" deadlock or throw an exception, and it would make sense to give Executor implementors their choice of which to do.
* This is a nit, but I think that the parameter names for ThreadPoolExecutor and ProcessPoolExecutor should be the same so people can parametrize their code on those constructors. Right now they're "max_threads" and "max_processes", respectively. I might suggest "max_workers".
I'm not sure that I like that. In general consolidating the constructors for executors is not going to be possible.
In general, yes, but in this case they're the same, and we should try to avoid gratuitous differences.
num_threads and num_processes are more explicit than num_workers but I don't really care so I changed it.
Thanks.
* I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include? I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible.
One of the difficulties here is:
1. I don't want to commit to the internal implementation of Futures
Yep, that's why I want to avoid requiring them to have direct access to the internal variables.
2. it might be hard to make it clear which methods are public to users and which methods are public to executor implementors
One way to do it would be to create another type for implementors and pass it to the Future constructor.
If we change the future interface like so:
    class Future(object):
        # Existing public methods
        ...

        # For executors only
        def set_result(self):
            ...

        def set_exception(self):
            ...

        def check_cancel_and_notify(self):
            # Returns True if the Future was cancelled and
            # notifies anyone who cares, i.e. waiters for
            # wait() and as_completed()
            ...
Then an executor implementor need only implement:
def submit(self, fn, *args, **kwargs):
With the logic to actually execute fn(*args, **kwargs) and update the returned future, of course.
Thoughts?
Could you write up the submit() implementations you're thinking of? That kind of interface extension seems right.
I mean that submit will implement all of the application-specific logic and call the above methods as it processes the future. I added a note (but not much in the way of details) about that.
* Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test.
Thanks for the tests but I wasn't planning on changing this behavior. I don't really like the idea of using the calling thread to perform the wait because:
1. not all executors will be able to implement that behavior
Why not?
What if my executor sends the data to a remote cluster for execution and running it locally isn't feasible?
If the executor can't send itself across the network, you're fine since it'll be impossible to create cycles. If the executor can add threads dynamically when it notices that it's not using enough of the CPU, it's also fine since you remove the limited resource. If the executor can be copied and cannot add threads, then it sent the data one way somehow, so it should be able to send the data the other way to execute locally. It _is_ possible to run out of memory or stack space. Is that what you're worried about?
Thread pools can implement it,
Do you have a strategy in mind that would let you detect arbitrary deadlocks in threaded futures?
Yes, AFAIK work stealing suffices for systems made up only of futures and executors. Non-future blocking objects can reintroduce deadlocks, but I believe futures alone can't.
How would work stealing help with this sort of deadlock?

    import time

    def wait_on_b():
        time.sleep(5)
        print(b.result())
        return 5

    def wait_on_a():
        time.sleep(5)
        print(a.result())
        return 6

    f = ThreadPoolExecutor(max_workers=2)
    a = f.submit(wait_on_b)
    b = f.submit(wait_on_a)

In any case, I've updated the docs and PEP to indicate that deadlocks are possible.

Cheers,
Brian
Cheers, Brian
and process pools make it impossible to create cycles, so they also can't deadlock.
2. it can only be made to work if no wait time is specified
With a wait time, you have to avoid stealing work, but it's also guaranteed not to deadlock, so it's fine.
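To make the work-stealing idea concrete: the sketch below (invented names, deliberately single-threaded) lets result() run still-queued calls in the calling thread instead of blocking. As Brian's example shows, this only helps when the blocked-on call has not started yet; two already-running calls that each wait on the other still deadlock.

```python
from collections import deque

class StealingExecutor:
    """Toy sketch of work stealing; not the PEP's API."""
    def __init__(self):
        self._queue = deque()

    def submit(self, fn, *args):
        # Futures are plain dicts here, purely for illustration.
        fut = {'fn': fn, 'args': args, 'done': False, 'value': None}
        self._queue.append(fut)
        return fut

    def result(self, fut):
        # Steal queued work until the future we need is done; a worker
        # blocked on a dependency makes progress instead of waiting.
        while not fut['done'] and self._queue:
            item = self._queue.popleft()
            item['value'] = item['fn'](*item['args'])
            item['done'] = True
        return fut['value']

ex = StealingExecutor()
a = ex.submit(lambda: 5)
b = ex.submit(lambda: ex.result(a) + 1)  # b depends on a
print(ex.result(b))  # 6
```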
On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan <brian@sweetapp.com> wrote:
The PEP officially lives at: http://python.org/dev/peps/pep-3148
but this version is the most up-to-date:
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3...
On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.
* Do you want to make calling Executor.shutdown(wait=True) from within
the same Executor 1) detect the problem and raise an exception, 2)
deadlock, 3) unspecified behavior, or 4) wait for all other threads
and then let the current one continue?
What about a note saying that using any futures functions or methods from
inside a scheduled call is likely to lead to deadlock unless care is taken?
Jesse pointed out that one of the first things people try to do when using concurrency libraries is to use them inside themselves. I've also tried to use a futures library that forbade nested use ('cause I wrote it), and it was a real pain.
You can use the API from within Executor-invoked functions - you just have to be careful.
It's the job of the PEP (and later the docs) to explain exactly what care is needed. Or were you asking if I was ok with adding that explanation to the PEP? I think that explanation is the minimum requirement (that's what I meant by "Could you specify in what circumstances a pure computational Future-based program may deadlock?"), but it would be better if it could never deadlock, which is achievable by stealing work.
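For concreteness, the stealing strategy can be sketched roughly as follows. This is a toy illustration only, not the PEP's API: `StealingExecutor` and `StolenFuture` are invented names, and error handling is minimal. The key move is that result() runs a not-yet-started task on the calling thread instead of blocking on a possibly exhausted pool:

```python
import queue
import threading

class StolenFuture:
    """Toy future whose result() can execute the task itself."""
    def __init__(self, fn, args):
        self._fn, self._args = fn, args
        self._claimed = threading.Lock()  # whoever acquires this runs the task
        self._done = threading.Event()
        self._result = None

    def _run(self):
        # Only the first claimant (a worker or a stealing waiter) executes.
        if self._claimed.acquire(blocking=False):
            try:
                self._result = self._fn(*self._args)
            finally:
                self._done.set()

    def result(self):
        # Work stealing: if the task hasn't started yet, run it right here
        # instead of waiting for a worker thread to become free.
        self._run()
        self._done.wait()
        return self._result

class StealingExecutor:
    def __init__(self, max_workers=1):
        self._tasks = queue.Queue()
        for _ in range(max_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            self._tasks.get()._run()  # already-stolen tasks are skipped

    def submit(self, fn, *args):
        future = StolenFuture(fn, args)
        self._tasks.put(future)
        return future

# With one worker, the nested wait below would deadlock a plain pool:
# outer occupies the only thread while inner sits in the queue forever.
ex = StealingExecutor(max_workers=1)
inner = lambda: 21
outer = lambda: ex.submit(inner).result() * 2
print(ex.submit(outer).result())  # 42
```

Note that this only removes pool-exhaustion deadlocks, where a running future waits on a queued one; it does not help when two already-running futures wait on each other.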
I don't think so, see below.
It should be easy enough to detect that the caller of
Executor.shutdown is one of the Executor's threads or processes, but I wouldn't mind making the obviously incorrect "wait for my own completion" deadlock or throw an exception, and it would make sense to give Executor implementors their choice of which to do.
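Detecting that case is cheap for a thread-based executor: remember which threads belong to the pool and check the caller. A hypothetical sketch (the class and method names here are invented for illustration, not taken from the PEP):

```python
import threading

class SelfAwareExecutor:
    """Sketch of the check only; real submit/shutdown logic is elided."""
    def __init__(self):
        self._worker_threads = set()

    def _register_worker(self):
        # A real pool would call this from each worker thread it starts.
        self._worker_threads.add(threading.current_thread())

    def shutdown(self, wait=True):
        if wait and threading.current_thread() in self._worker_threads:
            # Detect-and-raise rather than deadlock waiting for our
            # own completion (option 1 from the discussion).
            raise RuntimeError(
                "shutdown(wait=True) called from a worker thread")
        # ... normal shutdown logic would go here ...
```

Called from a non-worker thread the check passes; called from a registered worker it raises instead of deadlocking.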
* This is a nit, but I think that the parameter names for
ThreadPoolExecutor and ProcessPoolExecutor should be the same so
people can parametrize their code on those constructors. Right now
they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".
I'm not sure that I like that. In general consolidating the constructors for
executors is not going to be possible.
In general, yes, but in this case they're the same, and we should try to avoid gratuitous differences.
num_threads and num_processes are more explicit than num_workers but I don't really care, so I changed it.
Thanks.
* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include? I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.
One of the difficulties here is:
1. i don't want to commit to the internal implementation of Futures
Yep, that's why to avoid requiring them to have direct access to the internal variables.
2. it might be hard to make it clear which methods are public to users and
which methods are public to executor implementors
One way to do it would be to create another type for implementors and pass it to the Future constructor.
If we change the future interface like so:
class Future(object):
    # Existing public methods
    ...

    # For executors only
    def set_result(self):
        ...

    def set_exception(self):
        ...

    def check_cancel_and_notify(self):
        # returns True if the Future was cancelled and
        # notifies anyone who cares i.e. waiters for
        # wait() and as_completed()
        ...
Then an executor implementor need only implement:
def submit(self, fn, *args, **kwargs):
With the logic to actually execute fn(*args, **kwargs) and update the returned future, of course.
Thoughts?
Could you write up the submit() implementations you're thinking of? That
kind of interface extension seems right.
I mean that submit will implement all of the application-specific logic and call the above methods as it processes the future. I added a note (but not much in the way of details) about that.
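A rough sketch of such a submit(), with a hand-rolled Future standing in for the real class so the example is self-contained. The hook names follow the proposed interface above; `ThreadPerCallExecutor` is invented for illustration and spawns one thread per call rather than pooling:

```python
import threading

class Future:
    """Minimal stand-in exposing the proposed executor-only hooks."""
    def __init__(self):
        self._done = threading.Event()
        self._result = None
        self._exception = None

    # -- executor-only interface --
    def set_result(self, result):
        self._result = result
        self._done.set()

    def set_exception(self, exception):
        self._exception = exception
        self._done.set()

    # -- existing public interface (abridged) --
    def result(self):
        self._done.wait()
        if self._exception is not None:
            raise self._exception
        return self._result

class ThreadPerCallExecutor:
    """The implementor writes only submit(); the Future does the rest."""
    def submit(self, fn, *args, **kwargs):
        future = Future()

        def run():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as e:
                future.set_exception(e)

        threading.Thread(target=run).start()
        return future

print(ThreadPerCallExecutor().submit(pow, 2, 10).result())  # 1024
```

The point of the division is that all waiting/notification machinery lives in Future; the executor only decides where and when fn runs and reports the outcome through the two setters.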
Your process pool still relies on future._condition, but I think you can just delete that line and everything will still work. This seems fine to me. Thanks!
* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.
Thanks for the tests but I wasn't planning on changing this behavior. I
don't really like the idea of using the calling thread to perform the wait
because:
1. not all executors will be able to implement that behavior
Why not?
What if my executor sends the data to a remote cluster for execution and running it locally isn't feasible?
If the executor can't send itself across the network, you're fine since it'll be impossible to create cycles. If the executor can add threads dynamically when it notices that it's not using enough of the CPU, it's also fine since you remove the limited resource. If the executor can be copied and cannot add threads, then it sent the data one way somehow, so it should be able to send the data the other way to execute locally. It _is_ possible to run out of memory or stack space. Is that what you're worried about?
Thread pools can implement it,
Do you have a strategy in mind that would let you detect arbitrary deadlocks in threaded futures?
Yes, AFAIK work stealing suffices for systems made up only of futures and executors. Non-future blocking objects can reintroduce deadlocks, but I believe futures alone can't.
How would work stealing help with this sort of deadlock?
import time

def wait_on_b():
    time.sleep(5)
    print(b.result())
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())
    return 6

f = ThreadPoolExecutor(max_workers=2)
a = f.submit(wait_on_b)
b = f.submit(wait_on_a)
Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):

import threading

condition = threading.Condition(threading.Lock())
a = None
b = None

def wait_on_b():
    with condition:
        while b is None:
            condition.wait()
    print(b.result())
    return 5

def wait_on_a():
    with condition:
        while a is None:
            condition.wait()
    print(a.result())
    return 6

f = ThreadPoolExecutor(max_workers=2)
with condition:
    a = f.submit(wait_on_b)
    b = f.submit(wait_on_a)
    condition.notifyAll()
In any case, I've updated the docs and PEP to indicate that deadlocks are possible.
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this. I think there are places the names could be improved, and Jesse probably has an opinion on exactly where this should go in the package hierarchy, but I think it will make a good addition to the standard library. Thanks for working on it! Jeffrey
On Thu, Feb 25, 2010 at 12:27 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote: ... snip
In any case, I've updated the docs and PEP to indicate that deadlocks are possible.
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this. I think there are places the names could be improved, and Jesse probably has an opinion on exactly where this should go in the package hierarchy, but I think it will make a good addition to the standard library. Thanks for working on it! Jeffrey
Yes; I think this needs to be part of a new "concurrent" package in the stdlib - e.g. concurrent.futures, with the understanding that things within multiprocessing will be put in there shortly, and possibly other things such as a threadpool and other common sugary abstractions.

jesse
On Feb 26, 2010, at 5:49 AM, Jesse Noller wrote:
On Thu, Feb 25, 2010 at 12:27 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote: ... snip
In any case, I've updated the docs and PEP to indicate that deadlocks are possible.
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this. I think there are places the names could be improved, and Jesse probably has an opinion on exactly where this should go in the package hierarchy, but I think it will make a good addition to the standard library. Thanks for working on it! Jeffrey
Yes; I think this needs to be part of a new "concurrent" package in the stdlib - e.g. concurrent.futures, understanding things within multiprocessing will be put in there shortly, and possibly other things such as a threadpool and other common sugary abstractions.
Are you imagining that futures would be a subpackage of concurrent with a single logical namespace i.e.

concurrent/
    __init__.py
    futures/
        __init__.py
        threads.py
        processes.py
        ...

from concurrent.futures import wait
from concurrent.futures import ThreadPoolExecutor

Or should the futures package be merged into the concurrent package i.e.

concurrent/
    __init__.py
    futures.py
    threadpoolexecutor.py (was threads.py)
    processpoolexecutor.py (was processes.py)

from concurrent.futures import wait
from concurrent.futures.threadpoolexecutor import ThreadPoolExecutor

?

Cheers, Brian
On Thu, Feb 25, 2010 at 10:28 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 26, 2010, at 5:49 AM, Jesse Noller wrote:
...
Yes; I think this needs to be part of a new "concurrent" package in the stdlib - e.g. concurrent.futures, understanding things within multiprocessing will be put in there shortly, and possibly other things such as a threadpool and other common sugary abstractions.
Are you imagining that futures would be a subpackage of concurrent with a single logical namespace i.e.

concurrent/
    __init__.py
    futures/
        __init__.py
        threads.py
        processes.py
        ...

from concurrent.futures import wait
from concurrent.futures import ThreadPoolExecutor
Or should the futures package be merged into the concurrent package i.e.
concurrent/
    __init__.py
    futures.py
    threadpoolexecutor.py (was threads.py)
    processpoolexecutor.py (was processes.py)

from concurrent.futures import wait
from concurrent.futures.threadpoolexecutor import ThreadPoolExecutor
I'm on the fence. I took a few minutes to think about this today, and my gut says concurrent with a single logical namespace - so:

from concurrent import futures
futures.ThreadPoolExecutor

And so on. Others might balk at a deeper namespace, but then say we add:

concurrent/
    futures/
    pool.py (allows for a process pool, or threadpool)
    managers.py

And so on. I'm trying to mentally organize things to "be like" java.util.concurrent [1] - ideally we could move/consolidate the common sugar into this package, and remove the other "stuff" from multiprocessing as well. That way multiprocessing can become "just" Process and the locking stuff, ala threading, and the rest of the other nice-things can be made to work with threads *and* processes ala what you've done with futures.

This is just a thought; I've been thinking about it a lot, but I admit not having sat down and itemized the things that would live in this new home. The futures discussion just spurred me to propose the idea sooner rather than later.

Jesse

[1] http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.h...
On Thu, Feb 25, 2010 at 7:54 PM, Jesse Noller <jnoller@gmail.com> wrote:
I'm on the fence. I took a few minutes to think about this today, and my gut says concurrent with a single logical namespace - so:
from concurrent import futures
futures.ThreadPoolExecutor
And so on. Others might balk at a deeper namespace, but then say we add:
concurrent/
    futures/
    pool.py (allows for a process pool, or threadpool)
    managers.py
And so on. I'm trying to mentally organize things to "be like" java.util.concurrent [1] - ideally we could move/consolidate the common sugar into this package, and remove the other "stuff" from multiprocessing as well. That way multiprocessing can become "just" Process and the locking stuff, ala threading, and the rest of the other nice-things can be made to work with threads *and* processes ala what you've done with futures.
My gut agrees, FWIW.
Hey people, could you strip some quoting when you are replying to each other's e-mails? It would make following the discussion much easier :)

Regards
Antoine.

On Thu, 25 Feb 2010 09:27:14 -0800, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan <brian@sweetapp.com> wrote:
[snip]
On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan <brian@sweetapp.com> wrote: Your process pool still relies on future._condition, but I think you can just delete that line and everything will still work. This seems fine to me. Thanks!
Oops. Fixed. Thanks.
Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):
I can't tell if you are joking or not. Was my demonstration of a possible deadlock scenario really unclear?
import threading

condition = threading.Condition(threading.Lock())
a = None
b = None

def wait_on_b():
    with condition:
        while b is None:
            condition.wait()
    print(b.result())
    return 5

def wait_on_a():
    with condition:
        while a is None:
            condition.wait()
    print(a.result())
    return 6

f = ThreadPoolExecutor(max_workers=2)
with condition:
    a = f.submit(wait_on_b)
    b = f.submit(wait_on_a)
    condition.notifyAll()
In any case, I've updated the docs and PEP to indicate that deadlocks are possible.
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this.
Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my example and it will deadlock a small percentage of the time. Cheers, Brian
I think there are places the names could be improved, and Jesse probably has an opinion on exactly where this should go in the package hierarchy, but I think it will make a good addition to the standard library. Thanks for working on it!
Jeffrey
On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):
I can't tell if you are joking or not. Was my demonstration of a possible deadlock scenario really unclear?
It's clear; it's just wrong code, even if the futures weren't a cycle. Waiting using sleep in any decently-sized system is guaranteed to cause problems. Yes, this example will work nearly every time (although if you get your load high enough, you'll still see NameErrors), but it's not the kind of thing we should be showing users. (For that matter, communicating between futures using globals is also a bad use of them, but it's not outright broken.)
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this.
Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my example and it will deadlock a small percentage of the time.
It only fails to deadlock when it fails to create a cycle of futures. It sounds like Antoine also wants you to either have the threaded futures steal work or detect executor cycles and raise an exception.
On Thu, Feb 25, 2010 at 7:26 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):
I can't tell if you are joking or not. Was my demonstration of a possible deadlock scenario really unclear?
It's clear; it's just wrong code, even if the futures weren't a cycle. Waiting using sleep in any decently-sized system is guaranteed to cause problems. Yes, this example will work nearly every time (although if you get your load high enough, you'll still see NameErrors), but it's not the kind of thing we should be showing users. (For that matter, communicating between futures using globals is also a bad use of them, but it's not outright broken.)
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this.
Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my example and it will deadlock a small percentage of the time.
It only fails to deadlock when it fails to create a cycle of futures.
It sounds like Antoine also wants you to either have the threaded futures steal work or detect executor cycles and raise an exception.
FWIW, the other way to fix these deadlocks is to write a smarter thread pool. If the thread pool can notice that it's not using as many CPUs as it's been told to use, it can start a new thread, which runs the queued task and resolves the deadlock. It's actually a better solution in the long run since it also solves the problem with wait-for-one deadlocking or behaving badly. The problem is that this is surprisingly hard to get right. Measuring current CPU use is tricky and non-portable; if you start new threads too aggressively, you can run out of memory or start thrashing; and if you don't start threads aggressively enough you hurt performance.
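One crude way to sidestep the CPU-measurement problem entirely is to grow the pool whenever a worker is about to block on an unfinished future. The sketch below is illustrative only (all names are invented, not from the PEP); as the paragraph above notes, a real implementation would also need to bound the thread count and shrink the pool again:

```python
import queue
import threading

class _GrowingFuture:
    def __init__(self, pool):
        self._pool = pool
        self._done = threading.Event()
        self._value = None

    def _set(self, value):
        self._value = value
        self._done.set()

    def result(self):
        if (not self._done.is_set()
                and threading.current_thread() in self._pool._workers):
            # A worker is about to block: add a thread so queued tasks
            # keep draining instead of deadlocking the pool.
            self._pool._spawn()
        self._done.wait()
        if isinstance(self._value, BaseException):
            raise self._value
        return self._value

class GrowingExecutor:
    def __init__(self, max_workers=1):
        self._queue = queue.Queue()
        self._workers = set()
        for _ in range(max_workers):
            self._spawn()

    def _spawn(self):
        thread = threading.Thread(target=self._work, daemon=True)
        self._workers.add(thread)
        thread.start()

    def _work(self):
        while True:
            fn, future = self._queue.get()
            try:
                future._set(fn())
            except BaseException as e:
                future._set(e)

    def submit(self, fn):
        future = _GrowingFuture(self)
        self._queue.put((fn, future))
        return future

# A single worker plus a nested wait would deadlock a fixed-size pool:
ex = GrowingExecutor(max_workers=1)
print(ex.submit(lambda: ex.submit(lambda: 21).result() * 2).result())  # 42
```

Because growth is unbounded here, a hostile workload could still exhaust memory, which is exactly the thrashing trade-off described above.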
*mega snip* Jeffrey/Brian/all - Do you think we are ready to move this to the grist mill of python-dev? Or should we hold off until I get off my rump and do the concurrent.* namespace PEP? jesse
Wow, timing is everything - I sent Guido an e-mail asking the same thing < 30 seconds ago :-)

Cheers, Brian

On Mar 5, 2010, at 2:08 PM, Jesse Noller wrote:
*mega snip*
Jeffrey/Brian/all - Do you think we are ready to move this to the grist mill of python-dev? Or should we hold off until I get off my rump and do the concurrent.* namespace PEP?
jesse
On Thu, Mar 4, 2010 at 10:09 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Wow, timing is everything - I sent Guido an e-mail asking the same thing < 30 seconds ago :-)
Cheers, Brian
Well, I'd like to make sure Jeffrey's concerns have been addressed. Once he's happy, I'm ok with pushing it towards its inevitable end. I think the namespacing is secondary to the futures PEP though.

jesse
And yes, go ahead and bring it up on python-dev. Don't bother with c.l.py unless you are particularly masochistic.

--Guido

On Thu, Mar 4, 2010 at 7:09 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Wow, timing is everything - I sent Guido an e-mail asking the same thing < 30 seconds ago :-)
Cheers, Brian
On Mar 5, 2010, at 2:08 PM, Jesse Noller wrote:
*mega snip*
Jeffrey/Brian/all - Do you think we are ready to move this to the grist mill of python-dev? Or should we hold off until I get off my rump and do the concurrent.* namespace PEP?
-- --Guido van Rossum (python.org/~guido)
On Thu, Mar 4, 2010 at 11:18 PM, Guido van Rossum <guido@python.org> wrote:
And yes, go ahead and bring it up on python-dev. Don't bother with c.l.py unless you are particularly masochistic.
--Guido
He's proposing a concurrency thingie for python. I think that implies a certain level of masochism already. :)
On Feb 26, 2010, at 2:26 PM, Jeffrey Yasskin wrote:
On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):
I can't tell if you are joking or not. Was my demonstration of a possible deadlock scenario really unclear?
It's clear; it's just wrong code, even if the futures weren't a cycle. Waiting using sleep in any decently-sized system is guaranteed to cause problems. Yes, this example will work nearly every time (although if you get your load high enough, you'll still see NameErrors), but it's not the kind of thing we should be showing users. (For that matter, communicating between futures using globals is also a bad use of them, but it's not outright broken.)
Hey Jeff,

I'm trying to demonstrate a pattern of executor usage that is likely to lead to deadlock. If, looking at the example, people are clear that this may lead to deadlock, then I don't think it is necessary to write an example that provably always leads to deadlock. In fact, I think that all of the extra locking code required really distracts from the core of the problem being demonstrated.
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this.
Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my example and it will deadlock a small percentage of the time.
It only fails to deadlock when it fails to create a cycle of futures.
It sounds like Antoine also wants you to either have the threaded futures steal work or detect executor cycles and raise an exception.
I really don't like the idea of work stealing. Do you have a concrete proposal on how to detect cycles? Cheers, Brian
I'm not going to be the one jerk holding back this proposal, so go ahead and submit it to python-dev. I'm not around again until Saturday, so I won't get a chance to comment until then.

On Thu, Mar 4, 2010 at 8:02 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 26, 2010, at 2:26 PM, Jeffrey Yasskin wrote:
On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):
I can't tell if you are joking or not. Was my demonstration of a possible deadlock scenario really unclear?
It's clear; it's just wrong code, even if the futures weren't a cycle. Waiting using sleep in any decently-sized system is guaranteed to cause problems. Yes, this example will work nearly every time (although if you get your load high enough, you'll still see NameErrors), but it's not the kind of thing we should be showing users. (For that matter, communicating between futures using globals is also a bad use of them, but it's not outright broken.)
Hey Jeff,
I'm trying to demonstrate a pattern of executor usage that is likely to lead to deadlock.
If, looking at the example, people are clear that this may lead to deadlock, then I don't think it is necessary to write an example that provably always leads to deadlock.
In fact, I think that all of the extra locking code required really distracts from the core of the problem being demonstrated.
Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this.
Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my example and it will deadlock a small percentage of the time.
It only fails to deadlock when it fails to create a cycle of futures.
It sounds like Antoine also wants you to either have the threaded futures steal work or detect executor cycles and raise an exception.
I really don't like the idea of work stealing.
Do you have a concrete proposal on how to detect cycles?
Cheers, Brian
-- Namasté, Jeffrey Yasskin http://jeffrey.yasskin.info/
On Thu, 25 Feb 2010 20:33:09 +1100, Brian Quinlan <brian@sweetapp.com> wrote:
In any case, I've updated the docs and PEP to indicate that deadlocks are possible.
For the record, I think that potential deadlocks caused simply by using a library function (other than locks themselves) are a bad thing. It would be better if the library either avoided deadlocks, or detected them and raised an exception instead. (Admittedly, we already have such an issue with the import lock.)

Regards
Antoine.
participants (7)

- Antoine Pitrou
- Brett Cannon
- Brian Quinlan
- Guido van Rossum
- Jeffrey Yasskin
- Jesse Noller
- ssteinerX@gmail.com