Re: [Python-Dev] [PEP 3148] futures - execute computations asynchronously
At 10:59 AM 3/7/2010 -0800, Jeffrey Yasskin wrote:
So is it that you just don't like the idea of blocking, and want to stop anything that relies on it from getting into the standard library?
Um, no. As I said before, call it a "parallel task queue" or "parallel task manager" or something to that general effect and I'm on board. It may not be in the Zen of Python, but ISTM that names should generally follow use cases. It is something of a corollary to "one obvious way to do it", in that if you see something whose name matches what you want to do, then it should be obvious that that's the way in question. ;-) The use cases for "parallel task queues", however, are a subset of those for "futures" in the general case. Since the proposed module addresses most of the former but very little of the latter, calling it futures is inappropriate. Specifically, it's: 1. Confusing to people who don't know what futures are (see e.g. R.D. Murray's post), and 2. Underpowered for people who expect/want a more fully-featured futures system along the lines of E or Deferreds. It seems that the only people for whom it's an intuitively correct description are people who've only had experience with more limited futures models (like Java's). However, these people should not have a problem understanding the notion of parallel task queueing or task management, so changing the name isn't really a loss for them, and it's a gain for everybody else.
Given the set_result and set_exception methods, it's pretty straightforward to fill in the value of a future from something that isn't purely computational.
Those are described as "internal" methods in the PEP; by contrast, the Deferred equivalents are part of the public API.
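As an illustration of the set_result/set_exception point (a sketch, not part of the original message): in the concurrent.futures module that eventually shipped in Python 3.2, a bare Future can be completed by code that isn't a submitted computation, here a thread standing in for an I/O event source. The stdlib documentation still describes these two methods as meant for unit tests and Executor implementations, which matches the "internal" status mentioned above. The fake_io_completion helper is hypothetical.

```python
import threading
from concurrent.futures import Future


def fake_io_completion(fut: Future, payload: str, fail: bool = False) -> None:
    """Hypothetical stand-in for an I/O event source that completes a future."""
    if fail:
        fut.set_exception(ConnectionError("simulated I/O failure"))
    else:
        fut.set_result(payload)


ok = Future()
threading.Thread(target=fake_io_completion, args=(ok, "response bytes")).start()
print(ok.result(timeout=5))  # blocks until the other thread fills in the value

bad = Future()
threading.Thread(target=fake_io_completion, args=(bad, "", True)).start()
try:
    bad.result(timeout=5)  # re-raises the exception stored by set_exception()
except ConnectionError as exc:
    print("failed:", exc)
```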
Given a way to register "on-done" callbacks with the future, it would be straightforward to wait for a future without blocking, too.
Yes, and with a few more additions besides that one, you might be on the way to an actual competitor for Deferreds. For example: retry support, chaining, logging, API for transparent result processing, coroutine support, co-ordination tools like locks, semaphores and queues, etc. These are all things you would very likely want or need if you actually wanted to write a program using futures as *your main computational model*, vs. just needing to toss out some parallel tasks in a primarily synchronous program. Of course, Deferreds are indeed overkill if all you're ever going to want is a few parallel tasks, unless you're already skilled in using Twisted or some wrapper for it. So, I totally support having a simple task queue in the stdlib, as there are definitely times I would've used such a thing for a quick script, if it were available. However, I've *also* had use cases for using futures as a computational model, and so that's what I originally thought this PEP was about. After the use cases were clarified, though, it seems to me that *calling* it futures is a bad idea, because it's really just a nice task queuing system. I'm +1 on adding a nice task queuing system, -1 on calling it by any other name. ;-)
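To make the "on-done callback" idea concrete, here is a small sketch against the released concurrent.futures API (Python 3.2+), where Future.add_done_callback() exists. The chain() helper is hypothetical and shows only the simplest form of chaining; it has none of the retry, logging, or coroutine support listed above, which is roughly the gap between a task queue and Deferreds being argued here.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def chain(fut: Future, fn) -> Future:
    """Hypothetical helper: return a new Future holding fn(fut.result()).

    Errors from fut, or raised by fn, propagate to the returned future.
    """
    out = Future()

    def _on_done(done: Future) -> None:
        try:
            out.set_result(fn(done.result()))
        except BaseException as exc:  # forward any failure (incl. cancellation)
            out.set_exception(exc)

    # Runs when fut finishes (or immediately if it already has); no blocking.
    fut.add_done_callback(_on_done)
    return out


with ThreadPoolExecutor(max_workers=2) as executor:
    first = executor.submit(sum, [1, 2, 3])
    doubled = chain(first, lambda x: x * 2)
    labelled = chain(doubled, lambda x: f"total={x}")
    print(labelled.result(timeout=5))
```

Only the final result() call blocks; everything before it just registers callbacks.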
On Sun, Mar 7, 2010 at 11:56 AM, P.J. Eby <pje@telecommunity.com> wrote:
At 10:59 AM 3/7/2010 -0800, Jeffrey Yasskin wrote:
So is it that you just don't like the idea of blocking, and want to stop anything that relies on it from getting into the standard library?
Um, no. As I said before, call it a "parallel task queue" or "parallel task manager" or something to that general effect and I'm on board.
It may not be in the Zen of Python, but ISTM that names should generally follow use cases. It is something of a corollary to "one obvious way to do it", in that if you see something whose name matches what you want to do, then it should be obvious that that's the way in question. ;-)
The use cases for "parallel task queues", however, are a subset of those for "futures" in the general case. Since the proposed module addresses most of the former but very little of the latter, calling it futures is inappropriate.
Specifically, it's:
1. Confusing to people who don't know what futures are (see e.g. R.D. Murray's post), and
This is not a problem. We will document what we consider a future.
2. Underpowered for people who expect/want a more fully-featured futures system along the lines of E or Deferreds.
This sounds like an underhanded slur towards the PEP.
It seems that the only people for whom it's an intuitively correct description are people who've only had experience with more limited futures models (like Java's). However, these people should not have a problem understanding the notion of parallel task queueing or task management, so changing the name isn't really a loss for them, and it's a gain for everybody else.
I expect that the majority of Python users fall either in camp #1 (never heard of futures, will be happy to learn about what Python calls futures) or camp #3 (have used Java futures). The users of E can be counted on a few hands. Deferreds are used heavily in some Python circles but most Python users (myself included) have at most a very vague idea of them. Also, as you clarify below, Deferreds are so much more powerful that they can't possibly be mistaken for futures (as defined by this PEP). Plus they already have a name.
Given the set_result and set_exception methods, it's pretty straightforward to fill in the value of a future from something that isn't purely computational.
Those are described as "internal" methods in the PEP; by contrast, the Deferred equivalents are part of the public API.
Given a way to register "on-done" callbacks with the future, it would be straightforward to wait for a future without blocking, too.
Yes, and with a few more additions besides that one, you might be on the way to an actual competitor for Deferreds. For example: retry support, chaining, logging, API for transparent result processing, coroutine support, co-ordination tools like locks, semaphores and queues, etc.
These are all things you would very likely want or need if you actually wanted to write a program using futures as *your main computational model*, vs. just needing to toss out some parallel tasks in a primarily synchronous program.
Of course, Deferreds are indeed overkill if all you're ever going to want is a few parallel tasks, unless you're already skilled in using Twisted or some wrapper for it.
So, I totally support having a simple task queue in the stdlib, as there are definitely times I would've used such a thing for a quick script, if it were available.
However, I've *also* had use cases for using futures as a computational model, and so that's what I originally thought this PEP was about. After the use cases were clarified, though, it seems to me that *calling* it futures is a bad idea, because it's really just a nice task queuing system.
I'm +1 on adding a nice task queuing system, -1 on calling it by any other name. ;-)
So let's focus on the functionality of the task queuing system, and stick to roughly the functionality proposed in the PEP. The name is a non-issue and further discussion ought to be sent to null-dev@python.org. -- --Guido van Rossum (python.org/~guido)
P.J. Eby wrote:
I'm +1 on adding a nice task queuing system, -1 on calling it by any other name. ;-)
As Guido said, let's call the nice task queuing system "futures" and point people wanting a full-power asynchronous process model to Twisted - while the Deferred API may technically be independent of the rest of the framework, you need at least some of the other tools for asynchronous operations to make it really shine. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On 3/8/2010 6:14 AM, Nick Coghlan wrote:
P.J. Eby wrote:
I'm +1 on adding a nice task queuing system, -1 on calling it by any other name. ;-)
As Guido said, let's call the nice task queuing system "futures" and
I was confused by 'futures' also until Philip explained it as task-queue or task-pool, and hence also do not like it. Since the examples in the PEP do *NOT* give example output, it was not clear to me whether execution or the termination thereof is ordered (queue) or not (pool). Looking more closely, I gather that the prime results will be printed 'in order' (waiting on each even if others are done) while the url results will be printed 'as available'. Adding 'will print ...' and 'might print ...' outputs would help.
point people wanting a full-power asynchronous process model to Twisted
That could be done in the PEP to clarify its scope. Terry Jan Reedy
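The ordering distinction Terry asks about can be shown with the released concurrent.futures API (Python 3.2+): Executor.map() yields results in submission order, waiting on each in turn, while as_completed() yields futures as they finish. This is an illustration rather than the PEP's own example; slow_square is a made-up workload.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n: int) -> int:
    """Sleep for a random short time to simulate uneven task durations."""
    time.sleep(random.uniform(0.0, 0.05))
    return n * n


inputs = [1, 2, 3, 4, 5]

with ThreadPoolExecutor(max_workers=5) as executor:
    # map(): results come back in input order, even if later tasks finish first.
    in_order = list(executor.map(slow_square, inputs))

    # as_completed(): each future is yielded as soon as it finishes.
    futures = [executor.submit(slow_square, n) for n in inputs]
    as_available = [f.result() for f in as_completed(futures)]

print(in_order)      # always [1, 4, 9, 16, 25]
print(as_available)  # same values, but the order may vary between runs
```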
>> I'm +1 on adding a nice task queuing system, -1 on calling it by any
>> other name. ;-)
Nick> As Guido said, let's call the nice task queuing system "futures"
Nick> and point people wanting a full-power asynchronous process model
Nick> to Twisted
Can this module at least be pushed down into a package? I think "concurrent" or "concurrency" were both suggested at one point.
Skip
Terry Reedy wrote:
Looking more closely, I gather that the prime results will be printed 'in order' (waiting on each even if others are done) while the url results will be printed 'as available'.
Seems to me that if you care about the order of the results, you should be able to just wait for each result separately in the order you want them. Something like
    task1 = start_task(proc1)
    task2 = start_task(proc2)
    task3 = start_task(proc3)
    result1 = task1.wait_for_result()
    result2 = task2.wait_for_result()
    result3 = task3.wait_for_result()
This would also be a natural way to write things even if you don't care about the order, but you need all the results before proceeding. You're going to be held up until the longest-running task completes anyway, so it doesn't matter if some of them finish earlier and have to sit around waiting for you to collect the result.
-- Greg
On 3/8/2010 4:39 PM, Greg Ewing wrote:
Terry Reedy wrote:
Looking more closely, I gather that the prime results will be printed 'in order' (waiting on each even if others are done) while the url results will be printed 'as available'.
Seems to me that if you care about the order of the results, you should be able to just wait for each result separately in the order you want them. Something like
    task1 = start_task(proc1)
    task2 = start_task(proc2)
    task3 = start_task(proc3)
    result1 = task1.wait_for_result()
    result2 = task2.wait_for_result()
    result3 = task3.wait_for_result()
*If* I understand the first example correctly, this is effectively what the first example does with two loops. But I was hoping for clarification and amplification in the PEP.
This would also be a natural way to write things even if you don't care about the order, but you need all the results before proceeding. You're going to be held up until the longest-running task completes anyway, so it doesn't matter if some of them finish earlier and have to sit around waiting for you to collect the result.
On 9 Mar 2010, at 08:39, Greg Ewing wrote:
Terry Reedy wrote:
Looking more closely, I gather that the prime results will be printed 'in order' (waiting on each even if others are done) while the url results will be printed 'as available'.
Seems to me that if you care about the order of the results, you should be able to just wait for each result separately in the order you want them. Something like
    task1 = start_task(proc1)
    task2 = start_task(proc2)
    task3 = start_task(proc3)
    result1 = task1.wait_for_result()
    result2 = task2.wait_for_result()
    result3 = task3.wait_for_result()
You can write this as:
    executor = ...
    future1 = executor.submit(proc1)
    future2 = executor.submit(proc2)
    future3 = executor.submit(proc3)
    result1 = future1.result()
    result2 = future2.result()
    result3 = future3.result()
This would also be a natural way to write things even if you don't care about the order, but you need all the results before proceeding. You're going to be held up until the longest-running task completes anyway, so it doesn't matter if some of them finish earlier and have to sit around waiting for you to collect the result.
Often you don't want to continue if there is a failure. In the example that you gave, if "proc3" raises an exception immediately, you still wait for "proc1" and "proc2" to complete even though you will end up discarding their results. Cheers, Brian
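Brian's point about failing fast can be sketched with concurrent.futures.wait() and FIRST_EXCEPTION, both in the released module (Python 3.2+). The proc1 to proc3 functions are toy stand-ins for the ones in Greg's example.

```python
import time
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


def proc1():
    time.sleep(0.5)
    return "one"


def proc2():
    time.sleep(0.5)
    return "two"


def proc3():
    raise ValueError("proc3 failed immediately")


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(p) for p in (proc1, proc2, proc3)]
    # Return as soon as any future raises (or when all complete without error).
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
    failed = [f for f in done if f.exception() is not None]
    if failed:
        for f in not_done:
            f.cancel()  # only succeeds for tasks that have not started yet
        print("giving up:", failed[0].exception())
    else:
        print([f.result() for f in futures])
```

The caller learns about proc3's failure without waiting on proc1 and proc2, although the with block still waits for already-running tasks when it shuts the executor down.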
skip@pobox.com wrote:
>> I'm +1 on adding a nice task queuing system, -1 on calling it by any
>> other name. ;-)
Nick> As Guido said, let's call the nice task queuing system "futures"
Nick> and point people wanting a full-power asynchronous process model
Nick> to Twisted
Can this module at least be pushed down into a package? I think "concurrent" or "concurrency" were both suggested at one point.
Yep, I believe "concurrent.futures" was picked as the name elsewhere in the thread. Cheers, Nick.
Nick Coghlan wrote:
You may want to consider providing global thread and process executors in the futures module itself. Code which just wants to say "do this in the background" without having to manage the lifecycle of its own executor instance is then free to do so. I've had a lot of experience with a framework that provides this and it is *very* convenient (it's also a good way to avoid deadlocks due to synchronous notification APIs).
This seems like a reasonable idea to me. I take it that the thread/process pool should be unlimited in size. Should every thread/process exit when it finishes its job or should there be a smarter collection strategy? Cheers, Brian
Brian Quinlan wrote:
I take it that the thread/process pool should be unlimited in size. Should every thread/process exit when it finishes its job or should there be a smarter collection strategy?
I'd be inclined to do something slightly smarter, similar to what we do with memory overallocation for mutable containers. Cheers, Nick.
I've updated the PEP to include:
- completion callbacks (for interoperability with Twisted Deferreds)
- a pointer to the discussion on stdlib-sig
See: http://svn.python.org/view/peps/trunk/pep-3148.txt?r1=78618&r2=80679
Rejected ideas:
- Having a registration system for executors
Not yet addressed:
- where the package should live (somewhere in a "concurrent" package seems fine)
- having global executors with unbounded worker counts as a convenience [1]
[1] There are a few issues with global executors that need to be thought through, i.e. when should workers be created and when should they be terminated. I'd be happy to defer this idea unless someone is passionate about it (in which case it would be great if they'd step in with concrete ideas).
Cheers, Brian
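One way completion callbacks could serve Deferred-style interoperability is to split a single done-callback into success and failure paths, roughly like a callback/errback pair. The bridge() adapter below is hypothetical and uses only the released concurrent.futures API; real Twisted integration would also need to hand results back to the reactor thread (for example via reactor.callFromThread), which is not shown.

```python
from concurrent.futures import CancelledError, Future, ThreadPoolExecutor


def bridge(fut: Future, on_success, on_failure) -> None:
    """Hypothetical adapter: split one done-callback into callback/errback."""

    def _done(f: Future) -> None:
        if f.cancelled():
            on_failure(CancelledError())
        elif f.exception() is not None:
            on_failure(f.exception())
        else:
            on_success(f.result())

    fut.add_done_callback(_done)


events = []
ok_cb = lambda v: events.append(("ok", v))
err_cb = lambda e: events.append(("err", e))

with ThreadPoolExecutor(max_workers=2) as executor:
    good = executor.submit(len, "abc")
    bad = executor.submit(int, "not a number")  # raises ValueError in the worker
    bridge(good, ok_cb, err_cb)
    bridge(bad, ok_cb, err_cb)

# Registering after completion still works: the callback runs immediately.
bridge(good, lambda v: events.append(("late", v)), err_cb)
print(events)
```

Callbacks run either in the worker thread that completes the future or, if the future is already done, immediately in the thread that registers them, so the order of entries in events can vary.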
participants (7)
- Brian Quinlan
- Greg Ewing
- Guido van Rossum
- Nick Coghlan
- P.J. Eby
- skip@pobox.com
- Terry Reedy