Add a "block" option to Executor.submit
I have a script that uploads files to Google Drive. It presently performs the uploads serially, but I want to do the uploads in parallel--with a reasonable number of simultaneous uploads--and see if that improves performance. I think that an Executor is the best way to accomplish this task. The trouble is that there's no reason for my script to continue queuing uploads while all of the Executor's workers are busy. In theory, if the number of files to upload is large enough, trying to queue all of them could cause the process to run out of memory. Even if it didn't run out of memory, it could consume an excessive amount of memory. It might be a good idea to add a "block" option to Executor.submit (https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures...) that allows the caller to block until a worker is free to handle the request. I believe that this would be trivial to implement by passing the "block" option through to the Executor's internal Queue.put call (https://github.com/python/cpython/blob/242c26f53edb965e9808dd918089e664c0223...).
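A minimal sketch of the situation described above, with `upload` as a stand-in for the real Google Drive call (the function body and file names here are illustrative, not from the original script): every `submit()` call returns immediately, so the entire work list is queued up front inside the executor, regardless of how busy the workers are.

```python
from concurrent.futures import ThreadPoolExecutor

def upload(path):
    # Placeholder for the real Google Drive upload; it just returns
    # the path so the example is runnable.
    return path

paths = [f"file_{i}.txt" for i in range(100)]

with ThreadPoolExecutor(max_workers=4) as executor:
    # Each submit() returns a Future immediately; all 100 work items
    # sit in the executor's internal, unbounded queue until a worker
    # picks them up. Nothing here ever blocks the submitting thread.
    futures = [executor.submit(upload, p) for p in paths]
    results = [f.result() for f in futures]
```

With a large enough list of files, it is that internal unbounded queue that accounts for the memory concern above.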
On Sep 4, 2019, at 04:21, Chris Simmons <chris.simmons.0@hotmail.com> wrote:

I have seen deployed servers that wrap an Executor with a Semaphore to add this functionality (which is mildly silly, but not when the “better” alternative is to subclass the Executor and rely on knowledge of its implementation internals…). Which implies that this feature would be helpful in real-life code. But not quite as described:
It might be a good idea to add a "block" option to Executor.submit (https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures...) that allows the caller to block until a worker is free to handle the request. I believe that this would be trivial to implement by passing the "block" option through to the Executor's internal Queue.put call (https://github.com/python/cpython/blob/242c26f53edb965e9808dd918089e664c0223...).
No, that won’t work. Queue.put _already_ defaults to blocking. The reason it doesn’t block here is that Executor creates an unbounded Queue (or, actually, a SimpleQueue), so it’s never full, so it can never block.

More generally, if you want blocking queue behavior, you inherently need to specify a maximum length somewhere. The Executor can’t guess what maximum length you might want (since usually you don’t want _any_ max), so you’d need to add that to the constructor, say an optional max_queued_jobs param. If passed, it creates a Queue(max_queued_jobs) instead of a SimpleQueue().

And once you do that, submit automatically blocks when the Queue is full, so you’re done.

Also, the block option is not a choice between blocking vs. ignoring the bounds and succeeding immediately; it’s a choice between blocking and _failing_ immediately. I don’t think that choice is likely to be as useful for executors as for queues, so I don’t think you need (or want) to add anything to the submit API.

If you _did_ want to add something anyway, that would be a problem. The submit method takes *args, **kw and forwards them to fn. If you add any additional param, you lose the ability to submit functions that have a param with the same name. Worse, there are common cases where you build further forwarding functions around the submit method. So you might accidentally create situations where code mysteriously starts blocking in 3.9 when it worked as expected in 3.8, and it’s not even clear why. But you could solve all of these problems by just adding a second submit_nowait method instead of adding a block=True parameter to submit.

A timeout might be more useful than plain nowait. But I suspect you’d want to always use the same timeout for a single executor, so if that is worth adding, maybe that should be another constructor param, not another submit variant. But anyway, without a compelling use case, or anyone asking for it, I think YAGNI wins here. We don’t have to add timeouts just because we’re adding blocking.
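The bounded-vs-unbounded distinction above can be seen directly with the standard-library queue module; a minimal sketch:

```python
import queue

# A bounded Queue: put() blocks when full, or raises queue.Full
# immediately when called with block=False.
q = queue.Queue(maxsize=2)
q.put("a")
q.put("b")
try:
    q.put("c", block=False)   # queue is full
    overflowed = False
except queue.Full:
    overflowed = True

# SimpleQueue (what Executor actually uses) has no maxsize at all,
# so put() can never block, no matter how much is queued.
sq = queue.SimpleQueue()
for i in range(10_000):
    sq.put(i)
```

So the "block" flag already exists on Queue.put; it just never comes into play without a bound.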
On 4 Sep 2019, at 18:31, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Doesn't all that imply that it'd be good if you could just pass it the queue object you want?

/Anders
On Sep 4, 2019, at 10:17, Anders Hovmöller <boxed@killingar.net> wrote:
Doesn't all that imply that it'd be good if you could just pass it the queue object you want?
Pass it a queue object that you construct? Or a queue factory (which would default to the SimpleQueue class, but you could pass, e.g., partial(Queue, maxsize=10))? While either of those would be more flexible, it also breaks the abstraction, and the simplicity of the API. It might still be worth it if anyone had a use case for anything but a fixed queue length, or a custom type of queue, etc. But I suspect nobody does.

In more detail (if you want more): Normally, you don’t know, or care, what kind of queue the executor is using. Many people do know (or at least strongly suspect) that there’s a queue under the covers, but most don’t realize that they’re getting a SimpleQueue, or even know what one is; that’s an implementation detail.

And, even more importantly, people would then have to be careful to pass a multiprocessing.*Queue for ProcessPoolExecutor but a queue.*Queue for ThreadPoolExecutor. Today, when you discover your tasks are CPU-bound, you can switch to processes by just changing which executor you use, because the queue type is an implementation detail. Adding a max_queue_len param would preserve that. If the user passes a queue object or factory, that’s no longer true. And it would open the door to the common, annoying-to-debug problem of mixing up queue.Queue with multiprocessing or multiprocessing.Queue with threading. (In fact, there are dozens of dups on StackOverflow for this problem, and many of the askers were excited to hear about concurrent.futures in part because it doesn’t have this problem…)
On 4 Sep 2019, at 22:58, Andrew Barnert <abarnert@yahoo.com> wrote:
Pass it a queue object that you construct? Or a queue factory (which would default to the SimpleQueue class, but you could pass, e.g., partial(Queue, max_len=10)? While either of those would be more flexible, it also breaks the abstraction, and the simplicity of the API.
Well, we are talking about the case where the abstraction is broken already, so that seems reasonable.
It might still be worth it if anyone had a use case for anything but a fixed queue length, or a custom type of queue, etc. But I suspect nobody does.
Well, if the API is changed and we just add the fixed-length parameter, we'd feel right stupid when the other valid use cases did show up :)

/Anders
On Sep 4, 2019, at 21:53, Anders Hovmöller <boxed@killingar.net> wrote:
Well we are talking about the case where the abstraction is broken already so that seems like it's reasonable.
No, it’s not. Thread pools are a perfectly good abstraction, and executors are a perfectly good further abstraction on that idea, with or without blocking (as evidenced by the many APIs that do have that feature, like Ruby’s, and the many APIs that don’t, like Win32’s, all of which are usable without having to know what’s inside them, and without confusion, or even disconcerting “wtf? Oh, I get it… I guess…” moments).

Being able to block on a fixed number of queued-up tasks is a useful feature. And if the only way to get it were to crack a seam in a solid abstraction, that would be a tradeoff worth discussing. But in this case, there’s a perfectly good, and obvious, API that fits the abstraction seamlessly (and that already exists in a number of APIs with the same lineage as Python’s) and gives us that feature. So why not do it that way?
It might still be worth it if anyone had a use case for anything but a fixed queue length, or a custom type of queue, etc. But I suspect nobody does.
Well if the API is changed and we just add the fixed length parameter we'd feel right stupid when the other valid use cases did show up :)
No, we’d feel appropriately conservative. :)

Seriously, this is exactly why we have the YAGNI principle. We naturally always want to design the most flexible and powerful and abstract thing possible even when there’s only one simple and concrete use for it. But it’s always easy to add more flexibility later if it turns out to be needed; taking flexibility away if it turns out to be confusing or buggy or whatever usually means breaking existing code. (Don’t take that past “take that into account and think twice” into “reject all flexibility without a second thought”, or you’ll end up in extreme-programming dogma land, and then you’ll be forced to write Ruby instead of Python.)

This thread was started by someone who wanted blocking. I’ve had coworkers ask me how to do that, and seen dozens of questions on Stack Overflow, and implemented it myself two different ways, and I’m pretty sure lots of other real-world programs have done the same. So that’s a real need. Conversely, I don’t think I’ve ever seen anyone who wanted to use a different kind of queue for anything else, or anyone who used the more flexible APIs in other libraries like Ruby’s to add anything but a queue bound and maybe a fail-handler flag. So I’m not sure the flexibility buys us anything here.
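For reference, the Semaphore-wrapping workaround mentioned earlier in the thread can be sketched roughly like this (the class and parameter names are illustrative, not an existing API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BoundedExecutor:
    """Sketch of the Semaphore-wrapping pattern: submit() blocks once
    max_workers + max_queued jobs are in flight, instead of letting
    the executor's internal queue grow without bound."""

    def __init__(self, max_workers, max_queued):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._semaphore = threading.BoundedSemaphore(max_workers + max_queued)

    def submit(self, fn, *args, **kwargs):
        self._semaphore.acquire()  # blocks while too many jobs are pending
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._semaphore.release()
            raise
        # Free the slot when the job completes (including on failure).
        future.add_done_callback(lambda _: self._semaphore.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```

It works, but it duplicates bookkeeping the executor already does internally, which is presumably why a max_queued_jobs constructor parameter (or subclassing, with knowledge of the internals) keeps coming up as the cleaner alternative.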
participants (3)
- Anders Hovmöller
- Andrew Barnert
- Chris Simmons