On Sep 4, 2019, at 10:17, Anders Hovmöller <boxed@killingar.net> wrote:
On 4 Sep 2019, at 18:31, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Sep 4, 2019, at 04:21, Chris Simmons <chris.simmons.0@hotmail.com> wrote:
I have seen deployed servers that wrap an Executor with a Semaphore to add this functionality (which is mildly silly, but not when the “better” alternative is to subclass the Executor and use knowledge of its implementation internals…). Which implies that this feature would be helpful in real-life code.
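For reference, the semaphore-wrapping pattern looks roughly like this. This is a sketch, not any particular deployed server’s code; ThrottledExecutor and max_pending are made-up names:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ThrottledExecutor:
    """Sketch of the semaphore-wrapping pattern: submit() blocks once
    max_pending jobs are queued or running, and the slot is released
    as each job finishes."""

    def __init__(self, max_pending, **kwargs):
        self._executor = ThreadPoolExecutor(**kwargs)
        self._semaphore = threading.BoundedSemaphore(max_pending)

    def submit(self, fn, *args, **kwargs):
        self._semaphore.acquire()  # blocks when max_pending jobs are outstanding
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._semaphore.release()
            raise
        # Release the slot when the job completes (or is cancelled).
        future.add_done_callback(lambda f: self._semaphore.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```

Note that this throttles queued-plus-running jobs rather than queue length alone, which is part of why it’s “mildly silly” compared to just bounding the queue.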
But not quite as described:
It might be a good idea to add a "block" option to Executor.submit (https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures...) that allows the caller to block until a worker is free to handle the request. I believe that this would be trivial to implement by passing the "block" option through to the Executor's internal Queue.put call (https://github.com/python/cpython/blob/242c26f53edb965e9808dd918089e664c0223...).
No, that won’t work. Queue.put _already_ defaults to blocking. The reason it doesn’t block here is that Executor creates an unbounded Queue (or, actually, a SimpleQueue), so it’s never full, so it can never block.
More generally, if you want blocking queue behavior, you inherently need to specify a maximum length somewhere. The Executor can’t guess what maximum length you might want (since usually you don’t want _any_ max), so you’d need to add that to the constructor, say an optional max_queued_jobs param. If passed, it creates a Queue(max_queued_jobs) instead of a SimpleQueue().
And once you do that, submit automatically blocks when the Queue is full, so you’re done.
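The idea could be sketched as a subclass today, assuming CPython’s current internals (_work_queue is a private implementation detail, and max_queued_jobs is the hypothetical constructor parameter being discussed, not a real one):

```python
import queue
from concurrent.futures import ThreadPoolExecutor

class BoundedThreadPoolExecutor(ThreadPoolExecutor):
    """Sketch only: swap the unbounded SimpleQueue for a bounded Queue,
    so submit() blocks once max_queued_jobs work items are pending.
    Relies on the private _work_queue attribute (CPython 3.7+)."""

    def __init__(self, max_queued_jobs, **kwargs):
        super().__init__(**kwargs)
        # Worker threads are spawned lazily on submit(), so replacing
        # the queue here, before any submit() call, works in CPython.
        self._work_queue = queue.Queue(maxsize=max_queued_jobs)
```

Once the queue is bounded, submit blocks naturally inside Queue.put, with no change to the submit API at all.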
Also, the block option is not a choice between blocking vs. ignoring the bounds and succeeding immediately, it’s a choice between blocking and _failing_ immediately. I don’t think that choice is likely to be as useful for executors as for queues, so I don’t think you need (or want) to add anything to the submit API.
If you _did_ want to add something anyway, that would be a problem. The submit method takes *args, **kw and forwards them to fn. If you add any additional param, you lose the ability to submit functions that have a param with the same name. Worse, there are common cases where you build further forwarding functions around the submit method. So you might accidentally create situations where code mysteriously starts blocking in 3.9 when it worked as expected in 3.8, and it’s not even clear why. But you could solve all of these problems by just adding a second submit_nowait method instead of adding a block=True parameter to submit.
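To make the collision concrete, here’s a toy example (read_sensor is a made-up function whose own keyword argument happens to be named block):

```python
from concurrent.futures import ThreadPoolExecutor

def read_sensor(*, block=True):
    # Made-up function: its own keyword argument is named "block".
    return "blocking read" if block else "non-blocking read"

with ThreadPoolExecutor() as ex:
    # Today, block=False is forwarded straight through to read_sensor.
    # If submit() grew its own block= parameter, this call would silently
    # change meaning instead of reaching read_sensor.
    fut = ex.submit(read_sensor, block=False)
```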
A timeout might be more useful than plain nowait. But I suspect you’d want to always use the same timeout for a single executor, so if that is worth adding, maybe that should be another constructor param, not another submit variant. But anyway, without a compelling use case, or anyone asking for it, I think YAGNI wins here. We don’t have to add timeouts just because we’re adding blocking.
Doesn't all that imply that it'd be good if you could just pass it the queue object you want?
Pass it a queue object that you construct? Or a queue factory (which would default to the SimpleQueue class, but could be, e.g., partial(Queue, maxsize=10))? While either of those would be more flexible, it also breaks the abstraction, and the simplicity of the API. It might still be worth it if anyone had a use case for anything but a fixed queue length, or a custom type of queue, etc. But I suspect nobody does.

In more detail (if you want more): normally, you don’t know, or care, what kind of queue the executor is using. Many people do know (or at least strongly suspect) that there’s a queue under the covers, but most don’t realize that they’re getting a SimpleQueue, or even know what one is; that’s an implementation detail.

And, even more importantly, people would then have to be careful to pass a multiprocessing.*Queue for ProcessPoolExecutor but a queue.*Queue for ThreadPoolExecutor. Today, when you discover your tasks are CPU-bound, you can switch to processes by just changing which executor you use, because the queue type is an implementation detail. Adding a max_queued_jobs param would preserve that. If the user passes a queue object or factory, that’s no longer true. And it would open the door to the common, annoying-to-debug problem of mixing up queue.Queue with multiprocessing or multiprocessing.Queue with threading. (In fact, there are dozens of dups on StackOverflow for this problem, and many of the askers were excited to hear about concurrent.futures in part because it doesn’t have this problem…)
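For what it’s worth, the factory version under discussion would look something like this (queue_factory is purely hypothetical; no such parameter exists in concurrent.futures):

```python
import queue
from functools import partial

# Hypothetical only: the discussed API might look like
#   ThreadPoolExecutor(queue_factory=partial(queue.Queue, maxsize=10))
# with the factory defaulting to queue.SimpleQueue. Here we just show
# what each factory would produce.
default_factory = queue.SimpleQueue               # today's behavior: unbounded
bounded_factory = partial(queue.Queue, maxsize=10)  # bounded, so put() can block

q = bounded_factory()
```

Which illustrates the problem: the caller now has to know which queue module matches which executor, exactly the coupling the current API avoids.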