Add a "block" option to Executor.submit

I have a script that uploads files to Google Drive. It presently performs the uploads serially, but I want to do the uploads in parallel--with a reasonable number of simultaneous uploads--and see if that improves performance. I think that an Executor is the best way to accomplish this task. The trouble is that there's no reason for my script to continue queuing uploads while all of the Executor's workers are busy. In theory, if the number of files to upload is large enough, trying to queue all of them could cause the process to run out of memory. Even if it didn't run out of memory, it could consume an excessive amount of memory. It might be a good idea to add a "block" option to Executor.submit (https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures...) that allows the caller to block until a worker is free to handle the request. I believe that this would be trivial to implement by passing the "block" option through to the Executor's internal Queue.put call (https://github.com/python/cpython/blob/242c26f53edb965e9808dd918089e664c0223...).
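For reference, the blocking behaviour being asked for is the one queue.Queue already provides; a minimal sketch of those semantics (note the executors' internal work queues are effectively unbounded today, so the proposal would also need some way for the caller to bound them):

import queue

q = queue.Queue(maxsize=8)     # a bounded queue
q.put("task", block=True)      # returns immediately while there is room
# Once 8 items are pending, put(block=True) waits until a consumer calls
# q.get(), which is the back-pressure the proposal wants submit() to provide.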

(Somehow your post came in twice. I'm replying to the second one.) This seems a reasonable idea. A problem may be how to specify this, since all positional and keyword arguments to `submit()` after the function object are passed to the call. A possible solution would be to add a second call, `submit_blocking()`, that has the semantics you desire. I recommend that you open an issue on bugs.python.org to discuss the form this API should take, explaining your use case, and in the meantime prepare a PR for the GitHub cpython project to be linked to the issue. Good luck improving Python! --Guido On Wed, Sep 4, 2019 at 4:24 AM Chris Simmons <chris.simmons.0@hotmail.com> wrote:

I second that such a feature would be useful, as I am on the verge of implementing a work-around for that in a project right now. And maybe, instead of a "submit_blocking", create the new method so that it takes the arguments to the future as an explicit sequence and mapping in named parameters? So: executor.submit_args(func, args=(), kwargs={'file': ...}, blocking=True)? On Wed, 4 Sep 2019 at 11:39, Guido van Rossum <guido@python.org> wrote:

On 9/4/19 11:08 AM, Joao S. O. Bueno wrote:
I'm sure I'm missing something, but isn't that the point of a ThreadPoolExecutor? Yes, you can submit more requests than you have resources to execute concurrently, but the executor itself limits the number of requests it executes at once to a given number (passed to the executor's initializer). The "blocked" requests are simply entries in a queue, and shouldn't consume lots of memory. How does blocking the submit call differ from setting max_workers in the call to ThreadPoolExecutor? If the use case is uploading files, and you're reading the entire file into memory before submitting a request to upload it, then change that design to a ThreadPoolExecutor whose tasks read the file into memory (preferably in chunks, at that) and upload it.
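A rough sketch of that design change, assuming a hypothetical upload_chunk() helper standing in for the real Google Drive call: the worker task opens and streams the file itself, so the queued work items are just paths, not file contents.

from concurrent.futures import ThreadPoolExecutor

def upload_chunk(path, chunk):
    """Placeholder for the real Google Drive upload call (hypothetical)."""

def upload_file(path, chunk_size=1 << 20):
    # Read and upload in chunks inside the worker, so the main thread never
    # holds file contents while queuing work.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            upload_chunk(path, chunk)

paths = ["a.bin", "b.bin"]     # illustrative file names
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(upload_file, p) for p in paths]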

On Wed, Sep 4, 2019, 10:40 PM Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
The entries in the queue might take more memory than you realize. For example, if you have 100,000 files, and you process only 8 at a time, why would you have more than 8 items in the queue? 100k queue entries will be a memory hog, not to mention the memory fragmentation it might cause. The queue is generated dynamically, so you can stop generating it once the limit is reached.
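One way to do exactly that with the current API, without any new feature, is to keep at most `limit` futures in flight and top up as tasks finish; a minimal sketch with illustrative names:

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def process(path):
    """Placeholder for the real per-file work."""

paths = (f"file{i}.dat" for i in range(100_000))    # generated lazily
limit = 8

with ThreadPoolExecutor(max_workers=limit) as pool:
    pending = {pool.submit(process, p) for p, _ in zip(paths, range(limit))}
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        # Replace each finished task with a new one, so at most `limit`
        # entries ever sit in the executor's queue.
        for p, _ in zip(paths, range(len(done))):
            pending.add(pool.submit(process, p))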

On Sep 4, 2019, at 08:54, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
How does blocking the submit call differ from setting max_workers in the call to ThreadPoolExecutor?
Here’s a concrete example from my own code: I need to create thousands of images, each of which is about 1MB uncompressed, but compressed down to a 40KB PNG that I save to disk. Compressing and saving takes 20-80x as long as creating, so I want to do that in parallel, so my program runs 16x as fast. But since 16 < 20, the main thread will still get ahead of the workers, and eventually I’ll have a queue with thousands of 1MB pixmaps in it, at which point my system goes into swap hell and slows to a crawl.

If I bound the queue at length 16, the main thread automatically blocks whenever it gets too far ahead, and now I have a fixed memory use of about 33MB instead of unbounded GB, and my program really does run almost 16x as fast as the original serial version. And the proposal in this thread would allow me to do that with just a couple lines of code: construct an executor with a max queue length at the top, and replace the call to the compress-and-write function with a submit of that call, and I’m done.

Could I instead move the pixmap creation into the worker tasks and rearrange the calculations and add locking so they could all share the accumulator state correctly? Sure, but it would be a lot more complicated, and probably a bit slower (since parallelizing code that isn’t in a bottleneck, and then adding locks to it, is a pessimization).

I'm sorry but I truly fail to see the complication:

sem = Semaphore(10)  # line 1: somewhere near executor creation
sem.acquire()  # line 2: right before submit
future = executor.submit(...)
future.add_done_callback(lambda x: sem.release())  # line 3: right after submit

It's only 3 lines of code, barely noticeable, quite clear, with minimal overhead. You can start turning this into a decorator or a general wrapper but you'll only achieve a higher degree of complication. I've programmed with asyncio in that way for years, and these are almost the exact same lines I use, as asyncio futures are the same, supporting add_done_callback and so on. You may even inject the Semaphore into the executor if you wish to save yourself a local variable, but I doubt it's needed. Just think the general idea might be a premature optimisation. If already, I would have either changed the signature of the executor creation to have a new 'max_queue' keyword-only argument, or allow you to enter the queue as an input, but I still believe that any attempt to create such an interface will just cause more complications than the simple 3 line solution. -- Bar Harel On Thu, Sep 5, 2019, 12:42 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
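For completeness, a self-contained version of that three-line semaphore pattern, with an illustrative work() task and item list standing in for the real code:

from concurrent.futures import ThreadPoolExecutor
from threading import Semaphore

def work(item):
    """Placeholder for the real task."""
    return item

sem = Semaphore(10)                                        # line 1: near executor creation
with ThreadPoolExecutor(max_workers=8) as executor:
    for item in range(100):
        sem.acquire()                                      # line 2: right before submit
        future = executor.submit(work, item)
        future.add_done_callback(lambda f: sem.release())  # line 3: right after submit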

On Sep 4, 2019, at 19:52, Bar Harel <bzvi7919@gmail.com> wrote:
Can you really barely notice the difference between this:

future = x.submit(func, arg)

… and this:

sem.acquire()
future = x.submit(func, arg)
future.add_done_callback(lambda x: sem.release())

You’ve turned 1 line of code into 3, and the number of characters or tokens or concepts to think about have all increased even more; the actual function is now buried among the boilerplate. I can’t see how you can call that “barely noticeable”. There’s also the fact that you have to learn about semaphores to do it this way. To you or me that may seem trivial, but one of the appeals of concurrent.futures is that you don’t have to think about synchronization, because it’s all tied up in the queue, which means it can even be used by rank novices who don’t even know the difference between a condition variable and a barrier. (I’m not saying people shouldn’t learn how to use semaphores. They should also learn how to build their own thread pools, and understand how futures work under the covers, and so on, but the fact that they can make their code concurrent, and do it correctly, even before they’ve learned all that is pretty nice, and I think this proposal is a simple extension that extends that niceness.)

I have taught novices how to use executors in a few minutes. Often they just need to look at the parallel-downloader example and they get it. And I’m pretty sure I could also explain to them how and why to limit concurrency by passing a max_queue_len parameter in a few minutes. In fact, I’ve seen a novice find the equivalent feature in Ruby in a couple minutes of web searching and add it to his program, only stumbling over trying to figure out the deal with all the different fail policies. (IIRC, Ruby has all the options from Java and more: block, raise, discard and return an empty future, run synchronously in your thread, pass the method to an arbitrary fail function… I think either of the first two would satisfy 90% of the uses, so that complexity isn’t needed. At least if nobody’s asked for it.)
quite clear, with minimal overhead.
But the max_queue_len is even clearer, and has even less overhead (on top of a whole lot less boilerplate).
Just think the general idea might be a premature optimisation.
It’s not about optimization—as I already said in my previous email (the one you seem to be replying to, although it’s not the one you quoted); the performance difference is unlikely to matter in most code. It’s about readability, boilerplate, novice-friendliness, etc. These are much bigger wins than saving a microsecond in a process that’s slow enough to execute on another thread. All that being said, despite it making no difference in the vast majority of real-world uses, it still seems a little perverse to insist on using the slower code here. I will gladly write slower code where it doesn’t matter if it makes it easier for more people to understand my code, but I rarely write slower code because of some matter of abstract principle, and especially not if it makes it harder for more people to understand my code.
If already, I would have either changed the signature of the executor creation to have a new 'max_queue' keyword-only argument, or allow you to enter the queue as an input,
If you would have added it in the original version, how is adding it now any different? I assume the only reason you would have added it is that it’s a nicer API. So what counters that in the opposite direction? It’s not like it’s going to cause any backward compatibility issues or anything, is it?
but I still believe that any attempt to create such an interface will just cause more complications than the simple 3 line solution.
Why would it cause complications? It’s dead simple to design and implement, dead simple to understand, and relies on well-understood and well-tested behavior that’s been part of the queue and mp modules since long before concurrent even existed.

On 9/4/19 5:38 PM, Andrew Barnert via Python-ideas wrote:
On Sep 4, 2019, at 08:54, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
How does blocking the submit call differ from setting max_workers in the call to ThreadPoolExecutor?
Here’s a concrete example from my own code:
Aha. Thanks.
I need to create thousands of images, each of which is about 1MB uncompressed, but compressed down to a 40KB PNG that I save to disk.
Compressing and saving takes 20-80x as long as creating, so I want to do that in parallel, so my program runs 16x as fast.
Without knowing anything else, I would wonder why you've combined compressing (an apparently CPU bound operation) and saving (an apparently I/O bound operation) together, but left creating separate. Please don't answer that; it's not related to Python.
Yes, you need some way to produce "back pressure" from downstream to upstream, and to stop making new work (with new memory consumption) until there's a place to put it.
"[T]he accumulator?" Is there only one data store to manage multiple instances of three different/separate operations? Ouch. At least that's my reaction to the limited amount of insight I have right now. Now we're discussing how to desgign a concurrent application to maximize resources and minimize complexity and overall run time. Yes, if you build your application one way, and run into issues, then some ways of addressing the issues (e.g., going back to the design phase) will cost more to implement than others (tweaking a supporting libraries). It's happened to all of us. :-) I'm not against tweaking the standard library (and even if I were, my vote shouldn't count for much). For *this case*, it seemed to me that changing the standard library was Less Better™ than considering the concurrency issues earlier on in the development process. There's also an old software engineer inside me that wants most of this control up near the top (where I can see it) as opposed to way down inside the supporting libraries (where it becomes magic that has to be debugged when it's not doing what I thought it would). That's great for new applications, but it doesn't always stay that way over time.

It seems that this is the important idea: how does the back pressure work in different use cases?

1) I don't care. Just add items to the queue. (The current API?)
2) I can be blocked when the queue is at a limit. Needs an API that allows the block to happen and resumes the code when there is space in the queue.
3) I cannot be blocked as this is async code, but I need a bound on the queue. Maybe call a "queue has space" callback when the queue has space? Maybe submit returns a "queue full" status?

I have needed all 3 use cases. Barry
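Case 3 can be approximated today, without any new API, by bounding in-flight executor tasks from the async side with an asyncio.Semaphore instead of blocking the event loop; a minimal sketch with illustrative names:

import asyncio
from concurrent.futures import ThreadPoolExecutor

def work(item):
    """Placeholder for the real blocking task."""
    return item

async def main():
    limit = asyncio.Semaphore(8)      # bound on tasks handed to the executor
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=8) as pool:
        async def run(item):
            async with limit:         # waits without blocking the event loop
                return await loop.run_in_executor(pool, work, item)
        await asyncio.gather(*(run(i) for i in range(100)))

asyncio.run(main())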

On Sep 5, 2019, at 11:13, Barry Scott <barry@barrys-emacs.org> wrote:
Have you actually needed case 3 with Executor, or only with other kinds of async models? Anyway, failing makes sense, is usable for backpressure, and is trivial to implement. Although I think it should raise an exception rather than return a special value in place of a tuple. Here’s a design (borrowed from Queue of course):

* submit always succeeds, blocking if necessary
* submit_nowait always completes immediately, raising executor.Full if necessary

And of course if you don’t pass a max_queue_len, they both always succeed immediately. I’m not sure submit_nowait is needed. And if we add max_queue_len in 3.x and people start asking for submit_nowait, it would be trivial to add it in 3.x+1. But if someone has an actual use case for it now, I don’t see any problem at all adding it now. Do you have real code or realistic sample code that needs it?

I don’t think the queue-has-room callback, on the other hand, is very useful. That’s just clunkily simulating an unbounded queue on top of a bounded queue plus an implicit unbounded queue of callbacks. Anyway, in theory, there are a zillion different things you could conceivably want to happen on failure, which is why Java and Ruby have a zillion different fail protocols, and Ruby lets you set a fail method if even that isn’t good enough. But in practice, I’m pretty sure either that people only need “block”, or that they only need “block” and “raise”. I’m just not sure which of those two.
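That design can be prototyped today as a thin wrapper. A sketch follows; Full, max_queue_len, and BoundedExecutor are names taken from (or invented for) this thread, not stdlib API, and note the semaphore bounds queued-plus-running tasks rather than the queue alone:

from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

class Full(Exception):
    """Raised by submit_nowait when there is no room left."""

class BoundedExecutor:
    def __init__(self, max_workers, max_queue_len):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = BoundedSemaphore(max_queue_len)

    def _submit(self, blocking, fn, *args, **kwargs):
        if not self._slots.acquire(blocking=blocking):
            raise Full
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()
            raise
        future.add_done_callback(lambda f: self._slots.release())
        return future

    def submit(self, fn, *args, **kwargs):
        # Always succeeds, blocking until a slot frees up.
        return self._submit(True, fn, *args, **kwargs)

    def submit_nowait(self, fn, *args, **kwargs):
        # Never blocks; raises Full if no slot is free.
        return self._submit(False, fn, *args, **kwargs)

    def shutdown(self, wait=True):
        self._pool.shutdown(wait=wait)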

One thing I think is worth raising here: If my experience with concurrent.futures is far from universal, my expectations for this change may not be helpful, so let me lay that all out explicitly.

In my experience, concurrent.futures has two very good, but very different, uses. First, there’s trivial concurrency. The last stage of my processing is an obviously parallelizable pure function? Just toss it in an executor and I’m done. My asyncio code needs to call something blocking that doesn’t have an async equivalent? Toss it in an executor and I’m done. On the other extreme, there are cases where your data flow is complicated, and organizing it all around the composability of futures helps rein in that complexity. In between those two extremes, it’s often easier to use something different—e.g., a multiprocessing.Pool has a lot more options, and a lot more convenience functions for different ways of mapping; I’ve never missed Java’s thread-per-task executor because it’s easier to just use a Thread directly; you wouldn’t want a timer scheduling queue that hid all the time information under the covers; etc.

I expect this change will mostly be helpful for the trivial case. The last stage of my processing is obviously parallelizable? Just stick it in an executor and … wait, it’s still too slow, so the caller is wasting too many resources in the queue? Just stick it in a _bounded_ executor and I’m done. In the really complicated cases, the backpressure is usually going to be in the future waiting, not in the task queuing, even if that’s more complicated to set up. Because otherwise the flow isn’t 100% composable anymore, which is the whole reason I was using concurrent.futures. Anything in the moderately complicated range, and I’m probably going to want an explicitly-managed queue rather than one hidden behind an abstraction, or a separate semaphore, or whatever. But then in those cases, I’m probably not even using concurrent.futures in the first place.

So, max_queue_len with blocking submit (and maybe also with raising submit_nowait) handles virtually all of the simple cases that I can think of or that anyone else has suggested, and I don’t think it matters much how many of the moderate or extremely complicated cases it handles.

On 5 Sep 2019, at 20:43, Andrew Barnert <abarnert@yahoo.com> wrote:
Have you actually needed case 3 with Executor, or only with other kinds of async models?
With other kinds of async. I mention it as this looks like a design pattern for this problem space. Barry

On Sep 4, 2019, at 08:08, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
I second that such a feature would be useful, as I am on the verge of implementing a work-around for that in a project right now.
This seems common enough that, whatever the final design is, someone should put a concurrent39 or whatever backport package on PyPI. (I assume both you and Chris need this feature now (and can’t afford to wait 1.5 years or longer until you can get away with requiring Python 3.9+), and I suspect there are multiple others in the same boat, so it seems like there’s a good chance someone will do it.)

participants (7)
- Andrew Barnert
- Bar Harel
- Barry Scott
- Chris Simmons
- Dan Sommers
- Guido van Rossum
- Joao S. O. Bueno