On Sep 4, 2019, at 08:54, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
> How does blocking the submit call differ from setting max_workers in the call to ThreadPoolExecutor?
Here’s a concrete example from my own code: I need to create thousands of images, each about 1MB uncompressed, which I compress down to ~40KB PNGs and save to disk. Compressing and saving takes 20-80x as long as creating, so I want to do that part in parallel with 16 workers, for a roughly 16x speedup. But since 16 < 20, the main thread still gets ahead of the workers, and eventually the work queue holds thousands of 1MB pixmaps, at which point my system goes into swap hell and slows to a crawl.

If I bound the queue at length 16, the main thread automatically blocks whenever it gets too far ahead. Now I have a fixed memory use of about 33MB instead of unbounded gigabytes, and the program really does run almost 16x as fast as the original serial version.

The proposal in this thread would let me do that with just a couple lines of code: construct an executor with a max queue length at the top, replace the call to the compress-and-write function with a submit of that call, and I’m done.

Could I instead move the pixmap creation into the worker tasks, rearrange the calculations, and add locking so they could all share the accumulator state correctly? Sure, but it would be a lot more complicated, and probably a bit slower, since parallelizing code that isn’t in a bottleneck, and then adding locks to it, is a pessimization.
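In the meantime, the pattern is easy enough to sketch by hand. Here’s a minimal illustration, assuming a hypothetical BoundedExecutor wrapper (not part of the stdlib) that uses a threading.BoundedSemaphore to make submit block once `bound` tasks are queued or running, which is essentially what the proposed constructor argument would do for you:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedExecutor:
    """Wrap ThreadPoolExecutor so submit() blocks once `bound`
    tasks are queued or running, instead of letting the internal
    queue (and the memory held by each task's arguments) grow
    without limit."""

    def __init__(self, max_workers, bound):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._semaphore = threading.BoundedSemaphore(bound)

    def submit(self, fn, *args, **kwargs):
        self._semaphore.acquire()  # blocks the caller when the bound is hit
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._semaphore.release()
            raise
        # Release one slot whenever a task finishes (or is cancelled).
        future.add_done_callback(lambda f: self._semaphore.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```

With `bound=16`, the main thread can never be more than 16 pixmaps ahead of the workers, which is exactly the behavior described above, just with a dozen lines of wrapper instead of one constructor argument.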