The concurrent.futures module in the Python standard library has problems with composability. If I start a ThreadPoolExecutor to run some library functions that internally use ThreadPoolExecutor, I will end up with many more worker threads on my system than I expect. For example, each parallel execution wants to take full advantage of an 8-core machine, I could end up with as many as 8*8=64 competing worker threads, which could significantly hurt performance.
This is because each instance of ThreadPoolExecutor (or ProcessPoolExecutor) maintains its own independent worker pool. Especially in situations where the goal is to exploit multiple CPUs, it's essential for any thread pool implementation to globally manage contention between multiple concurrent job schedulers.
I'm not sure about the best way to address this problem, but here's one proposal: Add additional executors to the futures library. ComposableThreadPoolExecutor and ComposableProcessPoolExecutor would each use a *shared* thread-pool model. When created, these composable executors will check to see if they are being created within a future worker thread/process initiated by another composable executor. If so, the "child" executor will forward all submitted jobs to the executor in the parent thread/process. Otherwise, it will behave normally, starting up its own worker pool.
Has anyone else dealt with composition problems in parallel programs? What do you think of this solution -- is there a better way to tackle this deficiency?