On Thu, May 17, 2012 at 4:43 AM, Adrian Sampson <asampson@cs.washington.edu> wrote:
The concurrent.futures module in the Python standard library has problems with composability. If I start a ThreadPoolExecutor to run some library functions that internally use ThreadPoolExecutor, I will end up with many more worker threads on my system than I expect. For example, if each parallel execution wants to take full advantage of an 8-core machine, I could end up with as many as 8*8=64 competing worker threads, which could significantly hurt performance.
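
To make the 8*8 arithmetic concrete, here is a minimal sketch that nests one ThreadPoolExecutor inside another and counts how many inner workers are alive at the same time. The function name count_concurrent_inner_workers and the barrier trick are illustrative assumptions, not anything from the futures library; the barrier only releases once n*n inner workers are blocked on it simultaneously.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_concurrent_inner_workers(n):
    """Run n outer tasks that each start their own n-worker pool.

    Returns the number of distinct inner worker threads that were
    alive at the same moment (hypothetical helper for illustration).
    """
    # The barrier trips only when n*n inner tasks wait on it at once,
    # which proves that many worker threads coexist.
    barrier = threading.Barrier(n * n)
    workers = set()
    lock = threading.Lock()

    def leaf():
        barrier.wait(timeout=30)  # raises BrokenBarrierError on timeout
        with lock:
            workers.add(threading.current_thread())

    def library_call():
        # Stand-in for a library function that parallelizes internally.
        with ThreadPoolExecutor(max_workers=n) as inner:
            futures = [inner.submit(leaf) for _ in range(n)]
            for f in futures:
                f.result()

    with ThreadPoolExecutor(max_workers=n) as outer:
        futures = [outer.submit(library_call) for _ in range(n)]
        for f in futures:
            f.result()
    return len(workers)


if __name__ == "__main__":
    print(count_concurrent_inner_workers(8))  # prints 64
```

The 8 outer workers are also still alive while this happens, so the process holds 72 pool threads in total.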

This is because each instance of ThreadPoolExecutor (or ProcessPoolExecutor) maintains its own independent worker pool. Especially in situations where the goal is to exploit multiple CPUs, it's essential for any thread pool implementation to globally manage contention between multiple concurrent job schedulers.

I'm not sure about the best way to address this problem, but here's one proposal: Add additional executors to the futures library. ComposableThreadPoolExecutor and ComposableProcessPoolExecutor would each use a *shared* thread-pool model. When created, these composable executors will check to see if they are being created within a future worker thread/process initiated by another composable executor. If so, the "child" executor will forward all submitted jobs to the executor in the parent thread/process. Otherwise, it will behave normally, starting up its own worker pool.
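
The forwarding rule described above could be sketched roughly as follows, for the thread case only. ComposableThreadPoolExecutor does not exist in concurrent.futures; the class, the thread-local marker, and the "shared" thread name prefix are all assumptions made for illustration. It relies on the initializer argument of ThreadPoolExecutor (Python 3.7+) to tag each worker thread with the executor that owns it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local marker: set inside worker threads of a composable executor.
_current = threading.local()


class ComposableThreadPoolExecutor:
    """Hypothetical executor that reuses its parent's pool when nested."""

    def __init__(self, max_workers=None):
        parent = getattr(_current, "executor", None)
        if parent is not None:
            # Created inside a composable worker: forward to the parent pool.
            self._pool = parent._pool
            self._owns_pool = False
        else:
            # Top level: behave normally and start our own worker pool.
            self._pool = ThreadPoolExecutor(
                max_workers=max_workers,
                thread_name_prefix="shared",
                initializer=self._mark_worker,
            )
            self._owns_pool = True

    def _mark_worker(self):
        # Runs once in each new worker thread.
        _current.executor = self

    def submit(self, fn, *args, **kwargs):
        return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self, wait=True):
        # Only the executor that created the pool may shut it down.
        if self._owns_pool:
            self._pool.shutdown(wait=wait)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.shutdown(wait=True)
        return False


def library_call():
    # A "library" that thinks it is creating a private 4-worker pool.
    with ComposableThreadPoolExecutor(max_workers=4) as child:
        futures = [
            child.submit(lambda: threading.current_thread().name)
            for _ in range(2)
        ]
        return [f.result() for f in futures]


def demo():
    with ComposableThreadPoolExecutor(max_workers=4) as root:
        return root.submit(library_call).result()
```

In demo(), the two inner tasks run on threads of the root pool rather than on a second pool. Note that library_call blocks one shared worker while it waits on its children, so the sketch only stays safe while spare workers remain.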

Has anyone else dealt with composition problems in parallel programs? What do you think of this solution -- is there a better way to tackle this deficiency?

It's my understanding that this is a known flaw with concurrency *in general*. Currently, most multi-{threaded,process} applications assume they're the only ones running on the system, as would the likely implementation of the composable pools you've proposed. Handling this ideally requires a proper interprocess scheduler. (See GCD, and runtimes that provide at least some userspace scheduling, such as Go, however poor it may be.)

Secondly, composable pools don't handle recursive relationships well. If a thread in a pool has to wait for every task in that same pool to complete before it can itself complete, you'll have deadlock.
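
A minimal way to see this, assuming a shared pool with a single worker: the task below submits a child to its own pool and waits for it, but the only worker is the one doing the waiting. The timeout is there purely so the sketch reports the problem instead of hanging; demo_self_wait and parent are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def parent(pool):
    # Submit a child to the SAME pool, then block on its result.
    child = pool.submit(lambda: "done")
    try:
        return child.result(timeout=0.5)
    except TimeoutError:
        # Without the timeout this wait would never return: the child
        # cannot start until this task frees the only worker.
        return "would deadlock"


def demo_self_wait():
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(parent, pool).result()


if __name__ == "__main__":
    print(demo_self_wait())  # prints "would deadlock"
```

The same thing happens with larger pools once every worker is occupied by a task that is waiting on children queued behind it.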

Personally, if I implemented a composable thread pool, I'd make it global; creation and submission of tasks would be proxied to it via some composable executor class.
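
Roughly what I have in mind, as a sketch rather than a worked-out design: one lazily created process-wide pool, plus a thin proxy class that libraries instantiate instead of ThreadPoolExecutor. The names GlobalPoolProxy and _get_global_pool are made up for illustration, and sizing the pool from os.cpu_count() is an assumption.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

_pool_lock = threading.Lock()
_global_pool = None


def _get_global_pool():
    """Create the single shared pool on first use (hypothetical helper)."""
    global _global_pool
    with _pool_lock:
        if _global_pool is None:
            _global_pool = ThreadPoolExecutor(max_workers=os.cpu_count() or 1)
        return _global_pool


class GlobalPoolProxy:
    """Executor-like front end that forwards everything to the global pool."""

    def __init__(self, max_workers=None):
        # max_workers is accepted for API compatibility but ignored:
        # the global pool alone decides how many threads exist.
        self._pool = _get_global_pool()

    def submit(self, fn, *args, **kwargs):
        return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self, wait=True):
        # A proxy never shuts down the pool it shares with everyone else.
        pass

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


if __name__ == "__main__":
    with GlobalPoolProxy(max_workers=8) as ex:
        print(ex.submit(pow, 2, 5).result())  # prints 32
```

Every proxy ends up submitting to the same thread set, so the worker count stays fixed no matter how many libraries create one. It does nothing about the deadlock case above.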

As it stands, thread pools are best for task-oriented concurrency rather than parallelism anyway, especially in CPython.

In short, I think composable thread pools are a hack at best and won't gain you anything except a slightly reduced threading overhead. If you want optimal utilization, threading isn't the right place to be looking.