Composability and concurrent.futures

The concurrent.futures module in the Python standard library has problems with composability. If I start a ThreadPoolExecutor to run some library functions that internally use ThreadPoolExecutor, I will end up with many more worker threads on my system than I expect. For example, if each parallel execution wants to take full advantage of an 8-core machine, I could end up with as many as 8*8 = 64 competing worker threads, which could significantly hurt performance. This is because each instance of ThreadPoolExecutor (or ProcessPoolExecutor) maintains its own independent worker pool. Especially in situations where the goal is to exploit multiple CPUs, it's essential for any thread pool implementation to globally manage contention between multiple concurrent job schedulers.

I'm not sure about the best way to address this problem, but here's one proposal: add additional executors to the futures library. ComposableThreadPoolExecutor and ComposableProcessPoolExecutor would each use a *shared* thread-pool model. When created, these composable executors would check whether they are being created within a future worker thread/process initiated by another composable executor. If so, the "child" executor would forward all submitted jobs to the executor in the parent thread/process. Otherwise, it would behave normally, starting up its own worker pool.

Has anyone else dealt with composition problems in parallel programs? What do you think of this solution -- is there a better way to tackle this deficiency?

Adrian
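One minimal sketch of how the proposed detection could work for the thread-pool case, assuming a thread-local marker set by a wrapper around each submitted task is sufficient; ComposableThreadPoolExecutor and _context are hypothetical names, not part of concurrent.futures, and process pools would need a different mechanism since thread-locals don't cross process boundaries:

    import threading
    from concurrent.futures import ThreadPoolExecutor

    _context = threading.local()  # records the shared pool, per worker thread

    class ComposableThreadPoolExecutor:
        """Hypothetical executor that reuses an enclosing executor's pool."""

        def __init__(self, max_workers=None):
            parent_pool = getattr(_context, "pool", None)
            if parent_pool is not None:
                # Created inside another composable executor's worker:
                # forward all work to the enclosing shared pool.
                self._pool, self._owns_pool = parent_pool, False
            else:
                # Top-level use: start a real worker pool.
                self._pool = ThreadPoolExecutor(max_workers=max_workers)
                self._owns_pool = True

        def submit(self, fn, *args, **kwargs):
            pool = self._pool

            def marked():
                _context.pool = pool  # lets nested executors find the shared pool
                return fn(*args, **kwargs)

            return pool.submit(marked)

        def shutdown(self, wait=True):
            # A forwarding "child" must not tear down its parent's pool.
            if self._owns_pool:
                self._pool.shutdown(wait=wait)

As written, a forwarding child shares the parent's worker budget instead of multiplying it, which is the point of the proposal; the recursive-dependency hazard raised below still applies.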

On Thu, May 17, 2012 at 4:43 AM, Adrian Sampson <asampson@cs.washington.edu> wrote:
It's my understanding this is a known flaw with concurrency *in general*. Currently most multi-{threaded,process} applications assume they're the only ones running on the system, and so would the likely implementation of the composable pools you've proposed. A proper interprocess scheduler is required to handle this ideally. (See GCD, and runtime implementations that provide at least some userspace scheduling, such as Go, however poor it may be.)

Secondly, composable pools don't handle recursive relationships well. If a thread in one pool depends on the completion of all the tasks in its own pool before it can itself complete, you'll have deadlock.

Personally, if I implemented a composable thread pool I'd make it global; creation and submission of tasks would be proxied to it via some composable executor class. As it stands, thread pools are best for task-oriented concurrency rather than parallelism anyway, especially in CPython. In short, I think composable thread pools are a hack at best and won't gain you anything except slightly reduced threading overhead. If you want optimal utilization, threading isn't the right place to be looking.
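That deadlock is easy to reproduce even with the standard executor; a minimal illustration (not from the thread) using a single-worker pool:

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=1)

    def outer():
        inner = pool.submit(lambda: 42)
        # Blocks forever: the pool's only worker is busy running outer(),
        # so inner can never be scheduled.
        return inner.result()

    future = pool.submit(outer)
    future.result()  # never returns

An executor that funnels nested submissions into one shared pool makes this failure mode easier to hit, since every level of nesting competes for the same fixed set of workers.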

On May 21, 2012, at 9:17 AM, Matt Joiner wrote:
I agree completely. Maybe the implementation I described was overly hacky for the sake of transparent compatibility with the existing (non-composable) executors in concurrent.futures. Ideally, the system would have one global pool which many concurrency APIs -- not just concurrent.futures -- could potentially share. (In a *really* ideal world, the OS would provide thread pool management -- like GCD, which you mentioned, or scheduler activations. But a cross-platform library currently requires a less ambitious solution.)
To be clear, I meant to refer to processes *or* threads when discussing the problem originally. The ProcessPoolExecutor is pretty useful (in my experience) for easily getting speedup even on pure-Python CPU-bound workloads.

Adrian
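For concreteness, a sketch of the kind of pure-Python CPU-bound workload where ProcessPoolExecutor pays off, since worker processes sidestep the GIL and can use multiple cores; count_primes is an illustrative stand-in, not something from the thread:

    from concurrent.futures import ProcessPoolExecutor

    def count_primes(limit):
        # Deliberately naive, pure-Python CPU work.
        return sum(
            all(n % d for d in range(2, int(n ** 0.5) + 1))
            for n in range(2, limit)
        )

    if __name__ == "__main__":  # required for process pools on some platforms
        with ProcessPoolExecutor() as pool:  # defaults to one worker per CPU
            print(list(pool.map(count_primes, [200000] * 4)))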

FWIW that wasn't the default "use processes" spike. In my experience toying with concurrency in Python, trying to manage the load threads put on the system always ends badly. The two best-supported concurrency mechanisms, threads and processes, are constantly tête-à-tête; neither is adequate when you start to consider extreme concurrency scenarios. I suggest this because if you're considering composing executors, you're already trying to reduce the overhead (wastage) that processes and threads are incurring on your system for these purposes.

It's really up to individual libraries to make it possible for applications to provide the executor explicitly, rather than the library assuming it's OK to just create its own.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
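A sketch of the pattern Nick describes -- the library accepts the caller's executor and only creates a private one as a fallback; fetch_all and fetch_one are hypothetical library functions used purely for illustration:

    from concurrent.futures import ThreadPoolExecutor

    def fetch_one(url):
        ...  # stand-in for the library's real per-item work

    def fetch_all(urls, executor=None):
        """Run fetch_one over urls, using the caller's executor if given."""
        own = executor is None
        if own:
            executor = ThreadPoolExecutor(max_workers=4)  # private fallback pool
        try:
            futures = [executor.submit(fetch_one, url) for url in urls]
            return [f.result() for f in futures]
        finally:
            if own:
                executor.shutdown()

    # The application decides the total worker budget once and shares it:
    #     shared = ThreadPoolExecutor(max_workers=8)
    #     results = fetch_all(urls, executor=shared)

This keeps the worker-count decision in the application, which sidesteps the multiplication problem without any new executor classes.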
