Hi,

A few days ago I posted a message [1] on python-ideas to which no one replied except Brett, who suggested re-posting it here, which I do now. The following is a complete rewrite/update of the original message. I promise: I will stop the spamming if no one replies this time!

Async programming with executors seems to be a useful approach for number-crunching problems where the main algorithm (the one launching the futures) is quite complicated and possibly concurrent, but light enough to run on a single core (the heavy lifting is done “inside” the futures). This use case is addressed by asyncio's run_in_executor(), which uses the executors provided by the concurrent.futures package.

There is, however, a lot of duplication of functionality between concurrent.futures and asyncio. (The main difference is that concurrent.futures relies on threading and locks.) If one is going to use concurrent.futures only from coroutines that run under the control of the asyncio event loop, it should be possible to replace the concurrent.futures executors with async versions of them.

Why could such an async executor be interesting?

• It can be a lot simpler (and thus easier to understand and extend) because it builds on top of asyncio. There have already been discussions on unifying the Future type from concurrent.futures with that of asyncio [2].

• Performance is better, especially when many worker processes are involved, because of the simpler code that uses less locking (the only locking that remains is inside the multiprocessing module).

As far as I can see, the original concurrent.futures.ProcessPoolExecutor has no advantage when used in asyncio-based programs, except when some coroutine blocks for long enough for the call queue to become empty. (But hey, async programming is about non-blocking coroutines!)

Are there other possible advantages or disadvantages?
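The run_in_executor() approach described above boils down to something like the following minimal sketch (the function names, worker count, and workload are arbitrary choices of mine, not taken from aexecutor):

```python
import asyncio
import math
from concurrent.futures import ProcessPoolExecutor


def crunch(n):
    """CPU-bound work; runs inside a worker process."""
    return sum(math.sqrt(i) for i in range(n))


async def run_jobs(n_jobs, n=100000):
    loop = asyncio.get_event_loop()
    with ProcessPoolExecutor() as pool:
        # run_in_executor wraps each concurrent.futures.Future in an
        # asyncio-compatible future, so the event loop stays responsive
        # while the heavy lifting happens in the worker processes.
        jobs = [loop.run_in_executor(pool, crunch, n) for _ in range(n_jobs)]
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(len(asyncio.run(run_jobs(4))))
```

The main algorithm stays a single-threaded coroutine; only the calls to crunch() run in parallel.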
Based on concurrent.futures.ProcessPoolExecutor, I’ve made a proof-of-concept implementation [3] that demonstrates that the idea works. (There have been some improvements compared to the version that I posted on python-ideas.)

Christoph

PS. I would be grateful if any asyncio experts could have a look at the part of the main loop of the process-management coroutine where the coroutine awaits new data [4]. Currently, I am using an asyncio.Event that is set by a callback via asyncio’s add_reader(). Is this the most natural way to do it? I am particularly puzzled by the fact that reader.poll(0) is not always true after the event has been set. If there is no better way to combine asyncio and multiprocessing.Process, perhaps this is something that could be improved in a future version of the Python stdlib?

[1] http://thread.gmane.org/gmane.comp.python.ideas/40709
[2] http://thread.gmane.org/gmane.comp.python.ideas/24551/focus=24557
[3] https://gitlab.kwant-project.org/cwg/aexecutor
[4] https://gitlab.kwant-project.org/cwg/aexecutor/blob/9290504e779c543d9ee93adc...
Hi Christoph, [..]
> • Performance is better, especially when many worker processes are involved, because of the simpler code that uses less locking (the only locking that remains is inside the multiprocessing module).
Yes, it will be better, but the performance improvements will be visible only for 100s or 1000s of processes. And only if they are executing short-lived tasks. Is there a real need for a slightly faster process pool? [..]
> Are there other possible advantages or disadvantages?
1. Backwards compatibility? We’ll have to continue to support concurrent.futures executors.

2. More code in asyncio core -> more work for alternative event loop implementations such as uvloop.

3. Stability: concurrent.futures is mature and well tested.
> Based on concurrent.futures.ProcessPoolExecutor, I’ve made a proof-of-concept implementation [3] that demonstrates that the idea works. (There have been some improvements compared to the version that I posted on python-ideas.)
I think you should stabilize the code base, and release it as a module on PyPI. [..]
> I would be grateful if any asyncio experts could have a look at the part of the main loop of the process management coroutine where the coroutine awaits new data [4]. Currently, I am using an asyncio.Event that is set by a callback via asyncio’s add_reader(). Is this the most natural way to do it currently?
I think you can use asyncio.Future for resuming that coroutine.

Thanks,
Yury
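Yury's suggestion could look roughly like this: instead of a long-lived asyncio.Event, create a fresh Future for each wait and let the add_reader() callback resolve it. This is a hypothetical sketch of mine, not code from aexecutor, and it is Unix-only since add_reader() on a pipe needs the selector event loop:

```python
import asyncio
import os


async def wait_readable(fd):
    """Suspend until fd is readable, using a one-shot asyncio.Future."""
    loop = asyncio.get_event_loop()
    fut = loop.create_future()

    def on_readable():
        # The loop may invoke the callback more than once while the fd
        # stays readable; only resolve the future the first time.
        if not fut.done():
            fut.set_result(None)

    loop.add_reader(fd, on_readable)
    try:
        await fut
    finally:
        loop.remove_reader(fd)


async def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)
    loop = asyncio.get_event_loop()
    # Simulate a worker delivering a result a little later.
    loop.call_later(0.01, os.write, w, b"result")
    await wait_readable(r)
    data = os.read(r, 1024)
    os.close(r)
    os.close(w)
    return data


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because each wait gets its own future, there is no event to clear and no state left over from a previous wakeup.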
Hi Yury, Thanks for your insightful reply.
>> • Performance is better, especially when many worker processes are involved, because of the simpler code that uses less locking (the only locking that remains is inside the multiprocessing module).
> Yes, it will be better, but the performance improvements will be visible only for 100s or 1000s of processes. And only if they are executing short-lived tasks.
> Is there a real need for a slightly faster process pool?
You are right, of course: 1000s of processes is not a realistic application of concurrent.futures.ProcessPoolExecutor on a normal computer. When I started playing with async executors I had computing clusters in mind. There exist several concurrent.futures-like libraries that work with computing clusters and support 1000s of worker processes (e.g. ipyparallel, distribute, SCOOP). Several of these packages use async programming (e.g. tornado or greenlet) for their internals (schedulers, controllers, etc.), but the futures that they provide use locking, just like those of concurrent.futures.

In order to estimate whether an async executor could be useful for cluster-like workloads, I did some tests on my laptop with many worker processes that do mostly nothing. The result is that using asyncio's run_in_executor() makes it possible to process up to 1000 tasks per second. Using aexecutor, this number grows to 5000. These two numbers seem reasonably robust when, for example, the number of workers is varied. That factor-of-5 difference is not enormous, but perhaps there's some potential for improvement (optimization, curio?).

Let's try a real-world check: I often use a cluster of 1000 cores, and each core is about 50% as fast as the core in my laptop. So run_in_executor() will be overloaded if the average task takes less than 2 seconds to run. This doesn't sound like a terrible restriction; one would certainly try to have tasks that run longer than that. But I can certainly imagine useful workloads where tasks run for less than 2 seconds and the parameters and results of each task are only a few numbers at most, so that communication bandwidth shouldn't be a limit.
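A micro-benchmark in the spirit of the test described above can be sketched as follows (my own illustration, not the script actually used; noop and the worker count are arbitrary, and the measured rate will vary by machine):

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor


def noop():
    """A task that does mostly nothing, so overhead dominates."""
    return None


async def throughput(n_tasks, max_workers=4):
    """Measure how many trivial tasks per second run_in_executor sustains."""
    loop = asyncio.get_event_loop()
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        t0 = time.perf_counter()
        await asyncio.gather(*(loop.run_in_executor(pool, noop)
                               for _ in range(n_tasks)))
        elapsed = time.perf_counter() - t0
    return n_tasks / elapsed


if __name__ == "__main__":
    print("tasks/second:", asyncio.run(throughput(200)))
```

Swapping the ProcessPoolExecutor for an async executor while keeping the rest of the script identical gives a directly comparable number.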
>> Based on concurrent.futures.ProcessPoolExecutor, I’ve made a proof-of-concept implementation [3] that demonstrates that the idea works. (There have been some improvements compared to the version that I posted on python-ideas.)
> I think you should stabilize the code base, and release it as a module on PyPI.
I will do that, once I have some confidence that there are no obvious blunders in it (see further below).
>> I would be grateful if any asyncio experts could have a look at the part of the main loop of the process management coroutine where the coroutine awaits new data [4]. Currently, I am using an asyncio.Event that is set by a callback via asyncio’s add_reader(). Is this the most natural way to do it currently?
> I think you can use asyncio.Future for resuming that coroutine.
I'm not sure what you mean. In the current code (like in concurrent.futures), a multiprocessing.SimpleQueue instance is used to receive results from the workers. That queue has an attribute _reader._handle that is just a file descriptor. I use BaseEventLoop.add_reader() to add a callback for that file descriptor. The callback sets an asyncio.Event, and the main loop of _queue_management_worker() waits for this event. How could I use asyncio.Future here?

One problem with my current solution is that the event also gets set when there is no data in the queue yet. That's why I use the reader's poll() method to verify whether there's anything to be read; if there isn't, the event is cleared again and the waiting resumes. The spurious events could be an indication that there is a better way of solving the problem, hence my original question.

Thanks,
Christoph
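The pattern described above (an asyncio.Event set by an add_reader() callback, with poll() guarding against spurious wakeups) can be sketched like this. This is my own self-contained illustration, not the aexecutor code: a thread stands in for the worker process, a plain multiprocessing.Pipe stands in for the SimpleQueue's reader, and it assumes a Unix selector event loop:

```python
import asyncio
import multiprocessing as mp
import threading


async def collect_results(conn, n):
    """Receive n results from conn using the Event-plus-poll() pattern."""
    loop = asyncio.get_event_loop()
    data_ready = asyncio.Event()
    # The callback only sets the event; it does not read any data itself.
    loop.add_reader(conn.fileno(), data_ready.set)
    results = []
    try:
        while len(results) < n:
            await data_ready.wait()
            data_ready.clear()
            # The wakeup may be spurious: check poll() before reading,
            # just as described above.
            while conn.poll() and len(results) < n:
                results.append(conn.recv())
    finally:
        loop.remove_reader(conn.fileno())
    return results


def feed(conn, n):
    # Stand-in for a worker; a thread keeps the sketch portable.
    for i in range(n):
        conn.send(i)


def demo(n=3):
    parent, child = mp.Pipe()
    t = threading.Thread(target=feed, args=(child, n))
    t.start()
    results = asyncio.run(collect_results(parent, n))
    t.join()
    child.close()
    parent.close()
    return results


if __name__ == "__main__":
    print(demo())
```

Note that the event can be set several times for a single message (the selector is level-triggered), which is consistent with poll() sometimes finding nothing to read.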
On Jun 6, 2016, at 5:38 AM, Christoph Groth <christoph@grothesque.org> wrote:
> As far as I can see the original concurrent.futures.ProcessPoolExecutor has no advantage when used in asyncio-based programs except when some coroutine blocks for long enough for the call queue to become empty. (But hey, async programming is about non-blocking coroutines!)
That’s incorrect. The ProcessPoolExecutor is a way for you to run CPU-intensive tasks without affecting the responsiveness of the main event loop in your program. As long as you keep your arguments small (pickling/unpickling big data structures is pretty detrimental to performance here) and the duration of the task is non-trivial, the child processes enable you to use more CPU cores, and since they have their own GILs, this doesn't affect your loop at all, unlike using a thread pool.

I agree that the process pool is useless for a torrent of very fine-grained tasks where locking, futures and IPC are a significant fraction of the work to be done. But that doesn’t mean the process pool executor has no advantage. We’re using the PPE just fine at Facebook in cases where using TPE was being affected by the GIL.

-- 
Lukasz Langa | Facebook
Production Engineer | The Ministry of Silly Walks
(+1) 650-681-7811
Łukasz Langa wrote:
>> As far as I can see the original concurrent.futures.ProcessPoolExecutor has no advantage when used in asyncio-based programs except when some coroutine blocks for long enough for the call queue to become empty. (But hey, async programming is about non-blocking coroutines!)
> That’s incorrect. The ProcessPoolExecutor is a way for you to run CPU-intensive tasks without affecting the responsiveness of the main event loop in your program. As long as you keep your arguments small (pickling/unpickling big data structures is pretty detrimental to performance here) and the duration of the task is non-trivial, the child processes enable you to use more CPU cores, and since they have their own GILs, this doesn't affect your loop at all, unlike using a thread pool.
I was comparing the (possible) advantages of the traditional concurrent.futures.ProcessPoolExecutor with those of the new aexecutor.ProcessPoolExecutor. Both use processes for the workers, so the GIL is not a problem for either.

Christoph
On Jun 6, 2016, at 5:38 AM, Christoph Groth <christoph@grothesque.org> wrote:
> Why could such an async executor be interesting?
Yes, definitely interesting, please release it as a package on PyPI. It’s hard to say whether it’s going to be integrated with a future version of asyncio, but releasing it independently and announcing it is the easiest way to learn.

I looked at your proposed changes on python-ideas and they look good. Having this up on GitHub with unit tests would be a great start of the conversation. We might be able to gather some real-world data from using the builtin PPE and the pure asyncio one.

-- 
Lukasz Langa | Facebook
Production Engineer | The Ministry of Silly Walks
(+1) 650-681-7811
participants (3)
- Christoph Groth
- Yury Selivanov
- Łukasz Langa