Hi everybody,

During the discussion of Python's concurrency capabilities, I found this article worth reading:

http://chriskiehl.com/article/parallelism-in-one-line/

His statement "Mmm.. Smell those Java roots." basically sums up the whole topic for me.

This is the sequential code (almost plain English):

    for image in images:
        create_thumbnail(image)

To get started with parallelism and concurrency, we need to do the following:

    pool = Pool()
    pool.map(create_thumbnail, images)
    pool.close()
    pool.join()

Not bad (considering the other approaches), but why couldn't it look just like the sequential version, maybe like this:

    for image in images:
        fork create_thumbnail(image)

What I like about the Pool concept is that it frees me from thinking about interprocess/interthread communication and process/thread management (I'm not sure how this works with coroutines, but the experts among you will know).

What I would also like to be freed from is pool management. It actually reminds me of languages without garbage collection.

Regards,
Sven
On Jul 29, 2015, at 18:46, Sven R. Kunze <srkunze@mail.de> wrote:
No you don't, because this isn't Java:

    with Pool() as pool:
        pool.map(create_thumbnail, images)

Also note that if create_thumbnail returns a value, to assemble all the values into a list is just as simple:

    with Pool() as pool:
        thumbnails = pool.map(create_thumbnail, images)

And if you want to iterate over the thumbnails as created and don't care about the order, you can just replace the map with imap_unordered. (Or, of course, you can use an executor and just iterate as_completed on the list of futures.)
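A minimal sketch of the imap_unordered variant mentioned above. It uses multiprocessing.pool.ThreadPool (a thread-based pool with the same interface as Pool) so that it runs without a `__main__` guard. The create_thumbnail body is a hypothetical stand-in that only builds a string, and thumbnails_unordered and its workers parameter are illustrative names, not something from the thread.

```python
from multiprocessing.pool import ThreadPool


def create_thumbnail(image):
    # Hypothetical stand-in: a real version would read and resize the file.
    return "thumb_" + image


def thumbnails_unordered(images, workers=4):
    # Leaving the with-block terminates the pool, and imap_unordered is
    # lazy, so the results are consumed inside the block.
    with ThreadPool(workers) as pool:
        return list(pool.imap_unordered(create_thumbnail, images))


if __name__ == "__main__":
    # Results arrive in completion order, which may differ between runs.
    for thumbnail in thumbnails_unordered(["a.png", "b.png", "c.png"]):
        print(thumbnail)
```

Swapping ThreadPool for multiprocessing.Pool should only require that create_thumbnail be importable at module level so it can be pickled.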
To me, this strongly implies that you're actually forking a new child process (or at least a new thread) for every thumbnail. Which is probably a really bad idea if you have, say, 1000 of them.

It definitely doesn't say "find/create some implicit pool somewhere, wrap this in a task, and submit it to the pool". And I'm not sure I'd want it to. What if I want to use a process pool instead of a thread pool, or to use a pool of 12 threads instead of NCPU because I know I'm mostly waiting on a particular HTTP server farm and 12x concurrency is ideal?

Also, how would you extend this to return results? A statement can't have a result. And, even if this were an expression, it would look pretty ugly to do this:

    thumbnails = []
    for image in images:
        thumbnails.append(fork create_thumbnail(image))

Or:

    thumbnails = [fork create_thumbnail(image) for image in images]

And, even if you liked the look of that, what exactly could thumbnails be? Obviously not a list of thumbnails. At best, a list of futures that you'd then still have to loop over with as_completed or similar.

Of course you could design a new language with implicit futures built into the core (or, even better, a two-level variable store with dataflow variables and implicit blocking) to solve this, but it would be very different from Python semantics.
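For comparison, here is a rough sketch of the explicit version of the same idea with concurrent.futures: a pool of 12 threads chosen by the caller, and a collection of futures drained with as_completed. The create_thumbnail body is a hypothetical stand-in, and make_thumbnails and max_workers=12 are illustrative choices taken from the example numbers above, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def create_thumbnail(image):
    # Hypothetical stand-in for real image work.
    return "thumb_" + image


def make_thumbnails(images, max_workers=12):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so results can be matched up.
        futures = {executor.submit(create_thumbnail, image): image
                   for image in images}
        for future in as_completed(futures):
            # result() re-raises any exception raised in the worker.
            results[futures[future]] = future.result()
    return results
```

Replacing ThreadPoolExecutor with ProcessPoolExecutor is the one-line change that a bare keyword would have no obvious place to express.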
> What I like about the Pool concept is that it frees me from thinking about interprocess/interthread communication and process/thread management (I'm not sure how this works with coroutines, but the experts among you will know).
>
> What I would also like to be freed from is pool management. It actually reminds me of languages without garbage collection.
That's a good parallel--but that's exactly what's so nice about "with Pool() as pool:". When you need a pool to be deterministically managed, this is the nicest syntax in any language to do it (except maybe C++ with its RAII, which lets you hide deterministic destruction inside wrapper objects). It's hard to see how it could be any more minimal. After all, if you don't wait on the pool to finish, and you don't collect a bunch of futures to wait on, how do you know when all the thumbnails are created?
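To illustrate the point about deterministic management, here is a small sketch that swaps in concurrent.futures.ThreadPoolExecutor, whose with-block waits for all submitted work on exit (it calls shutdown(wait=True)). The names create_all and created are hypothetical, and the nested create_thumbnail only records a string instead of doing real image work.

```python
from concurrent.futures import ThreadPoolExecutor


def create_all(images):
    created = []

    def create_thumbnail(image):
        # Hypothetical stand-in; list.append is safe to call from
        # several threads in CPython.
        created.append("thumb_" + image)

    with ThreadPoolExecutor() as executor:
        for image in images:
            executor.submit(create_thumbnail, image)
    # Reaching this line means every submitted task has finished.
    return sorted(created)
```

The loop body is close to the proposed `fork` form, and the end of the with-block is what answers "how do you know when all the thumbnails are created?". One caveat: an exception raised inside a task stays in its future and goes unnoticed unless the futures are kept and checked.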
On 30 July 2015 at 03:00, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
asyncio offers a persistent thread-or-process pool as part of the event loop (defaulting to a thread pool). Using the call_in_background() helper from http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html, you can write:

    for image in images:
        call_in_background(create_thumbnail, image)

And if you actually want to do something with the thumbnails:

    futures = [call_in_background(create_thumbnail, image) for image in images]
    for thumbnail in run_in_foreground(asyncio.gather(*futures)):
        # Do something with the thumbnail
        ...

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
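The helpers above come from the linked blog post and are not reproduced here. As a self-contained approximation using only the standard library, the sketch below hands each call to the event loop's default executor with loop.run_in_executor and collects the results with asyncio.gather. It assumes Python 3.7 or later (for asyncio.run and asyncio.get_running_loop); create_thumbnail is a hypothetical stand-in and the other function names are illustrative.

```python
import asyncio


def create_thumbnail(image):
    # Hypothetical stand-in for blocking image work.
    return "thumb_" + image


async def make_thumbnails(images):
    loop = asyncio.get_running_loop()
    # None selects the loop's default executor (a thread pool).
    futures = [loop.run_in_executor(None, create_thumbnail, image)
               for image in images]
    # gather takes the awaitables as separate arguments and returns
    # the results in the same order as the inputs.
    return await asyncio.gather(*futures)


def run_thumbnails(images):
    return asyncio.run(make_thumbnails(images))
```

Note the unpacking in `asyncio.gather(*futures)`: passing the list itself as a single argument fails.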
On 7/29/2015 12:46 PM, Sven R. Kunze wrote:
I found this very helpful.
Write this more succinctly as

    map(create_thumbnail, images)

and define

    def pmap(func, iterable, *args, **kwargs):
        pool = Pool(*args, **kwargs)
        pool.map(func, iterable)
        pool.close()
        pool.join()

then the replacement requires only 1 char.

    pmap(create_thumbnail, images)

This is, of course, limited to making exactly one .map call and closing, but if this is the common case, it might be sensible to request that this be added to multiprocessing (and m.dummy) as a utility function.
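One possible variant of the wrapper above, sketched under two assumptions that are not in the original message: it returns the list produced by pool.map instead of discarding it, and it uses the thread-based ThreadPool (the class that multiprocessing.dummy.Pool hands back) so the example runs without a `__main__` guard.

```python
from multiprocessing.pool import ThreadPool


def pmap(func, iterable, *args, **kwargs):
    # Extra arguments are forwarded to the pool constructor, so
    # pmap(f, xs, 12) asks for 12 worker threads.
    with ThreadPool(*args, **kwargs) as pool:
        # pool.map blocks until every result is ready, so the pool can
        # be torn down as soon as it returns.
        return pool.map(func, iterable)
```

Used as `thumbnails = pmap(create_thumbnail, images)`, this keeps the call-site difference from the built-in map at the single added character.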
A new keyword, which is a pain in itself, cannot take arguments. Blogger Chris Kiehl explains why they are needed.
> What I like about the Pool concept is that it frees me of thinking about the interprocess/-thread communication and processes/threads management

A keyword would not offer the choice of threads versus processes.

> What I would like to be freed of as well is: pool management.

Then use the wrapper function above.

--
Terry Jan Reedy
participants (4): Andrew Barnert, Nick Coghlan, Sven R. Kunze, Terry Reedy