
Hi,

that's a follow up on the discussion started on python-dev ('The importance of the async keyword') and this issue http://bugs.python.org/issue24571 .

After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used.

Currently, I can name 4 modules of which I know that they more or less deal with the topic:
- concurrent
- threading
- asyncio
- multiprocessing

In order to make a sound decision for the question "Which one(s) do I use?", at least the following items should be defined clearly for these modules:

1) relationship between the modules
2) NON-overlapping usage scenarios
3) future development intentions
4) ease of usage of the modules => future syntax
5) examples

Remarks to the items:

1) For the basic understanding: Do they complement each other? Differences in behavior? Do they overlap from the perspective of the programmer? Programmers mostly do not care about internal details; they need to get things done (threads <-> processes) as long as the result is the same.
2) Extremely important to make the decision fast
3) Will asyncio incorporate all concepts of the other modules in a seamless way? Or are they just complementary?
4) Closely related to 3)
5) Maybe in close correlation with 2) and 1)

Cheers,
Chuck

On Fri, Jul 10, 2015 at 12:53 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Hi,
that's a follow up on the discussion started on python-dev ('The importance of the async keyword') and this issue http://bugs.python.org/issue24571 .
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used:
Currently, I can name 4 modules of which I know that they more or less deal with the topic: - concurrent - threading - asyncio - multiprocessing
+1 on the overall idea. Technically there's also asyncore and asynchat but they are deprecated. It might also be worth it to add a section listing the main third-party modules (twisted, tornado, gevent come to mind). -- Giampaolo - http://grodola.blogspot.com

On 7/9/2015 7:51 PM, Giampaolo Rodola' wrote:
On Fri, Jul 10, 2015 at 12:53 AM, Sven R. Kunze <srkunze@mail.de <mailto:srkunze@mail.de>> wrote:
Hi,
that's a follow up on the discussion started on python-dev ('The importance of the async keyword') and this issue http://bugs.python.org/issue24571 .
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used:
Currently, I can name 4 modules of which I know that they more or less deal with the topic: - concurrent - threading - asyncio - multiprocessing
+1 on the overall idea. Technically there's also asyncore and asynchat but they are deprecated.
They should be listed as deprecated, with pointers to what superseded them.
It might also be worth it to add a section listing the main third-party modules (twisted, tornado, gevent come to mind).
-- Terry Jan Reedy

On Thursday, July 9, 2015, Sven R. Kunze <srkunze@mail.de> wrote:
Hi,
that's a follow up on the discussion started on python-dev ('The importance of the async keyword') and this issue http://bugs.python.org/issue24571 .
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used:
Currently, I can name 4 modules of which I know that they more or less deal with the topic: - concurrent - threading - asyncio - multiprocessing
In order to make a sound decision for the question: "Which one(s) do I use?", at least the following items should be somehow defined clearly for these modules:
1) relationship between the modules 2) NON-overlapping usage scenarios 3) future development intentions 4) ease of usage of the modules => future syntax 5) examples
+1, and specifically with regard to the examples: where there is overlap between different modules, show equivalent approaches to performing the same task.
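A minimal sketch of what such an "equivalent approaches" example might look like, using only the standard library; the URLs and the fetch() helper are invented for illustration, and the task (fetching a handful of pages) is deliberately simple:

    # The same I/O-bound job written two ways: manage the threads yourself,
    # or let a pool from concurrent.futures manage them for you.
    import threading
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URLS = ["https://www.python.org", "https://docs.python.org"]

    def fetch(url):
        with urllib.request.urlopen(url) as resp:
            return url, len(resp.read())

    # Approach 1: threading -- create, start and join the threads explicitly.
    def fetch_all_with_threading():
        results = []
        lock = threading.Lock()

        def worker(url):
            item = fetch(url)
            with lock:                  # protect the shared results list
                results.append(item)

        threads = [threading.Thread(target=worker, args=(u,)) for u in URLS]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results

    # Approach 2: concurrent.futures -- the pool handles the bookkeeping.
    def fetch_all_with_futures():
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(fetch, URLS))

    if __name__ == "__main__":
        print(fetch_all_with_threading())
        print(fetch_all_with_futures())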
Remarks to the items:
1) For the basic understanding: Do they complement each other? Differences in behavior? Do they overlap from the perspective of the programmer? They mostly do not care about internal details; they need to get things done (threads <-> processes) as long as the result is the same.
2) Extremely important to make the decision fast
3) Will asyncio incorporate all concepts of the other modules in a seamless way? Or are they just complementary?
4) Closely related to 3)
5) Maybe in close correlation with 2) and 1)
Cheers, Chuck
-- ~ Ian Lee | IanLee1521@gmail.com

On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze <srkunze@mail.de> wrote:
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used
I'm not sure how easy the decisions will be in all cases, but certainly some broad guidelines would be awesome. (The exact analysis of "when should I use threads and when should I use processes" is a big enough one that there've been a few million blog posts on the subject, and I doubt that asyncio will shrink that.) A basic summary would be hugely helpful. "Here's four similar modules, and why they all exist in the standard library." ChrisA

On 10 July 2015 at 12:09, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze <srkunze@mail.de> wrote:
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used
I'm not sure how easy the decisions will be in all cases, but certainly some broad guidelines would be awesome. (The exact analysis of "when should I use threads and when should I use processes" is a big enough one that there've been a few million blog posts on the subject, and I doubt that asyncio will shrink that.) A basic summary would be hugely helpful. "Here's four similar modules, and why they all exist in the standard library."
Q: Why are there four different modules?
A: Because they solve different problems.
Q: What are those problems?
A: How long have you got?

Choosing an appropriate concurrency model for a problem is one of the hardest tasks in software architecture design. The only way to make it appear simple is to focus in on a specific class of problems where there *is* a single clearly superior answer for that problem domain :)

That said, I think there may be a way to make the boundary between synchronous and asynchronous execution easier to conceptualise, so I'll put up a thread about that.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

That'll be great, Nick. I look forward to your proposal. Alongside your proposal, there might be capable people who would like to contribute to the questions I raised in my initial mail on this list. This might also help Nick to hammer out a good proposal. On 10.07.2015 09:18, Nick Coghlan wrote:
On 10 July 2015 at 12:09, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze <srkunze@mail.de> wrote:
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used

I'm not sure how easy the decisions will be in all cases, but certainly some broad guidelines would be awesome. (The exact analysis of "when should I use threads and when should I use processes" is a big enough one that there've been a few million blog posts on the subject, and I doubt that asyncio will shrink that.) A basic summary would be hugely helpful. "Here's four similar modules, and why they all exist in the standard library."

Q: Why are there four different modules?
A: Because they solve different problems.
Q: What are those problems?
A: How long have you got?
Choosing an appropriate concurrency model for a problem is one of the hardest tasks in software architecture design. The only way to make it appear simple is to focus in on a specific class of problems where there *is* a single clearly superior answer for that problem domain :)
That said, I think there may be a way to make the boundary between synchronous and asynchronous execution easier to conceptualise, so I'll put up a thread about that.
Cheers, Nick.

On Jul 10 2015, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On 10 July 2015 at 12:09, Chris Angelico <rosuav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze <srkunze-7y4VAllY4QU@public.gmane.org> wrote:
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used
I'm not sure how easy the decisions will be in all cases, but certainly some broad guidelines would be awesome. (The exact analysis of "when should I use threads and when should I use processes" is a big enough one that there've been a few million blog posts on the subject, and I doubt that asyncio will shrink that.) A basic summary would be hugely helpful. "Here's four similar modules, and why they all exist in the standard library."
Q: Why are there four different modules?
A: Because they solve different problems.
Q: What are those problems?
A: How long have you got?
Choosing an appropriate concurrency model for a problem is one of the hardest tasks in software architecture design. The only way to make it appear simple is to focus in on a specific class of problems where there *is* a single clearly superior answer for that problem domain :)
But even just documenting this subset would already provide a lot of improvement over the status quo. If for each module there were an example of a problem that's clearly best solved with this module rather than any of the others, that's a perfectly good answer to the question of why they all exist.

Best, -Nikolaus

-- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

On Jul 11, 2015, at 08:00, Nikolaus Rath <Nikolaus@rath.org> wrote:
On Jul 10 2015, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On 10 July 2015 at 12:09, Chris Angelico <rosuav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze <srkunze-7y4VAllY4QU@public.gmane.org> wrote:
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used
I'm not sure how easy the decisions will be in all cases, but certainly some broad guidelines would be awesome. (The exact analysis of "when should I use threads and when should I use processes" is a big enough one that there've been a few million blog posts on the subject, and I doubt that asyncio will shrink that.) A basic summary would be hugely helpful. "Here's four similar modules, and why they all exist in the standard library."
Q: Why are there four different modules?
A: Because they solve different problems.
Q: What are those problems?
A: How long have you got?
Choosing an appropriate concurrency model for a problem is one of the hardest tasks in software architecture design. The only way to make it appear simple is to focus in on a specific class of problems where there *is* a single clearly superior answer for that problem domain :)
But even just documenting this subset would already provide a lot of improvement over the status quo.
If for each module there were an example of a problem that's clearly best solved with this module rather than any of the others, that's a perfectly good answer to the question of why they all exist.
Assuming coroutines/asyncio are not the answer for your problem, it's not really a choice between 3 modules; rather, there are 3 separate binary decisions to make, which lead to 6 different possibilities (not 8, because 2 of them are less useful and therefore Python doesn't have them): futures.ProcessPoolExecutor, futures.ThreadPoolExecutor, multiprocessing.Pool, multiprocessing.dummy.Pool (unfortunately, this is where thread pools lie...), multiprocessing.Process, or threading.Thread.

Explaining pools vs. separate threads is pretty easy. If you're doing a whole bunch of similar things (download 1000 files, do this computation on every row of a giant matrix), you want pools; if you're doing distinctly different things (update the backup for this file, send that file to the printer, and download the updated version from the net), you don't.

Explaining plain pools vs. executors is a little trickier, because for the simplest cases there's no obvious difference. Coming up with a case where you need to compose futures isn't that hard; coming up with a case where you need one of the lower-level pool features (like explicitly managing batching) without getting too artificial to be meaningful or too complicated to serve as an example is a bit harder. But still not that big of a problem.

Explaining threads vs. processes is two questions in itself. First, if you're looking at concurrency to speed up your code, and your code is CPU-bound, then your answer to the other question doesn't matter; you need processes. (Unless you're using a C extension that releases the GIL, or using Jython instead of CPython, or ...)

So finally we get to the big problem: shared state. Even ignoring the Python- and CPython-specific issues (forking, what the GIL makes atomic, ...), just explaining the basic ideas of what shared state means, when you need it, why you're wrong, what races are, how to synchronize, why mutability matters... Is that really something that can be fit into a HOWTO? But if you punt on that and just say "until you know what you're doing, everything should be written in the message-passing-tasks style", you might as well skip the whole HOWTO and say "always use concurrent.futures.ProcessPoolExecutor".
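As one very simplified sketch of the "similar things vs. different things" and "CPU-bound vs. I/O-bound" branches described above (the task functions are made up for the example, and the hard parts - shared state, races, pickling constraints - are deliberately left out):

    import threading
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
    from urllib.request import urlopen

    def crunch(row):                     # CPU-bound work on one "row"
        return sum(x * x for x in row)

    def fetch(url):                      # I/O-bound work on one URL
        with urlopen(url) as resp:
            return len(resp.read())

    def backup_file():                   # distinctly different one-off jobs
        print("updating the backup")

    def send_to_printer():
        print("sending the file to the printer")

    if __name__ == "__main__":
        # "A whole bunch of similar things" -> a pool.
        with ProcessPoolExecutor() as pool:                 # CPU-bound batch -> processes
            totals = list(pool.map(crunch, [range(1000)] * 50))
        with ThreadPoolExecutor(max_workers=10) as pool:    # I/O-bound batch -> threads
            sizes = list(pool.map(fetch, ["https://www.python.org"] * 5))

        # "Distinctly different things" -> individual threads.
        jobs = [threading.Thread(target=backup_file),
                threading.Thread(target=send_to_printer)]
        for t in jobs:
            t.start()
        for t in jobs:
            t.join()

        print(totals[:3], sizes)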

Seems like many people agree with the general idea of having a standard explanation and guideline of when to use which concurrency module. Nice!

On the difference between threads and processes; it is an interesting topic (IMHO) but:

*1) both processes and threads are just means to an end*
2) I would like to interchange them when necessary, or when one appears to be better than the other
3) I would like to use the *same API* for both (for the major use cases), and switch from threads to processes if necessary or the other way round (see the sketch after the quote below)

Regarding asyncio:

1) I do not know what its purpose really is COMPARED to all the other modules; that really needs clarification first, before anything else
2) sometimes, I get the feeling people understand it as a third way to do concurrency (along with processes and threads), but then Guido and others tell me it makes no sense to use asyncio for stuff that can be done with threading or multiprocessing

Let us see where these questions lead us. (For the following two weeks, I will not be able to contribute thoughts here, as I am on a tour; I am curious what the folks of python-ideas will post here :) )

Regards, Sven

On 10.07.2015 04:09, Chris Angelico wrote:
On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze <srkunze@mail.de> wrote:
After discussing the whole topic and reading it up further, it became clear to me what's actually missing in Python. That is a definitive guide of why/when a certain concurrency module is supposed to be used

I'm not sure how easy the decisions will be in all cases, but certainly some broad guidelines would be awesome. (The exact analysis of "when should I use threads and when should I use processes" is a big enough one that there've been a few million blog posts on the subject, and I doubt that asyncio will shrink that.) A basic summary would be hugely helpful. "Here's four similar modules, and why they all exist in the standard library."
ChrisA
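Regarding point 3) above (same API, switch between threads and processes when needed): concurrent.futures already comes quite close. A minimal sketch, assuming the work function is picklable and defined at module level (a requirement for the process-based executor):

    # The executor API is identical for threads and processes, so the choice
    # can be a single argument.
    import concurrent.futures as cf

    def work(n):
        return n * n

    def run(executor_cls):
        with executor_cls(max_workers=4) as ex:
            return list(ex.map(work, range(10)))

    if __name__ == "__main__":
        print(run(cf.ThreadPoolExecutor))    # good for I/O-bound work
        print(run(cf.ProcessPoolExecutor))   # good for CPU-bound work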

Sven R. Kunze wrote:
1) I do not know what its purposes really is COMPARED to all the other modules; that really needs clarification first before anything else 2) sometimes, I get the feeling people understand it as a third way to do concurrency (along with processes and threads) but then Guido and others tell me it makes no sense to use asyncio for stuff that can be done with threading or multiprocesses
I'm going to dive into an analogy here. Hopefully it holds up better than most... Let's say you are making a cake. There are two high-level steps involved: 1. Gather all the ingredients 2. Mix all the ingredients 3. Bake it in the oven You are personally required to do steps 1 and 2 ("hands-on"). They takes all of your time and attention and you can't do anything else simultaneously. For step 3, you hand off the work to the oven. While the oven is baking, you are basically free to do other things. In this analogy, "you" are the main thread and the oven is another thread. (Thread and process are interchangeable here in the general sense - the GIL in Python is practicality that makes processes preferable, but that doesn't affect the concepts.) Steps 1 and 2 are CPU bound (as far as "you" the main thread are concerned), and step 3 is IO bound from "your" (the main thread's) point-of-view. Step 3 requires you to wait until it is complete: * You can do a synchronous wait, by sitting and staring at the oven until it's done. * You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet. * You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready (but note: you know that the oven is done without having to walk over and check it). * Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished. (and if you never interrupt yourself, the oven never makes a sound) This last option is most efficient for you, because you aren't interrupted at awkward times (i.e. greatly reduced need for locking on shared state) but you also don't have to walk all the way over to the oven to check whether it is done. You pause, listen, and get straight back to work if the oven is still going. That's the core feature of asyncio - not the networking or subprocess support - the ability to be notified efficiently that a task is complete without being interrupted by that notification. Now let's expand this to making 3 cakes in parallel to see how "parallelism" works. Since there's so much going on, we'll create a TODO list: 1. Make cake #1 2. Make cake #2 3. Make cake #3 (This means we've started three tasks to the current event loop. It's likely these are three external requests from clients, such as HTTP requests. It is possible, though not common in my experience, for production software to explicitly start with multiple tasks like this. More common is to have one task and a UI event loop that injects UI events as necessary.) Task 1 is the obvious place to start, so we take that off the TODO list and start working on it. The steps to make cake #1 are: * Gather ingredients for cake #1 * Mix ingredients for cake #1 * Bake cake #1 Gathering ingredients is a synchronous operation (`def gather_ingredients()`) so we do that until we've gathered everything. Mixing ingredients is a long, interruptible operation (`async def mix_ingredients()`, with occasional explicit `await yield()` or whatever syntax was chosen for this), so we start mixing and then pause. When we pause, we put our current task on the TODO list: 1. Make cake #2 2. Make cake #3 3. Continue mixing cake #1 We see that our next task is to make cake #2, so we repeat the steps above and eventually pause while we're mixing. Now the TODO list looks like: 1. Make cake #3 2. Continue mixing cake #1 3. Continue mixing cake #2 And this continues. 
I'm going to dive into an analogy here. Hopefully it holds up better than most...

Let's say you are making a cake. There are three high-level steps involved:

1. Gather all the ingredients
2. Mix all the ingredients
3. Bake it in the oven

You are personally required to do steps 1 and 2 ("hands-on"). They take all of your time and attention and you can't do anything else simultaneously. For step 3, you hand off the work to the oven. While the oven is baking, you are basically free to do other things.

In this analogy, "you" are the main thread and the oven is another thread. (Thread and process are interchangeable here in the general sense - the GIL in Python is a practicality that makes processes preferable, but that doesn't affect the concepts.) Steps 1 and 2 are CPU bound (as far as "you" the main thread are concerned), and step 3 is IO bound from "your" (the main thread's) point of view.

Step 3 requires you to wait until it is complete:

* You can do a synchronous wait, by sitting and staring at the oven until it's done.
* You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet.
* You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready (but note: you know that the oven is done without having to walk over and check it).
* Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished. (And if you never interrupt yourself, the oven never makes a sound.)

This last option is most efficient for you, because you aren't interrupted at awkward times (i.e. greatly reduced need for locking on shared state) but you also don't have to walk all the way over to the oven to check whether it is done. You pause, listen, and get straight back to work if the oven is still going.

That's the core feature of asyncio - not the networking or subprocess support - the ability to be notified efficiently that a task is complete without being interrupted by that notification.

Now let's expand this to making 3 cakes in parallel to see how "parallelism" works. Since there's so much going on, we'll create a TODO list:

1. Make cake #1
2. Make cake #2
3. Make cake #3

(This means we've started three tasks on the current event loop. It's likely these are three external requests from clients, such as HTTP requests. It is possible, though not common in my experience, for production software to explicitly start with multiple tasks like this. More common is to have one task and a UI event loop that injects UI events as necessary.)

Task 1 is the obvious place to start, so we take that off the TODO list and start working on it. The steps to make cake #1 are:

* Gather ingredients for cake #1
* Mix ingredients for cake #1
* Bake cake #1

Gathering ingredients is a synchronous operation (`def gather_ingredients()`) so we do that until we've gathered everything.

Mixing ingredients is a long, interruptible operation (`async def mix_ingredients()`, with occasional explicit `await yield()` or whatever syntax was chosen for this), so we start mixing and then pause. When we pause, we put our current task on the TODO list:

1. Make cake #2
2. Make cake #3
3. Continue mixing cake #1

We see that our next task is to make cake #2, so we repeat the steps above and eventually pause while we're mixing. Now the TODO list looks like:

1. Make cake #3
2. Continue mixing cake #1
3. Continue mixing cake #2

And this continues.

(Note that selecting which task to continue with is a detail of the event loop you're using. Check the spec to see whether some tasks have a higher priority or what order tasks are continued in. And bear in mind that so far, we've only used explicit yields - "I'm ready to do something else now if something needs doing".)

Eventually we will finish mixing one of the cakes, let's say it's cake #1. We will put it in the oven (`await put_in_oven()`) and then check the TODO list for what we should do next. There's nothing for us to do with cake #1, so our TODO list looks like:

1. Continue mixing cake #2
2. Continue mixing cake #3

Eventually, the oven will finish baking cake #1 and will add its own item to the TODO list:

1. Continue mixing cake #2
2. Continue mixing cake #3
3. Cake #1 is ready

When we take a break from mixing cake #2, we will continue mixing cake #3 (again, depending on your event loop's policy with regards to prioritisation). When we take a break from mixing cake #3, "Cake #1 is ready" will be the top of our TODO list and so we will continue with the statement following where we awaited it (it probably looked like `await put_in_oven(); remove_from_oven()` or maybe `baked_cake = await put_in_oven(mixed_ingredients)`).

Eventually our TODO list will be empty, and so we will sit there waiting for something to appear on it (such as another incoming request, or an oven adding a "remove cake" item).

Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work.

Hopefully that helps clear things up for some people. No example is perfect for everyone, ultimately, so the more we put out there the more likely

Cheers, Steve
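To make the analogy concrete, a minimal sketch of the three-cakes example in plain asyncio (Python 3.5 syntax) might look like the following; the function names mirror the analogy and are invented here, and the oven's "baking" is just a sleep standing in for work the main thread only has to wait for:

    import asyncio

    def gather_ingredients(recipe):
        return ["ingredients for " + recipe]      # synchronous, hands-on work

    async def mix_ingredients(recipe, ingredients):
        for _ in ingredients:
            await asyncio.sleep(0)                # pause and check the TODO list
        return "mixture for " + recipe

    async def bake(mixture):
        await asyncio.sleep(1)                    # the oven works; we are free
        return mixture.replace("mixture", "cake")

    async def make_cake(recipe):
        ingredients = gather_ingredients(recipe)
        mixture = await mix_ingredients(recipe, ingredients)
        return await bake(mixture)                # "cake is ready" lands on the TODO list

    loop = asyncio.get_event_loop()
    cakes = loop.run_until_complete(asyncio.gather(
        make_cake("sponge"), make_cake("madeira"), make_cake("chocolate")))
    print(cakes)
    loop.close()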

Steve Dower <Steve.Dower@microsoft.com> writes:
I'm going to dive into an analogy here. Hopefully it holds up better than most... […]
* You can do a synchronous wait, by sitting and staring at the oven until it's done.
* You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet.
* You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready (but note: you know that the oven is done without having to walk over and check it).
* Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished. (and if you never interrupt yourself, the oven never makes a sound)
[…]
Hopefully that helps clear things up for some people. No example is perfect for everyone, ultimately, so the more we put out there the more likely
Thank you, Steve! That is the clearest explanation of different concurrency models I've ever read. I now feel I have a firm understanding of how they're different and their relative merits. I hope that analogy can be worked into a putative “Concurrency HOWTO” at <URL:https://docs.python.org/3/howto/>. -- \ “If consumers even know there's a DRM, what it is, and how it | `\ works, we've already failed.” —Peter Lee, Disney corporation, | _o__) 2005 | Ben Finney

On 11 July 2015 at 09:42, Steve Dower <Steve.Dower@microsoft.com> wrote:
Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work.
Hopefully that helps clear things up for some people. No example is perfect for everyone, ultimately, so the more we put out there the more likely
I really like this example, so I had a go at expressing it in the foreground/background terms I use in http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html

For folks that already know asyncio, the relevant semantic details of that proposal for this example are:

    run_in_foreground -> convenience wrapper for run_until_complete
    run_in_background(coroutine) -> convenience wrapper for ensure_future
    run_in_background(callable) -> convenience wrapper for run_in_executor

I quite like the end result:

    # We'll need the concept of an oven
    class Oven:

        # There's a shared pool of ovens we can use
        @classmethod
        async def get_oven(cls):
            ...

        # An oven can only have one set of current settings
        def configure(self, recipe):
            ...

        # An oven can only cook one thing at a time
        def bake(self, mixture):
            ...

    # We stay focused on this task
    def gather_ingredients(recipe):
        ...
        return ingredients

    # Helper to indicate readiness to switch tasks
    def survey_the_kitchen():
        return asyncio.sleep(0)

    # This task may be interleaved with other activities
    async def mix_ingredients(recipe, ingredients):
        mixture = CakeMixture(recipe)
        for ingredient in ingredients:
            mixture.add(ingredient)
            await survey_the_kitchen()
        return mixture

    # This task may be interleaved with other activities
    async def make_cake(recipe):
        # First, we gather and start mixing the ingredients
        ingredients = gather_ingredients(recipe)
        mixture = await mix_ingredients(recipe, ingredients)

        # We wait for a free oven, then configure it for our recipe
        oven = await Oven.get_oven()
        oven.configure(recipe)

        # Baking is synchronous for the *oven*, but *we* don't
        # want to sit around waiting for it the entire time
        bake_cake = functools.partial(oven.bake, mixture)
        return await run_in_background(bake_cake)

    # We have three cakes to make
    make_sponge = make_cake("sponge")
    make_madeira = make_cake("madeira")
    make_chocolate = make_cake("chocolate")

    # Which we'll try to do concurrently
    run_in_foreground(asyncio.wait([make_sponge, make_madeira, make_chocolate]))
    sponge_cake = make_sponge.result()
    madeira_cake = make_madeira.result()
    chocolate_cake = make_chocolate.result()

Now, to upgrade this to full event driven programming: imagine you're modeling a professional bakery, accepting cake orders from customers. Then you would need to define a server process that turns orders from customers into cake making requests, and completed cake notifications into delivery orders, and your main thread becomes devoted to running that server, rather than specifying a pre-selected set of cakes to make.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
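As a rough illustration of that last step, and not part of Nick's proposal, a bakery "order server" in plain asyncio could be sketched as follows; the one-order-per-line protocol and the bake_cake() stand-in are invented for the example:

    import asyncio

    async def bake_cake(recipe):
        await asyncio.sleep(1)          # stand-in for the actual cake-making task
        return "one {} cake".format(recipe)

    async def handle_order(reader, writer):
        # Each client sends one recipe name per connection, e.g. b"sponge\n"
        recipe = (await reader.readline()).decode().strip()
        cake = await bake_cake(recipe)
        writer.write((cake + "\n").encode())
        await writer.drain()
        writer.close()

    loop = asyncio.get_event_loop()
    server = loop.run_until_complete(
        asyncio.start_server(handle_order, "127.0.0.1", 8888))
    try:
        loop.run_forever()              # the main thread is devoted to the server
    finally:
        server.close()
        loop.run_until_complete(server.wait_closed())
        loop.close()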

Two minor corrections to my own post:
* You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready
Should be "... when *the oven* is ready, regardless of whether you are ready to handle the interruption"
Hopefully that helps clear things up for some people. No example is perfect for everyone, ultimately, so the more we put out there the more likely
... we'll help everyone get a clear understanding of when and how to use these tools. Cheers, Steve
participants (10)
- Andrew Barnert
- Ben Finney
- Chris Angelico
- Giampaolo Rodola'
- Ian Lee
- Nick Coghlan
- Nikolaus Rath
- Steve Dower
- Sven R. Kunze
- Terry Reedy