[Python-ideas] Concurrency Modules
Steve.Dower at microsoft.com
Sat Jul 11 01:42:17 CEST 2015
Sven R. Kunze wrote:
> 1) I do not know what its purposes really is COMPARED to all the other modules;
> that really needs clarification first before anything else
> 2) sometimes, I get the feeling people understand it as a third way to do
> concurrency (along with processes and threads) but then Guido and others tell me
> it makes no sense to use asyncio for stuff that can be done with threading or
I'm going to dive into an analogy here. Hopefully it holds up better than most...
Let's say you are making a cake. There are two high-level steps involved:
1. Gather all the ingredients
2. Mix all the ingredients
3. Bake it in the oven
You are personally required to do steps 1 and 2 ("hands-on"). They takes all of your time and attention and you can't do anything else simultaneously.
For step 3, you hand off the work to the oven. While the oven is baking, you are basically free to do other things.
In this analogy, "you" are the main thread and the oven is another thread. (Thread and process are interchangeable here in the general sense - the GIL in Python is practicality that makes processes preferable, but that doesn't affect the concepts.) Steps 1 and 2 are CPU bound (as far as "you" the main thread are concerned), and step 3 is IO bound from "your" (the main thread's) point-of-view.
Step 3 requires you to wait until it is complete:
* You can do a synchronous wait, by sitting and staring at the oven until it's done.
* You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet.
* You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready
(but note: you know that the oven is done without having to walk over and check it).
* Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished.
(and if you never interrupt yourself, the oven never makes a sound)
This last option is most efficient for you, because you aren't interrupted at awkward times (i.e. greatly reduced need for locking on shared state) but you also don't have to walk all the way over to the oven to check whether it is done. You pause, listen, and get straight back to work if the oven is still going. That's the core feature of asyncio - not the networking or subprocess support - the ability to be notified efficiently that a task is complete without being interrupted by that notification.
Now let's expand this to making 3 cakes in parallel to see how "parallelism" works. Since there's so much going on, we'll create a TODO list:
1. Make cake #1
2. Make cake #2
3. Make cake #3
(This means we've started three tasks to the current event loop. It's likely these are three external requests from clients, such as HTTP requests. It is possible, though not common in my experience, for production software to explicitly start with multiple tasks like this. More common is to have one task and a UI event loop that injects UI events as necessary.)
Task 1 is the obvious place to start, so we take that off the TODO list and start working on it. The steps to make cake #1 are:
* Gather ingredients for cake #1
* Mix ingredients for cake #1
* Bake cake #1
Gathering ingredients is a synchronous operation (`def gather_ingredients()`) so we do that until we've gathered everything.
Mixing ingredients is a long, interruptible operation (`async def mix_ingredients()`, with occasional explicit `await yield()` or whatever syntax was chosen for this), so we start mixing and then pause. When we pause, we put our current task on the TODO list:
1. Make cake #2
2. Make cake #3
3. Continue mixing cake #1
We see that our next task is to make cake #2, so we repeat the steps above and eventually pause while we're mixing. Now the TODO list looks like:
1. Make cake #3
2. Continue mixing cake #1
3. Continue mixing cake #2
And this continues. (Note that selecting which task to continue with is a detail of the event loop you're using. Check the spec to see whether some tasks have a higher priority or what order tasks are continued in. And bear in mind that so far, we've only used explicit yields - "I'm ready to do something else now if something needs doing".)
Eventually we will finish mixing one of the cakes, let's say it's cake #1. We will put it in the oven (`await put_in_oven()`) and then check the TODO list for what we should do next. There's nothing for us to do with cake #1, so our TODO list looks like:
1. Continue mixing cake #2
2. Continue mixing cake #3
Eventually, the oven will finish baking cake #1 and will add its own item to the TODO list:
1. Continue mixing cake #2
2. Continue mixing cake #3
3. Cake #1 is ready
When we take a break from mixing cake #2, we will continue mixing cake #3 (again, depending on your event loop's policy with regards to prioritisation). When we take a break from mixing cake #3, "Cake #1 is ready" will be the top of our TODO list and so we will continue with the statement following where we awaited it (it probably looked like `await put_in_oven(); remove_from_oven()` or maybe `baked_cake = await put_in_oven(mixed_ingredients)`).
Eventually our TODO list will be empty, and so we will sit there waiting for something to appear on it (such as another incoming request, or an oven adding a "remove cake" item).
Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work.
Hopefully that helps clear things up for some people. No example is perfect for everyone, ultimately, so the more we put out there the more likely
More information about the Python-ideas