[Python-ideas] Concurrency Modules

Andrew Barnert abarnert at yahoo.com
Sun Jul 26 12:29:01 CEST 2015


On Jul 26, 2015, at 12:07, Sven R. Kunze <srkunze at mail.de> wrote:
> 
> Thanks, Nikolaus. Mostly I was referring to things Steve brought up in his analogies (two recent posts). So, I might have interpreted them the wrong way.
> 
>> On 26.07.2015 02:58, Nikolaus Rath wrote:
>>> On Jul 25 2015, "Sven R. Kunze" <srkunze-7y4VAllY4QU at public.gmane.org> wrote:
>>>                | processes               | threads                    | asyncio
>>> startup impact | biggest                 | medium                     | smallest
>>> cpu impact     | biggest                 | medium                     | smallest
>>> memory impact  | biggest                 | medium                     | smallest
>>> purpose        | cpu-bound tasks         | i/o-bound tasks            | ???
>> I don't think any of these is correct. Unfortunately, I also don't think
>> there even is a correct version, the differences are simply not so
>> clear-cut.
> I think that has already been discussed. We are just trying to boil it down to help people decide which module might be best for them.

One huge thing you're missing is cooperative vs. preemptive switching. In asyncio, you know that no other task is going to run until you reach the next explicit yield point; with threads, it can happen after any bytecode; with processes, it can happen anywhere at all. This means that if you're using shared state, your locking strategy can be simpler, more efficient, and easier to prove correct with asyncio. And likewise, if you need to sequence things, it can be easier with asyncio (although often the simplest way to do that in any mechanism is to make each of those things into a task and just chain futures together).
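Here's a quick sketch of what I mean (using the new PEP 492 async/await syntax; on 3.4 you'd spell the coroutines with @asyncio.coroutine and yield from):

    import asyncio

    counter = 0

    async def bump(n):
        global counter
        for _ in range(n):
            # No await between the read and the write, so no other task
            # can run in between: this read-modify-write needs no lock.
            counter += 1
            await asyncio.sleep(0)  # explicit yield point; others may run here

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(*[bump(1000) for _ in range(10)]))
    print(counter)  # always 10000; with threads you'd need a Lock to be sure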

>> On Unix, Process startup-cost can be high if you do fork() + exec(), but
>> if you just fork, it's as cheap as a thread.
> Didn't know that. Thanks for clarifying. How do multiprocessing.Pool and multiprocessing.Process work in this regard?

It's your choice: just fork, spawn (fork+exec), or fork copies off a special "server" process--multiprocessing calls these the fork, spawn, and forkserver start methods. (Except on Windows, where spawn is the only possibility.)

How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
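In code, the choice looks something like this (a sketch; get_context is new in 3.4, and 'forkserver' here is just one possible choice):

    import multiprocessing as mp

    def work(x):
        return x * x

    if __name__ == '__main__':
        # Pick the start method explicitly instead of relying on the
        # platform default ('fork' on Unix, 'spawn' on Windows).
        ctx = mp.get_context('forkserver')  # or 'fork' or 'spawn'
        with ctx.Pool(processes=4) as pool:
            print(pool.map(work, range(10)))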

>> With asyncio, it's not
>> clear to me what exactly you'd define as the "startup impact" (the
>> creation of a future maybe? Or setting up the event loop?).
> The purpose of the survey is to give developers an easy way to decide which approach might be suitable for them.
> So, the definition of 'startup time' should be roughly equivalent across the approaches: what's necessary to get a process up and running a piece of code, compared to what's necessary to get asyncio up and running the same piece of code.
> 
> Steve: "Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time)"
>> "CPU impact" as a category doesn't make any sense to me. If you execute
>> the same code it's going to take the same amount of (cumulative) CPU
>> time, no matter if this code runs in a separate thread, separate
>> process, or asynchronously.
> From what I understand, switching contexts costs CPU time, whereas the event loop does not cost nearly as much.

Yes. There's always a context switch going on, but a cooperative context switch can swap a lot less, and can do it without having to cross the user-kernel boundary.
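A rough micro-benchmark shows the gap (just a sketch--numbers will vary wildly by OS and machine, and Events aren't the cheapest possible thread handoff):

    import asyncio, threading, time

    N = 100000

    def thread_switches():
        # Two threads hand control back and forth via Events; each
        # handoff goes through the kernel.
        ping, pong = threading.Event(), threading.Event()
        def partner():
            for _ in range(N):
                ping.wait(); ping.clear()
                pong.set()
        t = threading.Thread(target=partner)
        t.start()
        start = time.perf_counter()
        for _ in range(N):
            ping.set()
            pong.wait(); pong.clear()
        t.join()
        return time.perf_counter() - start

    async def coro_switches():
        # Each sleep(0) yields to the event loop and back, entirely
        # in user space.
        start = time.perf_counter()
        for _ in range(N):
            await asyncio.sleep(0)
        return time.perf_counter() - start

    print('threads: %.3fs' % thread_switches())
    loop = asyncio.get_event_loop()
    print('asyncio: %.3fs' % loop.run_until_complete(coro_switches()))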

>> "memory impact" is probably highest for separate processes, but I don't
>> see an obvious difference when using threads vs asyncio. Where did you
>> get this from?
> I can imagine that when the OS needs to manage threads, it creates more overhead per thread than the Python interpreter needs to suspend a coroutine. But I could be wrong. Do you have any material on this?

The overhead for the contexts themselves is tiny--but one of the things each thread context points at is the stack, and that may be 1MB or even more. So, a program with 500 threads may be using half a GB just for stacks. That may not be as bad as it sounds, because if you never use most of the stack, most of it may never actually get paged into physical memory. (But on 32-bit OS's, you're still using up a quarter of your address space.)
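If you really do need hundreds of threads, you can shrink that reservation (a sketch; the minimum and granularity are platform-dependent, and a too-small stack will kill any deep recursion):

    import threading

    # Default stack reservation is often 1 MB or even 8 MB per thread;
    # shrink it before creating the threads.
    threading.stack_size(256 * 1024)  # 256 KB for each new thread

    def worker():
        pass  # fine as long as the call stack stays shallow

    threads = [threading.Thread(target=worker) for _ in range(500)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()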

>> As far as purpose is concerned, pretty much the only limitation is that
>> asyncio is not suitable for cpu-bound tasks. Any other combination is
>> possible and also most appropriate in specific circumstances.
> What exactly do you mean by any other combination?
> 
> I take from this that asyncio is suitable for heavily i/o-bound work, threads for mixed cpu/io-bound work, and processes mainly for cpu-bound work.

Asyncio is best for massively concurrent i/o-bound code that does pretty much the same thing for each connection, like a web server that has to handle thousands of users. Threads are also used for i/o-bound code; it's more a matter of how you want to write the code than of what it does.
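The skeleton of such a server fits in a few lines (a sketch; the host, port, and echo protocol are placeholders):

    import asyncio

    async def handle(reader, writer):
        # One cheap task per client; thousands of concurrent clients
        # are fine because each task is just a suspended coroutine.
        data = await reader.readline()
        writer.write(b'echo: ' + data)
        await writer.drain()
        writer.close()

    loop = asyncio.get_event_loop()
    server = loop.run_until_complete(
        asyncio.start_server(handle, '127.0.0.1', 8888))
    try:
        loop.run_forever()
    finally:
        server.close()
        loop.close()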

Processes, on the other hand, are the only way (other than a C extension that releases the GIL--or, of course, using a different Python interpreter) to get CPU parallelism. So, that part is right. But there are sometimes other advantages to using processes--it guarantees no accidental shared state; it gives you a way to "recycle" your workers if you might call some C library that can crash or leak memory or corrupt things; it gives you another VM space (which can be a big deal on 32-bit platforms). Also, you can write multiprocessing code as if you were writing distributed code, which makes it easier to turn into real distributed code if you later need to do that.
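The worker-recycling part, for example, is built into multiprocessing.Pool (a sketch; risky() and the numbers are made up):

    import multiprocessing as mp

    def risky(x):
        # Imagine this calls into a C library that slowly leaks memory
        # or can corrupt its own state.
        return x * x

    if __name__ == '__main__':
        # maxtasksperchild=100 retires each worker after 100 tasks and
        # starts a fresh one, so leaks can't accumulate--and a crash is
        # confined to that one worker process.
        with mp.Pool(processes=4, maxtasksperchild=100) as pool:
            print(pool.map(risky, range(1000)))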

