[Python-ideas] Concurrency Modules

Sven R. Kunze srkunze at mail.de
Sun Jul 26 23:54:14 CEST 2015


Big thanks to you, Andrew, Nick and Nikolaus for the latest comments and 
ideas.

I think the table is in very good shape now, and the questions I 
started this thread with have been answered (at least) to my satisfaction. 
The relationships are clear (they are all different modules serving the same 
overall purpose), they have different fields of application (CPU-bound vs. 
IO-bound work), and they have slightly different properties.


How do we proceed from here?


By the way, the number of different approaches (currently three, but I 
assume this will go up in the future) is quite unfortunate. What's even more 
unfortunate is the lack of interchangeability, owing to API differences and 
the absence of a common syntax for executing functions concurrently.

Something that struck me as odd was that asyncio got syntactic sugar 
even though the module itself is actually quite young compared to the 
long-standing support for processes and threads. These two alternatives 
have not received a single bit of syntax support until now.
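To make the API differences concrete, here is a minimal sketch (names like 
`work` are made up for the example) of the same trivial task written against 
three of the stdlib's concurrency interfaces. Note how each one demands a 
different calling convention, and asyncio additionally demands rewriting the 
function itself:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * 2

# 1. threading: no return channel; you collect results yourself.
results = []
t = threading.Thread(target=lambda: results.append(work(21)))
t.start()
t.join()

# 2. concurrent.futures: submit() returns a Future holding the result.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(work, 21)

# 3. asyncio: the function must become a coroutine driven by an event
#    loop (asyncio.run here; 2015-era code used loop.run_until_complete).
async def awork(x):
    return x * 2

value = asyncio.run(awork(21))

assert results == [42] and future.result() == 42 and value == 42
```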


On 26.07.2015 17:00, Andrew Barnert wrote:
> On Jul 26, 2015, at 13:44, Sven R. Kunze <srkunze at mail.de> wrote:
>> Wow. Thanks, Andrew, for this very informative response. I am going to integrate your thoughts into the table later and re-post it.
>>
>> Just one question:
>>
>>> On 26.07.2015 12:29, Andrew Barnert wrote:
>>> It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.)
>>>
>>> How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
>> If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool for instance, I do not see a way to specify my choice.
> That's because you're reading the documentation for Python 2.7. In 2.7, you always get fork on Unix and spawn on Windows; the choice of start methods was added in 3.3 or 3.4.
>> There, I pass a function and this function is executed in another process/thread. Is that just forking?
> If you pass a function to a Process in 2.7, on Unix, that's just forking; the parent process returns while the child process calls your function and exits. If you pass it to a Pool, all the pool processes are forked, but they keep running and pick new tasks off a queue.
>
> On Windows, on the other hand, a new Process calls CreateProcess (the equivalent of fork then exec, or posix_spawn, on Unix) to launch an entirely new Python interpreter, which then imports your module and calls your function. With a Pool, all the new processes get started the same way, then keep running and pick new tasks off a queue.
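For reference, the 3.4+ API Andrew alludes to lets you pick a start method 
explicitly via multiprocessing.get_context. A minimal sketch (the "fork" 
method is used here because this snippet targets Unix; "spawn" is the only 
option on Windows, and "forkserver" is the third alternative):

```python
import multiprocessing

def double(x):
    return x * 2

# A context behaves like the multiprocessing module, but bound to one
# start method -- so a library can choose without touching the global
# default set by multiprocessing.set_start_method().
ctx = multiprocessing.get_context("fork")  # Unix-only; use "spawn" on Windows

q = ctx.Queue()
# With fork, the child inherits the parent's memory, so even a lambda
# works as the target (spawn would have to pickle and re-import it).
p = ctx.Process(target=lambda: q.put(double(21)))
p.start()
answer = q.get()
p.join()

assert answer == 42
```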


