
On Jul 26, 2015, at 13:44, Sven R. Kunze <srkunze@mail.de> wrote:
Wow. Thanks, Andrew, for this very informative response. I am going to integrate your thoughts into the table and re-post it later.
Just one question:
On 26.07.2015 12:29, Andrew Barnert wrote: It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.)
How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessin... for instance, I do not see a way to specify my choice.
That's because you're reading the documentation for Python 2.7. In 2.7, you always get fork on Unix and spawn on Windows; the choice of start methods was added in 3.4.
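For reference, here is a minimal sketch of how the choice can be expressed on Python 3.4 or later. The helper pick_start_method, the square function, and the preference for "forkserver" are made up for illustration; only get_all_start_methods, get_context, and Pool come from the multiprocessing module itself.

```python
import multiprocessing as mp


def pick_start_method(preferred):
    """Return `preferred` if this platform supports it, else the platform default.

    Hypothetical helper, not part of multiprocessing.
    """
    available = mp.get_all_start_methods()  # the first entry is the default
    return preferred if preferred in available else available[0]


def square(x):
    return x * x


def main():
    # "forkserver" is only an example preference; on Windows this falls
    # back to "spawn", the only method available there.
    method = pick_start_method("forkserver")
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        print(method, pool.map(square, [1, 2, 3]))


if __name__ == "__main__":
    main()
```

Using a context object keeps the choice local to this code. mp.set_start_method() sets it for the whole program instead and should be called at most once, inside the `if __name__ == "__main__"` block.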
There, I pass a function and this function is executed in another process/thread. Is that just forking?
If you pass a function to a Process in 2.7, on Unix, that's just forking; the parent process returns while the child process calls your function and exits. If you pass it to a Pool, all the pool processes are forked, but they keep running and pick new tasks off a queue. On Windows, on the other hand, a new Process calls CreateProcess (the equivalent of fork then exec, or posix_spawn, on Unix) to launch an entirely new Python interpreter, which then imports your module and calls your function. With a Pool, all the new processes get started the same way, then keep running and pick new tasks off a queue.
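A rough sketch of the difference between the two, written for Python 3. The names report and show, the tags, and the pool size of 2 are arbitrary choices for illustration. The Process runs one function and exits, while the Pool workers stay alive, so the six tasks should come back with at most two distinct process IDs.

```python
import multiprocessing as mp
import os


def report(tag):
    """Return the tag together with the PID of the process that ran it."""
    return (tag, os.getpid())


def show(tag):
    print(report(tag))


def main():
    # One-off child: it calls show() once and then exits.
    p = mp.Process(target=show, args=("one-off",))
    p.start()
    p.join()

    # Long-lived workers: each one keeps pulling tasks off the pool's queue.
    pool = mp.Pool(processes=2)
    try:
        results = pool.map(report, range(6))
    finally:
        pool.close()
        pool.join()
    pids = set(pid for _, pid in results)
    print(len(results), "tasks ran in", len(pids), "worker process(es)")


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__"` guard matters on Windows (and with spawn in general), because each new interpreter imports the module again and must not start processes of its own during that import.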