On Fri, 10 Feb 2012 08:52:16 -0600 Massimo Di Pierro <massimo.dipierro@gmail.com> wrote:
> Forking is a solution only for simple toy cases and trivially parallel cases.
But threading is only a solution for simple toy cases and trivial levels of scaling.
> People use processes to parallelize web servers and task queues where the tasks do not need to talk to each other (except to the parent/master process).
Only if they haven't thought much about using processes to build parallel systems. Processes work quite well for data that can be handed off to the next process, for cases where communication is a small enough part of the problem that serializing the data is reasonable, and for cases where the data that needs high-speed communication can be treated as a relocatable chunk of memory. And any combination of those three, of course.

The real problem with using processes in Python is that there's no way to share complex Python objects between processes - you're restricted to ctypes values or arrays of those. For many applications, that's fine. If you need to share a large searchable structure, you're reduced to FORTRAN techniques.
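To make that concrete, here's a minimal sketch of what you *can* share: ctypes values and arrays in shared memory, via the multiprocessing module (the scale() worker is made up for illustration):

    from multiprocessing import Process, Value, Array

    def scale(counter, data):
        # Both arguments live in shared memory; mutations are visible
        # to the parent with no serialization.
        with counter.get_lock():
            counter.value += 1
        for i in range(len(data)):
            data[i] *= 2.0

    if __name__ == "__main__":
        counter = Value("i", 0)             # a shared C int
        data = Array("d", [1.0, 2.0, 3.0])  # a shared C double array
        p = Process(target=scale, args=(counter, data))
        p.start()
        p.join()
        print(counter.value, list(data))    # 1 [2.0, 4.0, 6.0]

A dict or a tree has no shared-memory representation like that, which is exactly the "large searchable structure" problem.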
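For contrast, the trivially parallel case mentioned above - tasks that only ever talk to the parent - is the one forking handles cleanly. A sketch with multiprocessing.Pool, where cpu_bound() is a hypothetical stand-in for real work:

    from multiprocessing import Pool

    def cpu_bound(n):
        # Each task runs in its own process; workers never talk
        # to each other, only to the parent.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            # The parent farms out tasks and collects results; that
            # hand-off is the only communication.
            results = pool.map(cpu_bound, [10**5, 10**6, 10**7])
        print(results)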
> If you have 100 cores, then even with a small 50MB program, parallelizing it means going from 50MB to 5GB. Memory and memory access become a major bottleneck.
That should be fixed in the OS, not by making your problem 2**100 times as hard to analyze. (And the OS already does part of it: fork on a modern Unix is copy-on-write, so pages that are never written are shared between parent and child.)

     <mike
--
Mike Meyer <mwm@mired.org>		http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org