multiprocessing vs. distributed processing

Jesse Noller jnoller at gmail.com
Fri Jan 16 07:15:25 EST 2009


On Fri, Jan 16, 2009 at 12:52 AM, James Mills
<prologic at shortcircuit.net.au> wrote:
> I've noticed over the past few weeks lots of questions
> asked about multi-processing (including myself).
>
> For those of you new to multi-processing, perhaps this
> thread may help you. Some things I want to start off
> with to point out are:
>
> "multiprocessing will not always help you get things done faster."
>
> "be aware of I/O bound applications vs. CPU bound"
>
> "multiple CPUs (cores) can compute multiple concurrent expressions -
> not read 2 files concurrently"
>
> "in some cases, you may be after distributed processing rather than
> multi or parallel processing"
>
> cheers
> James

James is quite correct, and maybe I need to amend the multiprocessing
documentation to reflect this fact.

While distributed programming and parallel programming may cross paths
in a lot of problems/applications, you have to know when to use one
versus the other. Multiprocessing only provides some basic primitives
to help you get started with distributed programming; that is not its
primary focus, nor is it a complete solution for distributed
applications.

That being said, there is no reason why you could not use it in
conjunction with something like Kamaelia, Pyro, or whatever IPC
mechanism you prefer.

Ultimately, it's a tool in your toolbox, and you have to judge and
experiment to see which tool is best applied to your problem. In my
own work/code, I use both processes *and* threads - one works better
than the other depending on the problem.
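
As a rough sketch of that trade-off, the snippet below pushes the same
CPU-bound function through a process pool and a thread pool and times
both. The names count_primes and timed_map, the input sizes and the
worker count are all made up for illustration. On CPython the threaded
run is usually no faster for pure-Python CPU work because of the GIL,
but measure on your own workload rather than taking that on faith.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def count_primes(limit):
    """CPU-bound toy workload (hypothetical): count primes below limit."""
    count = 0
    for n in range(2, limit):
        # Trial division up to sqrt(n); deliberately naive to burn CPU.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(pool_cls, func, items, workers=4):
    """Map func over items with the given pool class; return (results, seconds)."""
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        results = pool.map(func, items)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [20000] * 8  # arbitrary sizes, tune for your machine
    for name, pool_cls in (("processes", Pool), ("threads", ThreadPool)):
        results, seconds = timed_map(pool_cls, count_primes, limits)
        print(f"{name}: {seconds:.2f}s, primes per task = {results[0]}")
```

Pool and ThreadPool share the same interface, so switching between
them while you experiment is a one-word change.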

Take a web testing tool, for example. This is something that needs to
generate hundreds of thousands of HTTP requests - not a problem you
want to use multiprocessing for, given that (a) it's primarily I/O
bound and (b) you can generate that many threads on a single machine.
However, if I wanted to, say, generate hundreds of threads across
multiple machines, I would (and do) use multiprocessing + paramiko to
construct a grid of machines and coordinate work.
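
A minimal sketch of the single-machine, I/O-bound half of that example:
the worker below only waits, standing in for an HTTP request, so the
threads overlap almost all of their time. fake_request, run_requests,
the 0.1 second delay and the thread count are placeholders, not part of
any real tool; swap in a real HTTP client call to experiment. The
multi-machine grid is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(request_id, delay=0.1):
    """Stand-in for an HTTP request (hypothetical): just waits on 'the network'."""
    time.sleep(delay)  # sleeping releases the GIL, much like blocking socket I/O
    return request_id


def run_requests(count, threads):
    """Issue count fake requests from a thread pool; return (ids, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as executor:
        # executor.map yields results in submission order.
        ids = list(executor.map(fake_request, range(count)))
    return ids, time.perf_counter() - start


if __name__ == "__main__":
    ids, seconds = run_requests(count=50, threads=50)
    # Run one after another, 50 requests at 0.1s each would take about 5 seconds.
    print(f"{len(ids)} requests in {seconds:.2f}s")
```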

That all being said: multiprocessing isn't set in stone - there's room
for improvement in the docs, tests and code, and all patches are
welcome.

-jesse


