[Tutor] multiprocessing question

Cameron Simpson cs at zip.com.au
Fri Nov 28 10:24:17 CET 2014


On 27Nov2014 17:55, Dave Angel <davea at davea.name> wrote:
>On 11/27/2014 04:01 PM, Albert-Jan Roskam wrote:
>>I made a comparison between multiprocessing and threading.  In the code below (it's also here: http://pastebin.com/BmbgHtVL), multiprocessing is more than 100 (yes: one hundred) times slower than threading! That is I-must-be-doing-something-wrong-ishly slow. Any idea whether I am doing something wrong? I can't believe the difference is so big.
>
>The bulk of the time is spent marshalling the data to the dictionary 
>self.lookup.  You can speed it up some by using a list there (it also 
>makes the code much simpler).  But the real trick is to communicate 
>less often between the processes.
[...]

Exactly so. You're being bitten by latency and of course the sheer cost of 
copying stuff around. With a thread the latency and copying are effectively 
zero: the data structure you're using in one thread is the same data structure 
in use by another. With multiprocessing they're completely separate (distinct 
memory spaces); data must be passed from one to the other, and there's a cost 
for that.
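
To make that difference concrete, here is a minimal sketch. It is not your
code: the function names are made up for illustration, and the pickle round
trip is only a stand-in for what multiprocessing does when it hands an object
to another process.

```python
import pickle
import threading


def add_entry(table):
    """Store one entry in whatever dict we were handed."""
    table["answer"] = 42


def update_in_thread(table):
    """Run add_entry in a thread: it works on the caller's own dict."""
    worker = threading.Thread(target=add_entry, args=(table,))
    worker.start()
    worker.join()
    return table


def update_in_copy(table):
    """Mimic a process boundary: the receiver gets a pickled copy.

    Assumption: a pickle round trip approximates how multiprocessing
    transfers an object; no real child process is started here.
    """
    copied = pickle.loads(pickle.dumps(table))
    add_entry(copied)
    return copied


if __name__ == "__main__":
    shared = {}
    update_in_thread(shared)
    print("thread saw the same dict:", shared)

    separate = {}
    copied = update_in_copy(separate)
    print("original after copy update:", separate)
    print("copy after update:", copied)
```

The thread's change shows up in the original dict for free, while the change
made to the copy has to be sent back before the original side can see it.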

By treating multiprocessing like threading in terms of the shared data, you're 
making lots of little updates, and every one of them pays that cost.
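
A rough sketch of the alternative, again with invented names: do the work
locally and send one message instead of one per item. To keep it
self-contained, both ends of a multiprocessing.Pipe live in the same process
here; in a real program a worker process would hold the sending end.

```python
import multiprocessing


def send_per_item(values):
    """Send each squared value as its own message.

    Returns (results, number_of_messages).
    """
    receiver, sender = multiprocessing.Pipe(duplex=False)
    results = {}
    messages = 0
    for index, value in enumerate(values):
        sender.send((index, value * value))  # pickled, written to the pipe
        messages += 1
        key, squared = receiver.recv()       # read back and unpickled
        results[key] = squared
    sender.close()
    receiver.close()
    return results, messages


def send_batched(values):
    """Compute everything first, then send a single message.

    Note: with both ends in one process, a very large batch could block
    in send() because nothing is reading yet; keep the demo input small.
    """
    receiver, sender = multiprocessing.Pipe(duplex=False)
    batch = [(index, value * value) for index, value in enumerate(values)]
    sender.send(batch)                       # one pickle, one transfer
    results = dict(receiver.recv())
    sender.close()
    receiver.close()
    return results, 1


if __name__ == "__main__":
    values = list(range(200))
    per_item, per_item_messages = send_per_item(values)
    batched, batched_messages = send_batched(values)
    print("same results:", per_item == batched)
    print("messages:", per_item_messages, "versus", batched_messages)
```

Both functions produce the same dict; the difference is how many times the
data crosses the boundary, which is where the latency is paid.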

See sig quote.

Cheers,
Cameron Simpson <cs at zip.com.au>

The Eight Fallacies of Distributed Computing - Peter Deutsch
1. 	The network is reliable
2. 	Latency is zero
3. 	Bandwidth is infinite
4. 	The network is secure
5. 	Topology doesn't change
6. 	There is one administrator
7. 	Transport cost is zero
8. 	The network is homogeneous

