[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Sturla Molden sturla at molden.no
Wed Jun 15 20:09:46 EDT 2011


On 15.06.2011 23:22, Christopher Barker wrote:
>
> It would also be great if someone who actually understands this
> stuff could look at his code and explain why the slowdown occurs (hint,
> hint!)
>

Not sure I qualify, but I think I notice several potential problems in 
the OP's multiprocessing/NumPy code:

"innerProductList = pool.map(myutil.numpy_inner_product, arrayList)"

1.  Here we potentially have a case of false sharing and/or mutex 
contention, because the work is too fine-grained.  pool.map does not do 
any load balancing, so if pool.map is to scale nicely, each work item 
must take a substantial amount of time. I suspect this is the main 
issue (see the first sketch below).

2. There is also the question of when the process pool is spawned. 
Though I haven't checked, I suspect it happens prior to calling 
pool.map. If it does not, that is a factor as well, particularly on 
Windows, where there is no fork() and each worker must be started by 
re-importing the main module (less so on Linux and Mac OS X). See the 
second sketch below.

3.  "arrayList" is serialised by pickling, which has a significan 
overhead.  It's not shared memory either, as the OP's code implies, but 
the main thing is the slowness of cPickle.

"IPs = N.array(innerProductList)"

4.  numpy.array is a very slow function. The benchmark should preferably 
not include this overhead (see the last sketch below).
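
Regarding 1: the simplest fix is to make each work item substantially
bigger, e.g. by handing pool.map whole batches of arrays instead of one
array at a time. Something along these lines (untested, and
inner_product_batch is only a stand-in for whatever
myutil.numpy_inner_product actually computes):

    import numpy as np
    from multiprocessing import Pool

    def inner_product_batch(arrays):
        # One task now processes a whole batch, so the fixed per-task
        # cost (dispatch, pickling, synchronisation) is amortised.
        return [np.dot(a.ravel(), a.ravel()) for a in arrays]

    def batched_map(pool, array_list, nbatches):
        # Contiguous chunks keep the results in the same order as the input.
        n = len(array_list)
        step = -(-n // nbatches)  # ceiling division
        batches = [array_list[i:i + step] for i in range(0, n, step)]
        results = pool.map(inner_product_batch, batches)
        return [ip for batch in results for ip in batch]

With one batch per worker (or a small multiple of the worker count), the
per-item overhead mostly disappears.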
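
Regarding 2: create the pool once, outside anything you time, and put it
behind an "if __name__ == '__main__'" guard. On Windows there is no
fork(), so the children start by re-importing the main module; without
the guard they would try to spawn pools of their own. A rough sketch
(the worker below is again only a placeholder for the OP's function):

    import numpy as np
    from multiprocessing import Pool

    def numpy_inner_product(a):
        # placeholder for the OP's myutil.numpy_inner_product
        return np.dot(a.ravel(), a.ravel())

    if __name__ == '__main__':
        array_list = [np.random.rand(100, 100) for _ in range(1000)]
        pool = Pool(processes=4)            # spawn the workers once, up front
        ips = pool.map(numpy_inner_product, array_list)
        pool.close()
        pool.join()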
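
Regarding 3: to actually get shared memory, allocate the data in a
multiprocessing RawArray, wrap it with numpy.frombuffer, and pass only
indices through pool.map, so that no array data is pickled. This is just
a sketch with made-up names and shapes:

    import numpy as np
    from multiprocessing import Pool
    from multiprocessing.sharedctypes import RawArray

    _shared = {}

    def _init_worker(raw, shape):
        # Runs once in each worker: wrap the shared buffer as an ndarray.
        _shared['a'] = np.frombuffer(raw, dtype=np.float64).reshape(shape)

    def row_inner_product(i):
        row = _shared['a'][i]
        return np.dot(row, row)

    if __name__ == '__main__':
        shape = (1000, 10000)
        raw = RawArray('d', shape[0] * shape[1])
        a = np.frombuffer(raw, dtype=np.float64).reshape(shape)
        a[:] = np.random.rand(*shape)       # fill the shared block in the parent
        pool = Pool(processes=4, initializer=_init_worker,
                    initargs=(raw, shape))
        ips = np.array(pool.map(row_inner_product, range(shape[0])))
        pool.close()
        pool.join()

There is no locking here, which is fine as long as the workers only read.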
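
And regarding 4: time pool.map on its own, and do the conversion to an
ndarray outside the measured region, e.g.:

    import time
    import numpy as np

    def timed_map(pool, func, items):
        t0 = time.time()
        results = pool.map(func, items)    # only the parallel part is timed
        elapsed = time.time() - t0
        return np.array(results), elapsed  # conversion kept out of the timing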

Sturla


