[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Bruce Southey bsouthey at gmail.com
Thu Jun 16 17:31:16 EDT 2011


On 06/16/2011 02:05 PM, Brandt Belson wrote:
> Hi all,
> Thanks for the replies. As mentioned, I'm parallelizing so that I can 
> take many inner products simultaneously (which I agree is 
> embarrassingly parallel). The library I'm writing asks the user to 
> supply a function that takes two objects and returns their inner 
> product. After all the discussion, though, it seems this is too 
> simplistic an approach. Instead, I plan to write this part of the 
> library as if the inner product function supplied by the user uses all 
> available cores (with numpy and/or numexpr built with MKL or LAPACK).
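>
> For example, such a user-supplied function might look something like
> this (a minimal sketch; the names are illustrative, and np.dot only
> runs in parallel when numpy is linked against a threaded BLAS like MKL):
>
> import numpy as np
>
> def inner_product(field1, field2):
>     # Frobenius inner product of two equal-shaped arrays; the 1-D
>     # np.dot call dispatches to the (possibly threaded) BLAS routine.
>     return np.dot(field1.ravel(), field2.ravel())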
>
> As for using Fortran or C with OpenMP, this probably isn't worth the 
> time it would take, both for me and the user.
>
> I've tried increasing the array sizes and found the same trends, so 
> the slowdown isn't only because the arrays are too small to see the 
> benefit of multiprocessing. I wrote the code to be easy for anyone to 
> experiment with, so feel free to play around with what is included in 
> the profiling, the sizes of arrays, functions used, etc.
>
> I also tried using handythread.foreach with arraySize = (3000,1000), 
> and found the following:
> No shared memory, numpy array multiplication took 1.57585811615 seconds
> Shared memory, numpy array multiplication took 1.25499510765 seconds
> This is definitely an improvement over multiprocessing, but without 
> knowing any better, I was hoping to see a roughly 8x speedup on my 
> 8-core workstation.
>
> Based on what Chris sent, it seems there is some large overhead caused 
> by multiprocessing pickling numpy arrays. To test what Robin mentioned:
>
> > If you are on Linux or Mac, then fork works nicely, so you have
> > read-only shared memory; you just have to put it in a module before
> > the fork (so before pool = Pool()), and then all the subprocesses can
> > access it without any pickling required, e.g.:
> > import multiprocessing
> > import myutil
> >
> > # assign the data and define the map function *before* creating the
> > # Pool, so the forked workers inherit both without any pickling
> > myutil.data = listofdata
> >
> > def mymapfunc(i):
> >     return mydatafunc(myutil.data[i])
> >
> > p = multiprocessing.Pool(8)
> > p.map(mymapfunc, range(len(myutil.data)))
>
> I tried creating the arrayList in the myutil module and using 
> multiprocessing to find the inner products of myutil.arrayList, 
> however this was still slower than not using multiprocessing, so I 
> believe there is still some large overhead. Here are the results:
> No shared memory, numpy array multiplication took 1.55906510353 seconds
> Shared memory, numpy array multiplication took 9.82426381111 seconds
> Shared memory, myutil.arrayList numpy array multiplication took 8.77094507217 seconds
> I'm attaching this code.
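>
> In outline, the myutil version does something like this (a sketch;
> the array shape, the 8 workers, and the name arrayList come from the
> runs above, and the rest is illustrative):
>
> import multiprocessing
> import numpy as np
> import myutil
>
> # build the arrays at module level, before the Pool is created, so
> # the forked workers can read them without any pickling
> myutil.arrayList = [np.random.random((3000, 1000)) for _ in range(8)]
>
> def pair_inner_product(ij):
>     i, j = ij
>     return np.sum(myutil.arrayList[i] * myutil.arrayList[j])
>
> pool = multiprocessing.Pool(8)
> results = pool.map(pair_inner_product,
>                    [(i, j) for i in range(8) for j in range(8)])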
>
> I'm going to work around this numpy/multiprocessing behavior with 
> numpy/numexpr built with MKL or LAPACK. It would be good to know 
> exactly what's causing this, though. It would be nice if there were a 
> way to get the ideal speedup via multiprocessing, regardless of the 
> internal workings of the single-threaded inner product function, as 
> this was the behavior I expected. I imagine other people might come 
> across similar situations, but again I'm going to try to get around 
> this by letting MKL or LAPACK make use of all available cores.
>
> Thanks again,
> Brandt
>
>
I don't think this is being benchmarked correctly, because there should 
be a noticeable difference when different numbers of threads are selected.
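
A quick way to check is to pin the BLAS thread count in the environment
before starting Python and time the same operation at each setting (a
sketch; the variable is MKL_NUM_THREADS for MKL builds, OMP_NUM_THREADS
for most others, and neither has any effect unless numpy is actually
linked against a threaded BLAS):

# run as, e.g.:  OMP_NUM_THREADS=1 python bench.py
#          then  OMP_NUM_THREADS=8 python bench.py
import time
import numpy as np

a = np.random.random((2000, 2000))
b = np.random.random((2000, 2000))

t0 = time.time()
np.dot(a, b)        # dgemm: the call a threaded BLAS parallelizes
print(time.time() - t0)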

But really you should read these sources:
http://www.scipy.org/ParallelProgramming
http://stackoverflow.com/questions/5260068/multithreaded-blas-in-python-numpy

Also, numpy has extra things going on, like argument checks and copies, 
that probably make np.inner() slower. Thus, your 'numpy_inner_product' 
is probably as efficient as you can get without extreme measures like 
Cython.
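
For what it's worth, the same Frobenius inner product can be written in
several equivalent ways; which one is fastest depends on the build,
which is exactly the checks/copies/temporaries question (a sketch):

import numpy as np

a = np.random.random((3000, 1000))
b = np.random.random((3000, 1000))

r1 = (a * b).sum()                   # allocates a full temporary array
r2 = np.dot(a.ravel(), b.ravel())    # BLAS ddot; ravel() is a view here
r3 = np.inner(a.ravel(), b.ravel())  # equivalent to np.dot for 1-D input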

Bruce