Re: [Numpy-discussion] numpy ufuncs and COREPY - any info?

May 22, 2009


      Francesc Alted wrote:
...
On Friday 22 May 2009 11:42:56, Gregor Thalhammer wrote:
...
dmitrey wrote:
3) Improving performance by using multiple cores is much more difficult.
A significant speedup is possible only for sufficiently large (>1e5)
arrays. Where a speed gain is possible, the MKL uses several cores.
Some experimentation showed that by adding a few OpenMP constructs you
could get a similar speedup with numpy.
4) numpy.dot uses optimized implementations.
Good points, Gregor.  However, I wouldn't say that improving performance by 
using multiple cores is *that* difficult, but rather that multiple cores can 
only be used efficiently *when* memory bandwidth is not the limitation.  An 
example of this is the computation of transcendental functions, where, even 
using vectorized implementations, the computation speed is still CPU-bound 
in many cases.  And you have seen very good speed-ups yourself for 
these cases with your implementation of numexpr/MKL :)
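
A rough way to see the CPU-bound versus memory-bound distinction is to time
a cheap ufunc against a transcendental one on the same data.  The sketch
below is only an illustration: the helper name ns_per_element, the array
size, and the choice of np.add and np.exp are assumptions of mine, not
something taken from numexpr or the MKL, and the numbers will vary by
machine.

```python
import time
import numpy as np

def ns_per_element(func, *arrays, repeats=5):
    """Best-of-`repeats` wall time of func(*arrays, out=...) in ns per element."""
    out = np.empty_like(arrays[0])          # preallocate so allocation is not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*arrays, out=out)
        best = min(best, time.perf_counter() - start)
    return best * 1e9 / arrays[0].size

n = 1_000_000                               # assumed size, above the ~1e5 mentioned earlier
a = np.random.rand(n)
b = np.random.rand(n)

add_cost = ns_per_element(np.add, a, b)     # very little arithmetic per byte moved
exp_cost = ns_per_element(np.exp, a)        # much more arithmetic per byte moved
print(f"add: {add_cost:.2f} ns/element, exp: {exp_cost:.2f} ns/element")
```

If exp costs noticeably more per element than add, the extra time is
arithmetic rather than memory traffic, which is the case where additional
cores are most likely to pay off.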
Using multiple cores is pretty easy for element-wise ufuncs; no 
communication needs to occur and the work partitioning is trivial.  And 
actually I've found with some initial testing that multiple cores do 
still help when you are memory bound.  I don't fully understand why yet, 
though I have some ideas.  One reason is multiple memory controllers due 
to multiple sockets (e.g. Opteron).  Another is that each thread is 
pulling memory from a different bank, utilizing more bandwidth than a 
single sequential thread could.  However, if that's the case, we could 
possibly come up with code for a single thread that achieves (nearly) 
the same additional throughput.
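
To make the "trivial partitioning" point concrete, here is a minimal sketch
using plain Python threads rather than CorePy or OpenMP.  The function name
threaded_ufunc and the default of four threads are my own assumptions.  It
assumes a 1-D floating-point array and relies on NumPy releasing the GIL
inside ufunc loops for non-object dtypes; whether it is actually faster
depends on the array size and the machine.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def threaded_ufunc(ufunc, x, n_threads=4):
    """Apply a unary ufunc to a 1-D float array, one contiguous chunk per thread."""
    out = np.empty_like(x)
    # Chunk boundaries: n_threads + 1 indices running from 0 to x.size.
    bounds = np.linspace(0, x.size, n_threads + 1).astype(int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each thread writes only to its own slice, so no locking is needed.
        ufunc(x[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))   # wait for all chunks, surface errors
    return out

x = np.linspace(0.0, 10.0, 1_000_003)
y = threaded_ufunc(np.sin, x)
```

Because the chunks are disjoint views into the same input and output
buffers, the threads never communicate; the only shared resource is memory
bandwidth.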

Andrew

Andrew Friedley