[Numpy-discussion] odd performance of sum?

Sturla Molden sturla at molden.no
Sat Feb 12 10:38:11 EST 2011

On 10.02.2011 16:29, eat wrote:
> One would expect sum to outperform dot by a clear margin. Are 
> there any 'tricks' to increase the performance of sum?

I see that others have answered already. np.sum, a reduction over the 
ufunc np.add, is not going to beat np.dot. You are racing the heavy 
machinery of NumPy (array iterators, type checks, bounds checks, etc.) 
against the level-3 BLAS routine DGEMM, the most heavily optimized 
numerical kernel ever written. Also 
beware that computation is much cheaper than memory access. Although 
DGEMM does more arithmetic, and is even O(N^3) in that respect, it is 
always faster except for very sparse arrays. If you need fast loops, you 
can always write your own Fortran or C, and even insert OpenMP pragmas. 
But don't expect that to beat optimized high-level BLAS kernels by any 
margin. The first chapters of "Numerical Methods in Fortran 90" might be 
worth reading. They deal with several of these issues, including 
dimensional expansion, which is important for writing fast numerical 
code -- but not intuitively obvious. "I expect this to be faster because 
it does less work" is a fundamental misconception in numerical 
computing. Whatever causes less traffic on the memory bus (the real 
bottleneck) will almost always be faster, regardless of the amount of 
work done by the CPU. A good piece of advice is to use high-level BLAS 
whenever you can. The only exception, as mentioned, is when matrices get 
very sparse.
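
A minimal sketch for checking this on your own machine, assuming the 
comparison is between column sums computed with sum(axis=0) and the same 
sums written as a product with a vector of ones. The array size n, the 
helper names column_sums_reduce and column_sums_blas, and the repeat 
counts are hypothetical choices for illustration, not taken from the 
thread. Which BLAS routine, if any, ends up being called depends on the 
shapes and on how NumPy was built, so treat the timings as indicative 
only.

```python
import timeit

import numpy as np

# Hypothetical problem size, chosen only for illustration.
n = 1000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
ones = np.ones(n)


def column_sums_reduce(x):
    # Column sums through NumPy's generic reduction machinery.
    return x.sum(axis=0)


def column_sums_blas(x, w):
    # The same column sums written as a vector-matrix product, which
    # NumPy hands to the BLAS it was built against (if any).
    return np.dot(w, x)


# Both formulations compute the same result up to rounding.
assert np.allclose(column_sums_reduce(a), column_sums_blas(a, ones))

# Take the best of several repeats to reduce timing noise.
calls = 20
t_sum = min(timeit.repeat(lambda: column_sums_reduce(a), number=calls, repeat=5))
t_dot = min(timeit.repeat(lambda: column_sums_blas(a, ones), number=calls, repeat=5))
print(f"sum(axis=0):  {t_sum / calls * 1e3:.3f} ms per call")
print(f"dot(ones, a): {t_dot / calls * 1e3:.3f} ms per call")
```

The outcome depends on the NumPy version, the BLAS library, the array 
size and the memory layout, so it is worth repeating the measurement for 
the shapes you actually use.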

