[Numpy-discussion] odd performance of sum?
sturla at molden.no
Sat Feb 12 10:38:11 EST 2011
On 10.02.2011 16:29, eat wrote:
> One would expect sum to outperform dot by a clear margin. Are
> there any 'tricks' to increase the performance of sum?
I see that others have answered already. np.sum (a reduction with the
ufunc np.add) is not going to beat np.dot. You are racing the heavy
machinery of NumPy (array iterators, type checks, bounds checks, etc.)
against the level-3 BLAS routine DGEMM, the most heavily optimized
numerical kernel ever written. Also
beware that computation is much cheaper than memory access. Although
DGEMM does more arithmetic, and is even O(N^3) in that respect, it is
always faster except for very sparse arrays. If you need fast loops, you
can always write your own Fortran or C, and even insert OpenMP pragmas.
But don't expect that to beat optimized high-level BLAS kernels by any
margin. The first chapters of "Numerical Methods in Fortran 90" might be
worth reading. They deal with several of these issues, including
dimensional expansion, which is important for writing fast numerical
code -- but not intuitively obvious. "I expect this to be faster because
it does less work" is a fundamental misconception in numerical
computing. Whatever causes less traffic on the memory bus (the real
bottleneck) will almost always be faster, regardless of the amount of
work done by the CPU. Good advice is to use high-level BLAS whenever
you can. The only exception, as mentioned, is when matrices get very sparse.
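
The comparison discussed above can be tried with a small timing sketch. It is only an illustration under stated assumptions: the helper names (column_sums_with_sum, column_sums_with_dot, compare) and the matrix size are made up for this example, and np.dot only reaches an optimized BLAS if NumPy was built against one. Note also that multiplying by a vector of ones is a vector-matrix product, so it probably exercises a matrix-vector style BLAS routine rather than a full DGEMM. Which variant wins depends on the NumPy version, the BLAS library and the hardware, so no timings are quoted here.

```python
import timeit

import numpy as np


def column_sums_with_sum(a):
    # Column sums via NumPy's generic reduction (np.add.reduce under the hood).
    return a.sum(axis=0)


def column_sums_with_dot(a, ones):
    # The same column sums written as a vector-matrix product, which NumPy
    # can hand to BLAS for float64 arrays when it is linked against one.
    return np.dot(ones, a)


def compare(n=1000, repeats=20):
    # Hypothetical benchmark helper: returns the best wall-clock time in
    # seconds for each variant on an n-by-n float64 matrix.
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    ones = np.ones(n)

    # Both variants must agree up to floating-point rounding.
    assert np.allclose(column_sums_with_sum(a), column_sums_with_dot(a, ones))

    t_sum = min(timeit.repeat(lambda: column_sums_with_sum(a),
                              number=1, repeat=repeats))
    t_dot = min(timeit.repeat(lambda: column_sums_with_dot(a, ones),
                              number=1, repeat=repeats))
    return t_sum, t_dot


if __name__ == "__main__":
    t_sum, t_dot = compare()
    print(f"a.sum(axis=0):   {t_sum * 1e3:.3f} ms")
    print(f"np.dot(ones, a): {t_dot * 1e3:.3f} ms")
```

Taking the minimum over several repeats reduces noise from other processes; changing n shows how the gap behaves once the matrix no longer fits in cache.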