[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

Pauli Virtanen pav at iki.fi
Mon Jul 13 04:00:41 EDT 2009


Wed, 08 Jul 2009 22:16:22 +0000, Pauli Virtanen wrote:
[clip]
> On an older CPU (slower, smaller cache), the situation is slightly
> different:
> 
>     http://www.iki.fi/pav/tmp/athlon.png
>     http://www.iki.fi/pav/tmp/athlon.txt
> 
> On average, it's still an improvement in many cases.  However, now there
> are more regressions. The significant ones (factor of 1/2) are N-D
> arrays where the reduction runs over an axis with a small number of
> elements.

Part of this turned out (thanks, Valgrind!) to be due to L2 cache misses,
caused by my forgetting to evaluate the first reduction iteration in
blocks as well. Fixed -- the regressions are now less severe (most are
around a factor of 0.8), although on this machine some still remain...
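
To illustrate what evaluating the first iteration "in blocks" means,
here is a rough pure-Python sketch. It is not the actual C code from the
patch: the function name, the list-of-lists layout, and the block size
are all made up for illustration. The idea is only that the
initialization of the output (the first reduction iteration) happens
inside the same block loop as the accumulation, so each block of the
output is touched while it is still in cache.

```python
def reduce_axis0_blocked(rows, block=4):
    """Sum a non-empty 2-D list of lists over axis 0, block by block.

    Hypothetical illustration only; not NumPy's implementation.
    """
    ncols = len(rows[0])
    out = [0] * ncols
    for start in range(0, ncols, block):
        stop = min(start + block, ncols)
        # First reduction iteration: seed this block of the output from
        # the first row, inside the block loop rather than as a separate
        # pass over the whole output.
        out[start:stop] = rows[0][start:stop]
        # Remaining iterations: accumulate the other rows into the same
        # block before moving on to the next one.
        for row in rows[1:]:
            for j in range(start, stop):
                out[j] += row[j]
    return out


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = reduce_axis0_blocked(data, block=2)
```

The result should be the same as a plain column-wise sum, whatever block
size is chosen; only the order of memory accesses changes.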

-- 
Pauli Virtanen



