[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
Pauli Virtanen
pav at iki.fi
Mon Jul 13 04:00:41 EDT 2009
On Wed, 08 Jul 2009 22:16:22 +0000, Pauli Virtanen wrote:
[clip]
> On an older CPU (slower, smaller cache), the situation is slightly
> different:
>
> http://www.iki.fi/pav/tmp/athlon.png
> http://www.iki.fi/pav/tmp/athlon.txt
>
> On average, it's still an improvement in many cases. However, now there
> are more regressions. The significant ones (factor of 1/2) are N-D
> arrays where the reduction runs over an axis with a small number of
> elements.
Part of this seemed (thanks, Valgrind!) to be due to L2 cache misses,
which came from my forgetting to evaluate the first reduction iteration
in blocks as well. Fixed -- the regressions are now less severe (most
are ~0.8), although on this machine some still remain...
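
To illustrate what "evaluating the first iteration in blocks" means, here
is a rough pure-Python sketch of the loop structure. It is not the actual
C code from the patch. The function name, the flat row-major layout and
the `block` parameter are made up for the example, and pure Python will
not show any cache effect. The point is only that the initialisation of
the output from the first row happens inside the block loop, so each
output block is touched once while it is still hot, instead of the whole
output being written first and revisited block by block afterwards.

```python
def blocked_sum_axis0(data, n_rows, n_cols, block=4):
    """Sum a row-major n_rows x n_cols array (flat list) over axis 0.

    Hypothetical illustration only: the name, signature and block size
    are assumptions, not NumPy internals.
    """
    if n_rows < 1:
        raise ValueError("need at least one row to reduce over")
    out = [0] * n_cols
    for start in range(0, n_cols, block):
        stop = min(start + block, n_cols)
        # First reduction iteration, done per block: initialise this
        # block of the output from row 0.
        for j in range(start, stop):
            out[j] = data[j]
        # Remaining iterations accumulate into the same output block.
        for i in range(1, n_rows):
            base = i * n_cols
            for j in range(start, stop):
                out[j] += data[base + j]
    return out


# 3 x 4 array with rows [0..3], [4..7], [8..11]
print(blocked_sum_axis0(list(range(12)), 3, 4, block=3))
```

The version with the problem would correspond to copying all of row 0
into `out` before entering the block loop.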
--
Pauli Virtanen