[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

Thu Jul 9 04:48:52 EDT 2009

On 8-Jul-09, at 6:16 PM, Pauli Virtanen wrote:

> Just to tickle some interest, a "pathological" case before  
> optimization:
>
>    In [1]: import numpy as np
>    In [2]: x = np.zeros((80000, 256))
>    In [3]: %timeit x.sum(axis=0)
>    10 loops, best of 3: 850 ms per loop
>
> After optimization:
>
>    In [1]: import numpy as np
>    In [2]: x = np.zeros((80000, 256))
>    In [3]: %timeit x.sum(axis=0)
>    10 loops, best of 3: 78.5 ms per loop

Not knowing a terrible lot about cache optimization, I have nothing to  
contribute but encouragement. :) Pauli, this is fantastic work!

Just curious about regressions: have you tested on any non-x86  
hardware? As a frequent user of an older ppc machine, I worry about  
such things (and plan to give your benchmark a try tomorrow on both  
ppc and ppc64 OS X).
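
For anyone else who wants to repeat the quoted benchmark outside IPython
on other hardware, here is a minimal sketch using the standard timeit
module. The function name time_column_sum and its parameters are my own
placeholders, not anything from Pauli's patch, and the numbers it prints
will of course depend on the machine and the NumPy build.

```python
import platform
import timeit

import numpy as np


def time_column_sum(shape=(80000, 256), repeat=3, number=10):
    """Return the best per-call time in seconds for x.sum(axis=0).

    Hypothetical helper: the default shape matches the quoted benchmark.
    """
    x = np.zeros(shape)
    timer = timeit.Timer(lambda: x.sum(axis=0))
    # Roughly mirror %timeit's "10 loops, best of 3": run the statement
    # `number` times per trial and keep the fastest of `repeat` trials.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


if __name__ == "__main__":
    seconds = time_column_sum()
    # Report the architecture so results from x86, ppc, ppc64, etc.
    # can be compared side by side.
    print("%s, NumPy %s: %.1f ms per loop"
          % (platform.machine(), np.__version__, seconds * 1e3))
```

Running it once against a build without the patch and once against a
build with it should give a before/after pair comparable to the two
timings quoted above.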

Cheers,
David