[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
David Warde-Farley
dwf at cs.toronto.edu
Thu Jul 9 04:48:52 EDT 2009
On 8-Jul-09, at 6:16 PM, Pauli Virtanen wrote:
> Just to tickle some interest, a "pathological" case before
> optimization:
>
> In [1]: import numpy as np
> In [2]: x = np.zeros((80000, 256))
> In [3]: %timeit x.sum(axis=0)
> 10 loops, best of 3: 850 ms per loop
>
> After optimization:
>
> In [1]: import numpy as np
> In [2]: x = np.zeros((80000, 256))
> In [3]: %timeit x.sum(axis=0)
> 10 loops, best of 3: 78.5 ms per loop
Not knowing terribly much about cache optimization, I have nothing to
contribute but encouragement. :) Pauli, this is fantastic work!
Just curious about regressions: have you tested on any non-x86
hardware? As a frequent user of an older PPC machine, I worry about
such things (and plan to try your benchmark tomorrow on both
PPC and PPC64 OS X).
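For anyone else wanting to try this on other hardware, here is a minimal sketch, assuming a plain Python script is more convenient than an IPython session on the test machines. It reproduces the quoted `%timeit x.sum(axis=0)` measurement ("10 loops, best of 3") with the standard `timeit` module and adds a correctness check against plain row-by-row accumulation, so a port can be checked for wrong results as well as slowdowns. The helpers `bench_sum_axis0` and `check_sum_axis0` are hypothetical names, not part of NumPy, and absolute timings will depend on the machine and NumPy build.

```python
# Standalone version of the quoted IPython benchmark, for running outside IPython.
import platform
import timeit

import numpy as np


def bench_sum_axis0(rows=80000, cols=256, repeat=3, number=10):
    """Return the best per-call time (seconds) of x.sum(axis=0).

    Mirrors %timeit's "10 loops, best of 3": run the call `number` times,
    repeat that `repeat` times, and keep the fastest run.
    """
    x = np.zeros((rows, cols))
    timer = timeit.Timer(lambda: x.sum(axis=0))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def check_sum_axis0(rows=1000, cols=256, seed=0):
    """Compare x.sum(axis=0) against a naive row-by-row accumulation."""
    rng = np.random.RandomState(seed)
    x = rng.rand(rows, cols)
    expected = np.zeros(cols)
    for row in x:
        expected += row  # accumulate one row at a time
    return np.allclose(x.sum(axis=0), expected)


if __name__ == "__main__":
    print("machine:", platform.machine())
    print("numpy:", np.__version__)
    print("results agree:", check_sum_axis0())
    print("sum(axis=0): %.1f ms per loop" % (bench_sum_axis0() * 1e3))
```

Running it once with the current release and once with the optimized branch on the same machine gives a like-for-like comparison; the `machine:` line records which architecture the numbers came from.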
Cheers,
David