[Numpy-discussion] Fastest way to compute summary statistics for a specific axis

Dave Hirschfeld dave.hirschfeld at gmail.com
Mon Mar 16 11:53:08 EDT 2015

I have a number of large arrays for which I want to compute the mean and 
standard deviation over a particular axis - e.g. I want to compute the 
statistics for axis=1 as if the other axes were combined so that in the 
example below I get two values back

In [1]: a = randn(30, 2, 10000)

For the mean this can be done easily like:

In [2]: a.mean(0).mean(-1)
Out[2]: array([ 0.0007, -0.0009])

...but this won't work for the std. Using some transformations we can 
come up with something which will work for either:

In [3]: a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
Out[3]: array([ 0.0007, -0.0009])

In [4]: a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
Out[4]: array([ 0.0007, -0.0009])
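To see concretely why the chained reductions are fine for the mean but not for the std, here is a quick reproducible check (a sketch using a fixed seed; `np.random.seed` is just for reproducibility):

```python
import numpy as np

np.random.seed(0)
a = np.random.randn(30, 2, 10000)

# Chained means agree with the mean over the combined axes...
combined_mean = a.transpose(1, 0, 2).reshape(2, -1).mean(axis=-1)
assert np.allclose(a.mean(0).mean(-1), combined_mean)

# ...but the std of per-slice stds is not the std over the combined axes
combined_std = a.transpose(1, 0, 2).reshape(2, -1).std(axis=-1)
chained_std = a.std(0).std(-1)  # spread of the 10000 per-column stds, not of the data
assert not np.allclose(combined_std, chained_std)
```

Chaining works for the mean only because every element carries equal weight either way; the std is not decomposable like that.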

If we look at the performance of these equivalent methods:

In [5]: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
100 loops, best of 3: 14.5 ms per loop

In [6]: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
100 loops, best of 3: 5.05 ms per loop

we can see that the latter version is a clear winner. Investigating 
further, both methods copy the data, so the performance difference is 
likely down to better cache utilisation.
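One way to make the cache argument concrete (a sketch: inspect the shapes and strides of the two copies):

```python
import numpy as np

a = np.random.randn(30, 2, 10000)

b = a.transpose(2, 0, 1).reshape(-1, 2)   # shape (300000, 2)
c = a.transpose(1, 0, 2).reshape(2, -1)   # shape (2, 300000)

# Neither transposed view is contiguous, so both reshapes must copy
assert not np.may_share_memory(a, b)
assert not np.may_share_memory(a, c)

# b.mean(axis=0) hops across rows for each output column, while
# c.mean(axis=-1) sweeps one long contiguous row at a time -- the
# latter access pattern is friendlier to the CPU cache
print(b.strides, c.strides)
```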

In [7]: np.may_share_memory(a, a.transpose(2,0,1).reshape(-1, 2))
Out[7]: False

In [8]: np.may_share_memory(a, a.transpose(1,0,2).reshape(2, -1))
Out[8]: False

Both methods are, however, significantly slower than the initial attempt:

In [9]: %timeit a.mean(0).mean(-1)
1000 loops, best of 3: 1.2 ms per loop

Perhaps because it allocates a smaller temporary?
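The intermediate is indeed much smaller than the full copies made by the transpose/reshape versions (a quick check, assuming 8-byte floats):

```python
import numpy as np

a = np.random.randn(30, 2, 10000)

# a.mean(0) allocates only a (2, 10000) temporary: ~160 kB...
tmp = a.mean(0)
print(tmp.shape, tmp.nbytes)              # (2, 10000), 160000 bytes

# ...whereas the reshape forces a full ~4.8 MB copy of the data
full_copy = a.transpose(1, 0, 2).reshape(2, -1)
print(full_copy.shape, full_copy.nbytes)  # (2, 300000), 4800000 bytes
```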

For those who like a challenge: is there a faster way to achieve what 
I'm after?
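One candidate answer worth benchmarking: since NumPy 1.7, reductions accept a tuple of axes, which avoids the explicit transpose/reshape (and its copy) entirely:

```python
import numpy as np

a = np.random.randn(30, 2, 10000)

# Reduce over axes 0 and 2 in a single call (NumPy >= 1.7)
m = a.mean(axis=(0, 2))
s = a.std(axis=(0, 2))

# Check against the transpose/reshape formulation
flat = a.transpose(1, 0, 2).reshape(2, -1)
assert np.allclose(m, flat.mean(axis=-1))
assert np.allclose(s, flat.std(axis=-1))
```

Whether this beats the chained means for the mean is worth timing on your own machine, but it handles the std correctly in one pass with no manual reshaping.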
