I have a number of large arrays for which I want to compute the mean
and standard deviation over a particular axis, i.e. I want the
statistics for axis=1 as if all the other axes were combined, so that
in the example below I get two values back (one per entry along
axis 1):
In [1]: a = randn(30, 2, 10000)
For the mean this can be done easily by chaining reductions (a mean of
means equals the overall mean here because every intermediate slice
has the same size):
In [2]: a.mean(0).mean(-1)
Out[2]: array([ 0.0007, -0.0009])
...but the same trick won't work for the std: the std of per-axis stds
is not the std of the combined data. Using some transpose/reshape
transformations we can come up with something which will work for
either statistic:
In [3]: a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
Out[3]: array([ 0.0007, -0.0009])
In [4]: a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
Out[4]: array([ 0.0007, -0.0009])
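To make the failure mode concrete, here is a small sketch (using
np.random.randn explicitly rather than the pylab shorthand) that
checks both chained reductions against the pooled values:

```python
import numpy as np

a = np.random.randn(30, 2, 10000)

# Pooled statistics computed by flattening everything except axis 1.
pooled_mean = a.transpose(1, 0, 2).reshape(2, -1).mean(axis=-1)
pooled_std = a.transpose(1, 0, 2).reshape(2, -1).std(axis=-1)

# A mean of means matches the pooled mean (every slice has equal size)...
assert np.allclose(a.mean(0).mean(-1), pooled_mean)

# ...but a std of stds does not: it measures the spread of the
# per-slice stds and throws away the spread of the per-slice means.
assert not np.allclose(a.std(0).std(-1), pooled_std)
```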
If we look at the performance of these equivalent methods:
In [5]: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
100 loops, best of 3: 14.5 ms per loop
In [6]: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
100 loops, best of 3: 5.05 ms per loop
we can see that the latter version is a clear winner. Investigating
further, both methods copy the data (the reshape cannot be expressed
as a view after these transposes), so the difference is likely down to
memory-access order and cache utilisation:
In [7]: np.may_share_memory(a, a.transpose(2,0,1).reshape(-1, 2))
Out[7]: False
In [8]: np.may_share_memory(a, a.transpose(1,0,2).reshape(2, -1))
Out[8]: False
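One way to probe the cache-utilisation theory (a sketch, not a
definitive explanation) is to inspect the contiguity of the two
copies: both come out C-contiguous, but only the faster variant then
reduces over its last, memory-adjacent axis:

```python
import numpy as np

a = np.random.randn(30, 2, 10000)

b = a.transpose(2, 0, 1).reshape(-1, 2)   # slower variant
c = a.transpose(1, 0, 2).reshape(2, -1)   # faster variant

# Neither reshape can be expressed as a view of the transposed array,
# so both force a copy:
assert not np.may_share_memory(a, b)
assert not np.may_share_memory(a, c)

# Both copies come out C-contiguous; the faster variant reduces over
# its *last* axis, i.e. over elements that are adjacent in memory.
assert b.flags['C_CONTIGUOUS'] and c.flags['C_CONTIGUOUS']
```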
However, both methods are significantly slower than the original
chained-mean attempt:
In [9]: %timeit a.mean(0).mean(-1)
1000 loops, best of 3: 1.2 ms per loop
Perhaps because it only allocates small temporaries rather than
copying the whole array?
For those who like a challenge: is there a faster way to achieve what
I'm after?
Cheers,
Dave
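For readers who take up the challenge, two candidate directions
(sketches, not benchmarked claims; the first assumes NumPy >= 1.7,
which accepts a tuple of axes in mean/std):

```python
import numpy as np

a = np.random.randn(30, 2, 10000)
expected = a.transpose(1, 0, 2).reshape(2, -1).std(axis=-1)

# Option 1: NumPy >= 1.7 can reduce over a tuple of axes directly,
# with no transpose/reshape copy.
assert np.allclose(a.std(axis=(0, 2)), expected)

# Option 2: assemble the pooled std from cheap chained means using
# var = E[x^2] - E[x]^2 (ddof=0, matching np.std's default).
m = a.mean(0).mean(-1)
std = np.sqrt((a ** 2).mean(0).mean(-1) - m ** 2)
assert np.allclose(std, expected)
```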