# [Numpy-discussion] Fastest way to compute summary statistics for a specific axis

Sebastian Berg sebastian at sipsolutions.net
Mon Mar 16 12:12:56 EDT 2015

```On Mo, 2015-03-16 at 15:53 +0000, Dave Hirschfeld wrote:
> I have a number of large arrays for which I want to compute the mean and
> standard deviation over a particular axis - e.g. I want to compute the
> statistics for axis=1 as if the other axes were combined so that in the
> example below I get two values back
>
> In [1]: a = randn(30, 2, 10000)
>
> For the mean this can be done easily like:
>
> In [2]: a.mean(0).mean(-1)
> Out[2]: array([ 0.0007, -0.0009])
>

If you have numpy 1.7+ (which I guess by now is basically always the
case), you can do a.mean((0, 1)). Though it isn't actually faster in
this example, probably because it has to use buffered iterators and
things, but I would guess the performance should be much more stable
depending on memory order, etc. then any other method.

- Sebastian

>
> ...but this won't work for the std. Using some transformations we can
> come up with something which will work for either:
>
> In [3]: a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
> Out[3]: array([ 0.0007, -0.0009])
>
> In [4]: a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
> Out[4]: array([ 0.0007, -0.0009])
>
>
> If we look at the performance of these equivalent methods:
>
> In [5]: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
> 100 loops, best of 3: 14.5 ms per loop
>
> In [6]: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
> 100 loops, best of 3: 5.05 ms per loop
>
>
> we can see that the latter version is a clear winner. Investigating
> further, both methods appear to copy the data so the performance is
> likely down to better cache utilisation.
>
> In [7]: np.may_share_memory(a, a.transpose(2,0,1).reshape(-1, 2))
> Out[7]: False
>
> In [8]: np.may_share_memory(a, a.transpose(1,0,2).reshape(2, -1))
> Out[8]: False
>
>
> Both methods are however significantly slower than the initial attempt:
>
> In [9]: %timeit a.mean(0).mean(-1)
> 1000 loops, best of 3: 1.2 ms per loop
>
> Perhaps because it allocates a smaller temporary?
>
> For those who like a challenge: is there a faster way to achieve what
> I'm after?
>
>
> Cheers,
> Dave
>
>
>
>
>
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150316/3730b2a2/attachment.sig>
```