# [Numpy-discussion] Fastest way to compute summary statistics for a specific axis

Dave Hirschfeld dave.hirschfeld at gmail.com
Tue Mar 17 07:41:37 EDT 2015

```
Sebastian Berg <sebastian <at> sipsolutions.net> writes:

>
> On Mo, 2015-03-16 at 15:53 +0000, Dave Hirschfeld wrote:
> > I have a number of large arrays for which I want to compute the mean and
> > standard deviation over a particular axis - e.g. I want to compute the
> > statistics for axis=1 as if the other axes were combined so that in the
> > example below I get two values back
> >
> > In [1]: a = randn(30, 2, 10000)
> >
> > For the mean this can be done easily like:
> >
> > In [2]: a.mean(0).mean(-1)
> > Out[2]: array([ 0.0007, -0.0009])
> >
>
> If you have numpy 1.7+ (which I guess by now is basically always the
> case), you can do a.mean((0, 2)). Though it isn't actually faster in
> this example, probably because it has to use buffered iterators and
> things, but I would guess the performance should be much more stable
> depending on memory order, etc. than any other method.
>
> - Sebastian
>

Wow, I didn't know you could even do that - that's very cool (and a lot
cleaner than manually reordering & reshaping).

It seems to be pretty fast for me and reasonably stable wrt memory
order:

In [199]: %timeit a.mean(0).mean(-1)
     ...: %timeit a.mean(axis=(0,2))
     ...: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
     ...: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
     ...:
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.5 ms per loop
100 loops, best of 3: 4.8 ms per loop
100 loops, best of 3: 14.6 ms per loop

In [200]: a = a.copy('F')

In [201]: %timeit a.mean(0).mean(-1)
     ...: %timeit a.mean(axis=(0,2))
     ...: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
     ...: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 3.29 ms per loop
100 loops, best of 3: 7.18 ms per loop
100 loops, best of 3: 15.9 ms per loop

Thanks,
Dave

```
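
The thread only times the mean, but the original question also asked for the standard deviation. A minimal sketch of how the tuple-axis form might cover both is below. The helper name `stats_over_axis` and the smaller array shape are assumptions made for illustration, not something from the thread. Note that chaining reductions, as in `a.mean(0).mean(-1)`, only works for the mean; the standard deviation has to be reduced over all the other axes in a single call.

```python
import numpy as np

# Smaller than the (30, 2, 10000) array in the thread, to keep the sketch quick.
rng = np.random.default_rng(0)
a = rng.standard_normal((30, 2, 1000))


def stats_over_axis(arr, keep_axis):
    """Hypothetical helper: mean and std for each index along keep_axis,
    reducing over every other axis in one call (tuple axis needs numpy 1.7+)."""
    keep_axis = keep_axis % arr.ndim  # allow negative axis values
    reduce_axes = tuple(i for i in range(arr.ndim) if i != keep_axis)
    return arr.mean(axis=reduce_axes), arr.std(axis=reduce_axes)


mean, std = stats_over_axis(a, 1)

# Cross-check against the manual transpose/reshape approach from the thread.
flat = a.transpose(1, 0, 2).reshape(2, -1)
assert np.allclose(mean, flat.mean(axis=-1))
assert np.allclose(std, flat.std(axis=-1))
```

Both results have shape `(2,)`, one value per index along axis 1, matching the "two values back" the original question asked for.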