[Numpy-discussion] Fastest way to compute summary statistics for a specific axis

Dave Hirschfeld dave.hirschfeld at gmail.com
Tue Mar 17 07:41:37 EDT 2015


Sebastian Berg <sebastian <at> sipsolutions.net> writes:

> 
> On Mo, 2015-03-16 at 15:53 +0000, Dave Hirschfeld wrote:
> > I have a number of large arrays for which I want to compute the mean and
> > standard deviation over a particular axis - e.g. I want to compute the
> > statistics for axis=1 as if the other axes were combined so that in the
> > example below I get two values back:
> > 
> > In [1]: a = randn(30, 2, 10000)
> > 
> > For the mean this can be done easily like:
> > 
> > In [2]: a.mean(0).mean(-1)
> > Out[2]: array([ 0.0007, -0.0009])
> > 
> 
> If you have numpy 1.7+ (which I guess by now is basically always the
> case), you can do a.mean((0, 2)). It isn't actually faster in this
> example, probably because it has to use buffered iterators and things,
> but I would guess its performance is much more stable with respect to
> memory order, etc., than any other method.
> 
> - Sebastian
> 


Wow, I didn't know you could even do that - that's very cool (and a lot
cleaner than manually reordering & reshaping).
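The original question also asked for the standard deviation, which, unlike the mean, cannot be obtained by chaining per-axis reductions: chaining works for the mean only because every group has the same size, whereas a std of per-row stds is not the std of the pooled data. A minimal sketch, assuming NumPy 1.7+ for tuple `axis` arguments and using a seeded `RandomState` in place of the session's `randn`, that checks the tuple-axis reduction against the manual transpose-and-reshape approach for both statistics:

```python
import numpy as np

# Seeded stand-in for the session's randn(30, 2, 10000)
a = np.random.RandomState(0).randn(30, 2, 10000)

# Reduce over every axis except axis 1 in a single call (tuple axis, NumPy >= 1.7)
mean_tuple = a.mean(axis=(0, 2))
std_tuple = a.std(axis=(0, 2))

# Manual alternative: move axis 1 to the front, then flatten the remaining axes
flat = a.transpose(1, 0, 2).reshape(a.shape[1], -1)
mean_manual = flat.mean(axis=-1)
std_manual = flat.std(axis=-1)

# Chaining is fine for the mean because all groups have equal size...
mean_chained = a.mean(0).mean(-1)
# ...but NOT for the standard deviation: this is the spread of per-row stds
std_chained_wrong = a.std(0).std(-1)

print(mean_tuple.shape)                                    # (2,)
print(np.allclose(mean_tuple, mean_manual),
      np.allclose(mean_tuple, mean_chained))               # True True
print(np.allclose(std_tuple, std_manual))                  # True
print(np.allclose(std_tuple, std_chained_wrong))           # False
```

Note that `std` uses `ddof=0` by default; pass `ddof=1` if the sample standard deviation is wanted.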

It seems to be pretty fast for me and reasonably stable with respect to
memory order (first a C-ordered array, then a Fortran-ordered copy):

In [199]: %timeit a.mean(0).mean(-1)
     ...: %timeit a.mean(axis=(0,2))
     ...: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
     ...: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
     ...: 
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.5 ms per loop
100 loops, best of 3: 4.8 ms per loop
100 loops, best of 3: 14.6 ms per loop
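For the C-ordered array timed above, part of the extra cost of the two transpose-and-reshape variants is plausibly a full copy: the transpose itself is a view, but merging its axes cannot be expressed with strides, so `reshape` silently copies the data before reducing. A small check, using `np.may_share_memory` (which only compares memory bounds, but that is enough here because a copy lives in a separate allocation):

```python
import numpy as np

a = np.random.RandomState(0).randn(30, 2, 10000)  # C-contiguous, like the example

t = a.transpose(1, 0, 2)                   # a transpose is always a view
r1 = t.reshape(2, -1)                      # merge the last two axes of t
r2 = a.transpose(2, 0, 1).reshape(-1, 2)   # the slowest variant in the session

print(np.may_share_memory(t, a))    # True: no data moved
print(np.may_share_memory(r1, a))   # False: reshape had to copy
print(np.may_share_memory(r2, a))   # False: reshape had to copy
```

The tuple-axis call `a.mean(axis=(0, 2))` needs no such intermediate copy, which fits with it matching the chained means in the C-ordered run.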

In [200]: a = a.copy('F')

In [201]: %timeit a.mean(0).mean(-1)
     ...: %timeit a.mean(axis=(0,2))
     ...: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
     ...: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 3.29 ms per loop
100 loops, best of 3: 7.18 ms per loop
100 loops, best of 3: 15.9 ms per loop
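`%timeit` is an IPython magic; to rerun the same comparison as a plain script, something like the following should work. It is a sketch assuming Python 3.6+ and the standard-library `timeit` module; the helper names `candidates` and `time_all` are hypothetical, and absolute timings will depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def candidates(a):
    """The four reductions timed in the session, keyed by label (each returns shape (2,))."""
    return {
        "chained means": lambda: a.mean(0).mean(-1),
        "tuple axis": lambda: a.mean(axis=(0, 2)),
        "transpose(1,0,2)+reshape": lambda: a.transpose(1, 0, 2).reshape(2, -1).mean(axis=-1),
        "transpose(2,0,1)+reshape": lambda: a.transpose(2, 0, 1).reshape(-1, 2).mean(axis=0),
    }


def time_all(a, number=10, repeat=3):
    """Best-of-`repeat` time per call in milliseconds, similar to %timeit's 'best of 3'."""
    results = {}
    for label, fn in candidates(a).items():
        best = min(timeit.repeat(fn, number=number, repeat=repeat)) / number
        results[label] = best * 1e3
    return results


if __name__ == "__main__":
    base = np.random.RandomState(0).randn(30, 2, 10000)
    for order in ("C", "F"):
        a = base.copy(order)  # same values, different memory layout
        # Sanity check: every method must agree before we bother timing it
        ref = a.mean(axis=(0, 2))
        assert all(np.allclose(fn(), ref) for fn in candidates(a).values())
        for label, ms in time_all(a).items():
            print(f"{order}-order  {label:<26} {ms:.2f} ms")
```

Checking that all four variants agree before timing them guards against comparing a fast but wrong reduction, such as one over the wrong pair of axes.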


Thanks,
Dave




