Fastest way to compute summary statistics for a specific axis
I have a number of large arrays for which I want to compute the mean and standard deviation over a particular axis  e.g. I want to compute the statistics for axis=1 as if the other axes were combined so that in the example below I get two values back In [1]: a = randn(30, 2, 10000) For the mean this can be done easily like: In [2]: a.mean(0).mean(1) Out[2]: array([ 0.0007, 0.0009]) ...but this won't work for the std. Using some transformations we can come up with something which will work for either: In [3]: a.transpose(2,0,1).reshape(1, 2).mean(axis=0) Out[3]: array([ 0.0007, 0.0009]) In [4]: a.transpose(1,0,2).reshape(2, 1).mean(axis=1) Out[4]: array([ 0.0007, 0.0009]) If we look at the performance of these equivalent methods: In [5]: %timeit a.transpose(2,0,1).reshape(1, 2).mean(axis=0) 100 loops, best of 3: 14.5 ms per loop In [6]: %timeit a.transpose(1,0,2).reshape(2, 1).mean(axis=1) 100 loops, best of 3: 5.05 ms per loop we can see that the latter version is a clear winner. Investigating further, both methods appear to copy the data so the performance is likely down to better cache utilisation. In [7]: np.may_share_memory(a, a.transpose(2,0,1).reshape(1, 2)) Out[7]: False In [8]: np.may_share_memory(a, a.transpose(1,0,2).reshape(2, 1)) Out[8]: False Both methods are however significantly slower than the initial attempt: In [9]: %timeit a.mean(0).mean(1) 1000 loops, best of 3: 1.2 ms per loop Perhaps because it allocates a smaller temporary? For those who like a challenge: is there a faster way to achieve what I'm after? Cheers, Dave
On 16 March 2015 at 15:53, Dave Hirschfeld <dave.hirschfeld@gmail.com> wrote:
I have a number of large arrays for which I want to compute the mean and standard deviation over a particular axis  e.g. I want to compute the statistics for axis=1 as if the other axes were combined so that in the example below I get two values back
In [1]: a = randn(30, 2, 10000)
...
Both methods are however significantly slower than the initial attempt:
In [9]: %timeit a.mean(0).mean(1) 1000 loops, best of 3: 1.2 ms per loop
Perhaps because it allocates a smaller temporary?
For those who like a challenge: is there a faster way to achieve what I'm after?
You'll probably find it faster if you swap the means around to make an even smaller temporary: a.mean(1).mean(0) Oscar
On Mon, Mar 16, 2015 at 11:53 AM, Dave Hirschfeld <dave.hirschfeld@gmail.com
wrote:
I have a number of large arrays for which I want to compute the mean and standard deviation over a particular axis  e.g. I want to compute the statistics for axis=1 as if the other axes were combined so that in the example below I get two values back
In [1]: a = randn(30, 2, 10000)
For the mean this can be done easily like:
In [2]: a.mean(0).mean(1) Out[2]: array([ 0.0007, 0.0009])
...but this won't work for the std. Using some transformations we can come up with something which will work for either:
In [3]: a.transpose(2,0,1).reshape(1, 2).mean(axis=0) Out[3]: array([ 0.0007, 0.0009])
In [4]: a.transpose(1,0,2).reshape(2, 1).mean(axis=1) Out[4]: array([ 0.0007, 0.0009])
Specify all of the axes you want to reduce over as a tuple. In [1]: import numpy as np In [2]: a = np.random.randn(30, 2, 10000) In [3]: a.mean(axis=(0,1)) Out[3]: array([0.00224589, 0.00230759]) In [4]: a.std(axis=(0,1)) Out[4]: array([ 1.00062771, 1.0001258 ]) Eric
On Mo, 20150316 at 15:53 +0000, Dave Hirschfeld wrote:
I have a number of large arrays for which I want to compute the mean and standard deviation over a particular axis  e.g. I want to compute the statistics for axis=1 as if the other axes were combined so that in the example below I get two values back
In [1]: a = randn(30, 2, 10000)
For the mean this can be done easily like:
In [2]: a.mean(0).mean(1) Out[2]: array([ 0.0007, 0.0009])
If you have numpy 1.7+ (which I guess by now is basically always the case), you can do a.mean((0, 1)). Though it isn't actually faster in this example, probably because it has to use buffered iterators and things, but I would guess the performance should be much more stable depending on memory order, etc. then any other method.  Sebastian
...but this won't work for the std. Using some transformations we can come up with something which will work for either:
In [3]: a.transpose(2,0,1).reshape(1, 2).mean(axis=0) Out[3]: array([ 0.0007, 0.0009])
In [4]: a.transpose(1,0,2).reshape(2, 1).mean(axis=1) Out[4]: array([ 0.0007, 0.0009])
If we look at the performance of these equivalent methods:
In [5]: %timeit a.transpose(2,0,1).reshape(1, 2).mean(axis=0) 100 loops, best of 3: 14.5 ms per loop
In [6]: %timeit a.transpose(1,0,2).reshape(2, 1).mean(axis=1) 100 loops, best of 3: 5.05 ms per loop
we can see that the latter version is a clear winner. Investigating further, both methods appear to copy the data so the performance is likely down to better cache utilisation.
In [7]: np.may_share_memory(a, a.transpose(2,0,1).reshape(1, 2)) Out[7]: False
In [8]: np.may_share_memory(a, a.transpose(1,0,2).reshape(2, 1)) Out[8]: False
Both methods are however significantly slower than the initial attempt:
In [9]: %timeit a.mean(0).mean(1) 1000 loops, best of 3: 1.2 ms per loop
Perhaps because it allocates a smaller temporary?
For those who like a challenge: is there a faster way to achieve what I'm after?
Cheers, Dave
_______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Sebastian Berg <sebastian <at> sipsolutions.net> writes:
On Mo, 20150316 at 15:53 +0000, Dave Hirschfeld wrote:
I have a number of large arrays for which I want to compute the mean
standard deviation over a particular axis  e.g. I want to compute
statistics for axis=1 as if the other axes were combined so that in
and the the
example below I get two values back
In [1]: a = randn(30, 2, 10000)
For the mean this can be done easily like:
In [2]: a.mean(0).mean(1) Out[2]: array([ 0.0007, 0.0009])
If you have numpy 1.7+ (which I guess by now is basically always the case), you can do a.mean((0, 1)). Though it isn't actually faster in this example, probably because it has to use buffered iterators and things, but I would guess the performance should be much more stable depending on memory order, etc. then any other method.
 Sebastian
Wow, I didn't know you could even do that  that's very cool (and a lot cleaner than manually reordering & reshaping) It seems to be pretty fast for me and reasonably stable wrt memory order: In [199]: %timeit a.mean(0).mean(1) ...: %timeit a.mean(axis=(0,2)) ...: %timeit a.transpose(1,0,2).reshape(2, 1).mean(axis=1) ...: %timeit a.transpose(2,0,1).reshape(1, 2).mean(axis=0) ...: 1000 loops, best of 3: 1.52 ms per loop 1000 loops, best of 3: 1.5 ms per loop 100 loops, best of 3: 4.8 ms per loop 100 loops, best of 3: 14.6 ms per loop In [200]: a = a.copy('F') In [201]: %timeit a.mean(0).mean(1) ...: %timeit a.mean(axis=(0,2)) ...: %timeit a.transpose(1,0,2).reshape(2, 1).mean(axis=1) ...: %timeit a.transpose(2,0,1).reshape(1, 2).mean(axis=0) 100 loops, best of 3: 2.02 ms per loop 100 loops, best of 3: 3.29 ms per loop 100 loops, best of 3: 7.18 ms per loop 100 loops, best of 3: 15.9 ms per loop Thanks, Dave
participants (4)

Dave Hirschfeld

Eric Moore

Oscar Benjamin

Sebastian Berg