[Numpy-discussion] np.mean and np.std performances

Sun Apr 18 08:16:00 EDT 2010

Hi all,

I noticed some performance problems with the np.mean and np.std functions.
Here is the console output from IPython:

# make some test data
>>>: a = np.arange(80*64, dtype=np.float64).reshape(80, 64)
>>>: c = np.tile( a, [10000, 1, 1])

>>>: timeit np.mean(c, axis=0)
1 loops, best of 3: 2.09 s per loop

But using reduce is much faster:

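from functools import reduce  # builtin in Python 2, must be imported in Python 3
def mean_reduce(c):
    # sum the 2-D slices c[0], c[1], ... one by one, then divide by their count
    return reduce(lambda som, array: som + array, c) / c.shape[0]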

>>>:timeit mean_reduce(c)
1 loops, best of 3: 355 ms per loop
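Here is a quick correctness check, written as a minimal sketch for Python 3. It compares the reduce version with np.mean on a smaller copy of the test data. The use of 100 slices instead of 10000 is an arbitrary choice that keeps the check fast.

```python
from functools import reduce

import numpy as np


def mean_reduce(c):
    # Add the 2-D slices c[0], c[1], ... one at a time, then divide by the count.
    return reduce(lambda som, array: som + array, c) / c.shape[0]


# Same test data as above, but with 100 slices instead of 10000.
a = np.arange(80 * 64, dtype=np.float64).reshape(80, 64)
c = np.tile(a, [100, 1, 1])

print(np.allclose(mean_reduce(c), np.mean(c, axis=0)))
```

All slices are copies of `a`, so both results should also equal `a` itself.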

The same applies to np.std():

# slightly smaller c array (the larger one uses too much memory)
>>>: c = np.tile( a, [7000, 1, 1])

>>>: timeit np.std(c, axis=0)
1 loops, best of 3: 3.73 s per loop

With the reduce version:

def std_reduce(c):
    # note: subtracts the mean from c in place, so the caller's array is modified
    c -= mean_reduce(c)
    return np.sqrt(reduce(lambda som, array: som + array**2, c) / c.shape[0])

>>>: timeit std_reduce(c)
1 loops, best of 3: 1.18 s per loop
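Because `c -= mean_reduce(c)` overwrites the input, here is a minimal sketch of a variant that leaves `c` untouched. It is not the version timed above. It still works one 2-D slice at a time, but it subtracts the mean inside the reduction, so each step only creates slice-sized temporaries. The random test data is an arbitrary assumption chosen so that the standard deviation is nonzero.

```python
from functools import reduce

import numpy as np


def std_reduce(c):
    # Mean over axis 0, computed slice by slice without modifying c.
    m = reduce(lambda som, array: som + array, c) / c.shape[0]
    # Accumulate squared deviations slice by slice, starting from zeros.
    sq = reduce(lambda som, array: som + (array - m) ** 2, c, np.zeros_like(m))
    # Population standard deviation (ddof=0), matching np.std's default.
    return np.sqrt(sq / c.shape[0])


rng = np.random.default_rng(0)
c = rng.standard_normal((100, 80, 64))
before = c.copy()
result = std_reduce(c)
print(np.allclose(result, np.std(c, axis=0)), np.array_equal(c, before))
```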

For std, also compare the memory usage of the two versions while they run.

The functions I gave here can easily be modified to accept an axis argument
and any other options needed.
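One possible way to add the axis argument is sketched below. It assumes `np.moveaxis`, which returns a view, so no data is copied. The requested axis is moved to the front, and the same slice-by-slice reduction is applied. The `axis` parameter and its default are my own choices for this sketch. Moving a trailing axis to the front gives strided slices, so the speed-up measured above for axis=0 may not carry over to other axes.

```python
from functools import reduce

import numpy as np


def mean_reduce(c, axis=0):
    # Put the reduction axis first (a view), then sum the sub-arrays along it.
    c = np.moveaxis(c, axis, 0)
    return reduce(lambda som, array: som + array, c) / c.shape[0]


def std_reduce(c, axis=0):
    # Same idea for std; the caller's array is not modified.
    c = np.moveaxis(c, axis, 0)
    m = mean_reduce(c, axis=0)
    sq = reduce(lambda som, array: som + (array - m) ** 2, c, np.zeros_like(m))
    return np.sqrt(sq / c.shape[0])  # ddof=0, like np.std's default


x = np.random.default_rng(1).standard_normal((5, 6, 7))
print(np.allclose(mean_reduce(x, axis=2), np.mean(x, axis=2)))
```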

Is there any drawback to using them? Why are np.mean and np.std so slow?

I'm sure I'm missing something.

Cheers

Davide