[Numpy-discussion] Extending numpy statistics functions (like mean)

Tue Apr 12 09:35:04 EDT 2011

On 04/11/2011 05:03 PM, Keith Goodman wrote:
> On Mon, Apr 11, 2011 at 2:36 PM, Sergio Pascual <sergio.pasra at gmail.com> wrote:
>> Hi list.
>>
>> For my application, I would like to implement some new statistics
>> functions over numpy arrays, such as a truncated mean. Ideally the new
>> function should take the same arguments as numpy.mean: axis, dtype
>> and out. Is there a way of writing this function that doesn't imply
>> writing it in C from scratch?
>>
>> I have read the documentation, but as far as I can see ufuncs convert an
>> N-dimensional array into another and generalized ufuncs require fixed
>> dimensions. numpy.mean converts an N-dimensional array into either a
>> number or an (N - 1)-dimensional array.
> Here's a slow, brute force method:
>
> >>> a = np.arange(9).reshape(3,3)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
> >>> idx = a > 6
> >>> b = a.copy()
> >>> b[idx] = 0
> >>> b
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 0, 0]])
> >>> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)
> array([ 3. ,  2.5,  3.5])
Truncated statistics are easily handled with masked arrays, and with
somewhat more effort by boolean indexing (both shown below). scipy.stats
also offers limited functionality of this kind, so first check whether
the functions you need are already there. Otherwise, please post a list
of candidate functions to the scipy-dev list, because SciPy is their
most likely home.


 >>> import numpy as np
 >>> from numpy import ma
 >>> y = np.arange(35).reshape(5,7)
 >>> b = y > 20
 >>> z = ma.masked_where(y <= 20, y)
 >>> z.mean()
27.5
 >>> z.mean(axis=0)
masked_array(data = [24.5 25.5 26.5 27.5 28.5 29.5 30.5],
              mask = [False False False False False False False],
        fill_value = 1e+20)

 >>> z.mean(axis=1)
masked_array(data = [-- -- -- 24.0 31.0],
              mask = [ True  True  True False False],
        fill_value = 1e+20)

 >>> y[b].mean()
27.5
 >>> # b[:,5] is True only for rows 3 and 4, so this indexes whole rows
 >>> y[b[:,5]].mean(axis=0)
array([ 24.5,  25.5,  26.5,  27.5,  28.5,  29.5,  30.5])
 >>> y[b[:,5]].mean(axis=1)
array([ 24.,  31.])
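
The masked-array approach above could be wrapped into a function that
takes the same axis, dtype and out arguments as numpy.mean. This is only
a minimal sketch: the name truncated_mean and its lower/upper parameters
are hypothetical and are not part of NumPy or SciPy.

```python
import numpy as np
from numpy import ma


def truncated_mean(a, lower=None, upper=None, axis=None, dtype=None, out=None):
    """Mean of the values of `a` that lie within [lower, upper].

    Hypothetical helper, not part of NumPy or SciPy. Values outside the
    limits are masked, and the masked array's own mean does the reduction.
    """
    a = np.asarray(a)
    # Start with nothing masked, then mask whatever falls outside each limit.
    mask = np.zeros(a.shape, dtype=bool)
    if lower is not None:
        mask |= a < lower
    if upper is not None:
        mask |= a > upper
    # MaskedArray.mean accepts axis, dtype and out, like numpy.mean.
    return ma.masked_array(a, mask=mask).mean(axis=axis, dtype=dtype, out=out)


y = np.arange(35).reshape(5, 7)
print(truncated_mean(y, lower=21))          # 27.5
print(truncated_mean(y, lower=21, axis=1))  # rows 0-2 are fully masked
```

With an axis argument the result is a masked array, so rows or columns
that have no values inside the limits come back masked rather than
raising an error or producing a misleading number.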


Bruce