Extending numpy statistics functions (like mean)
Hi list.
For my application, I would like to implement some new statistics functions over numpy arrays, such as a truncated mean. Ideally the new function should take the same arguments as numpy.mean: axis, dtype and out. Is there a way of writing this function that doesn't involve writing it in C from scratch?
I have read the documentation, but as far as I can see, ufuncs convert an N-dimensional array into another N-dimensional array, and generalized ufuncs require fixed dimensions. numpy.mean converts an N-dimensional array into either a number or an (N-1)-dimensional array.
Regards, Sergio
On Mon, Apr 11, 2011 at 2:36 PM, Sergio Pascual <sergio.pasra@gmail.com> wrote:
Hi list.
For my application, I would like to implement some new statistics functions over numpy arrays, such as a truncated mean. Ideally the new function should take the same arguments as numpy.mean: axis, dtype and out. Is there a way of writing this function that doesn't involve writing it in C from scratch?
I have read the documentation, but as far as I can see, ufuncs convert an N-dimensional array into another N-dimensional array, and generalized ufuncs require fixed dimensions. numpy.mean converts an N-dimensional array into either a number or an (N-1)-dimensional array.
Here's a slow, brute force method:
>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = a > 6
>>> b = a.copy()
>>> b[idx] = 0
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 0, 0]])
>>> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)
array([ 3. ,  2.5,  3.5])
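The brute-force session above can be wrapped into a reusable function that accepts an axis argument. The following is only a minimal sketch: the name `truncated_mean` and the `upper` parameter are hypothetical, and it assumes that "truncated" means discarding every element above a threshold, as in the session.

```python
import numpy as np

def truncated_mean(a, upper, axis=None):
    """Mean of the elements of `a` that are <= `upper`, taken along `axis`.

    Hypothetical helper, not part of numpy. Slices in which no element
    survives the truncation yield nan.
    """
    a = np.asarray(a, dtype=float)
    keep = a <= upper                               # elements that survive
    total = np.where(keep, a, 0.0).sum(axis=axis)   # rejected values add 0
    count = keep.sum(axis=axis)                     # number of kept values
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count

a = np.arange(9).reshape(3, 3)
result = truncated_mean(a, upper=6, axis=0)
print(result)
```

Unlike the session, this version does not copy and overwrite the array; `np.where` substitutes zeros for the rejected elements on the fly. It still builds temporaries the size of the input, so it stays in the "slow, brute-force" family.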
What I have is a set of C++ functions that implement statistical functions. What I need is some kind of ufunc where I can "plug in" my functions. But there doesn't seem to be a ufunc that operates on an N-dimensional array and turns it into a number.
2011/4/12 Keith Goodman <kwgoodman@gmail.com>:
Here's a slow, brute force method:
>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = a > 6
>>> b = a.copy()
>>> b[idx] = 0
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 0, 0]])
>>> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)
array([ 3. ,  2.5,  3.5])
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
 Sergio Pascual http://guaix.fis.ucm.es/~spr gpg fingerprint: 5203 B42D 86A0 5649 410A F4AC A35F D465 F263 BCCC Departamento de Astrofísica  Universidad Complutense de Madrid (Spain)
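One way to "plug" an existing one-dimensional routine into an axis-aware reduction, without writing a new ufunc, is `np.apply_along_axis`. The sketch below is an illustration under assumptions: `trimmed_mean_1d` is a hypothetical pure-Python stand-in for a wrapped C++ function, and `reduce_along_axis` is a hypothetical helper name. The loop over slices runs in Python, so this trades speed for convenience.

```python
import numpy as np

def trimmed_mean_1d(v, upper):
    """Stand-in for a compiled 1-D routine: mean of the values <= upper."""
    kept = v[v <= upper]
    return kept.mean() if kept.size else np.nan

def reduce_along_axis(func1d, a, axis=None, **kwargs):
    """Apply a 1-D -> scalar function along `axis`, as numpy.mean does.

    With axis=None the array is flattened and reduced to a single number;
    otherwise the result has one dimension fewer than the input.
    """
    a = np.asarray(a)
    if axis is None:
        return func1d(a.ravel(), **kwargs)
    return np.apply_along_axis(func1d, axis, a, **kwargs)

a = np.arange(9).reshape(3, 3)
by_column = reduce_along_axis(trimmed_mean_1d, a, axis=0, upper=6)
overall = reduce_along_axis(trimmed_mean_1d, a, upper=6)
print(by_column, overall)
```

The dtype and out arguments of numpy.mean are not handled here; they would have to be added by hand around the call.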
On 04/11/2011 05:03 PM, Keith Goodman wrote:
Here's a slow, brute force method:
>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = a > 6
>>> b = a.copy()
>>> b[idx] = 0
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 0, 0]])
>>> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)
array([ 3. ,  2.5,  3.5])
The truncated functions are easily handled by masked arrays, and with somewhat more effort by indexing (both are shown below). There is limited functionality in scipy.stats as well, so first check scipy.stats to see whether the functions you need are there. Otherwise, please post a list of possible functions to the scipy-dev list, because that is their most likely home.
>>> import numpy as np
>>> from numpy import ma
>>> y = np.arange(35).reshape(5,7)
>>> b = y > 20
>>> z = ma.masked_where(y <= 20, y)
>>> z.mean()
27.5
>>> z.mean(axis=0)
masked_array(data = [24.5 25.5 26.5 27.5 28.5 29.5 30.5],
             mask = [False False False False False False False],
       fill_value = 1e+20)
>>> z.mean(axis=1)
masked_array(data = [-- -- -- 24.0 31.0],
             mask = [ True  True  True False False],
       fill_value = 1e+20)
>>> y[b].mean()
27.5
>>> y[b[:,5]].mean(axis=0)
array([ 24.5,  25.5,  26.5,  27.5,  28.5,  29.5,  30.5])
>>> y[b[:,5]].mean(axis=1)
array([ 24.,  31.])
Bruce
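The masked-array approach can be packaged so that it takes the same axis, dtype and out arguments as numpy.mean, which is what the original question asked for. This is a sketch only: the function name `masked_truncated_mean` and its `lower` parameter are hypothetical, and the arguments are simply forwarded to the masked array's own `mean` method.

```python
import numpy as np
from numpy import ma

def masked_truncated_mean(a, lower, axis=None, dtype=None, out=None):
    """Mean of the elements of `a` strictly greater than `lower`.

    Hypothetical wrapper: elements <= lower are masked, and axis, dtype
    and out are passed straight through to MaskedArray.mean.
    """
    a = np.asarray(a)
    z = ma.masked_where(a <= lower, a)
    return z.mean(axis=axis, dtype=dtype, out=out)

y = np.arange(35).reshape(5, 7)
rows = masked_truncated_mean(y, 20, axis=1)   # rows 0-2 are fully masked
filled = rows.filled(np.nan)                  # plain ndarray, nan where masked
print(rows)
print(filled)
```

Slices with no surviving elements come back masked rather than as nan, so `filled` is one way to get an ordinary array when downstream code does not understand masks.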