Extending numpy statistics functions (like mean)

Hi list.

For my application, I would like to implement some new statistics functions over numpy arrays, such as truncated mean. Ideally this new function should have the same arguments as numpy.mean: axis, dtype and out. Is there a way of writing this function that doesn't imply writing it in C from scratch?

I have read the documentation, but as far as I can see, ufuncs convert an N-dimensional array into another one, and generalized ufuncs require fixed dimensions. numpy.mean converts an N-dimensional array either into a number or into an (N - 1)-dimensional array.

Regards, Sergio
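For reference, the reduction behaviour of numpy.mean described in the message above can be checked directly. This is only an illustrative sketch; the array `a` and its shape are arbitrary choices, not something taken from the thread.

```python
import numpy as np

# Arbitrary 3-dimensional example array (an assumption for illustration).
a = np.arange(24, dtype=float).reshape(2, 3, 4)

# axis=None (the default) reduces the whole array to a single number.
total = np.mean(a)

# Reducing along one axis drops that axis: (2, 3, 4) becomes (2, 4).
per_axis = np.mean(a, axis=1)

# dtype selects the accumulator type; out receives the result in place.
out = np.empty((2, 4))
np.mean(a, axis=1, dtype=np.float64, out=out)
```

A new statistic that mimics this interface would need to reproduce the same shape rules for `axis` and accept a preallocated `out` of the reduced shape.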

On Mon, Apr 11, 2011 at 2:36 PM, Sergio Pascual <sergio.pasra@gmail.com> wrote:
> For my application, I would like to implement some new statistics functions over numpy arrays, such as truncated mean. Ideally this new function should have the same arguments as numpy.mean: axis, dtype and out. Is there a way of writing this function that doesn't imply writing it in C from scratch?
Here's a slow, brute force method:
    >>> import numpy as np
    >>> a = np.arange(9).reshape(3,3)
    >>> a
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> idx = a > 6
    >>> b = a.copy()
    >>> b[idx] = 0
    >>> b
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 0, 0]])
    >>> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)
    array([ 3. ,  2.5,  3.5])
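The zero-out-and-count idea in the session above could be wrapped into a reusable function along these lines. This is a rough sketch, not a numpy API: the name `truncated_mean` and the `low`/`high` parameters are hypothetical, and only `axis` is supported (no `dtype` or `out`).

```python
import numpy as np

def truncated_mean(a, low=None, high=None, axis=None):
    """Mean of the elements of `a` lying in [low, high], along `axis`.

    Hypothetical helper for illustration; not part of numpy.
    """
    a = np.asarray(a)
    keep = np.ones(a.shape, dtype=bool)
    if low is not None:
        keep &= a >= low
    if high is not None:
        keep &= a <= high
    # Rejected elements contribute 0 to the sum and are left out of the count.
    total = np.where(keep, a, 0).sum(axis=axis, dtype=float)
    count = keep.sum(axis=axis)
    # A slice with no surviving elements yields 0/0, i.e. nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count
```

Calling `truncated_mean(np.arange(9).reshape(3, 3), high=6, axis=0)` reproduces the `[3., 2.5, 3.5]` result of the session. The whole computation stays vectorized, at the cost of a temporary array the size of the input.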

What I have is some C++ functions that implement statistical functions. What I need is some kind of ufunc into which I can "plug" my functions. But there doesn't seem to be a ufunc that operates on an N-d array and turns it into a number.

2011/4/12 Keith Goodman <kwgoodman@gmail.com>:
> Here's a slow, brute force method:
> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Sergio Pascual http://guaix.fis.ucm.es/~spr
gpg fingerprint: 5203 B42D 86A0 5649 410A F4AC A35F D465 F263 BCCC
Departamento de Astrofísica -- Universidad Complutense de Madrid (Spain)
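One way to "plug" an existing one-dimensional routine into an axis-aware reduction, without writing a ufunc in C, might be numpy.apply_along_axis. The sketch below assumes the C++ function has already been wrapped so that Python can call it with a 1-D array and get a scalar back; `clipped_mean_1d` is a pure-Python stand-in for such a wrapper, and `reduce_with` is a hypothetical helper name.

```python
import numpy as np

def reduce_with(func1d, a, axis=None, dtype=None, out=None):
    """Apply a 1-D -> scalar function along `axis` of `a`.

    Hypothetical helper; `func1d` stands for a wrapped external routine.
    """
    a = np.asarray(a, dtype=dtype)
    if axis is None:
        # Like numpy.mean with axis=None: reduce the flattened array.
        result = np.asarray(func1d(a.ravel()))
    else:
        # Calls func1d once per 1-D slice; the reduced axis is dropped.
        result = np.apply_along_axis(func1d, axis, a)
    if out is not None:
        out[...] = result
        return out
    return result

def clipped_mean_1d(v):
    # Stand-in for the wrapped C++ function: mean of the values <= 6.
    kept = v[v <= 6]
    return kept.mean() if kept.size else np.nan

a = np.arange(9).reshape(3, 3)
by_column = reduce_with(clipped_mean_1d, a, axis=0)
overall = reduce_with(clipped_mean_1d, a)
```

The loop over slices runs in Python, so this is convenient rather than fast; it trades speed for not having to write the axis handling by hand.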

On 04/11/2011 05:03 PM, Keith Goodman wrote:
> Here's a slow, brute force method:
> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)

The truncated functions are easily handled by masked arrays and, with somewhat more effort, by indexing (as seen below). There is limited functionality in scipy.stats as well, so first check scipy.stats to see if the functions you need are there. Otherwise, please post a list of candidate functions to the scipy-dev list, because that is their most likely home.
    >>> import numpy as np
    >>> from numpy import ma
    >>> y = np.arange(35).reshape(5,7)
    >>> b = y > 20
    >>> z = ma.masked_where(y <= 20, y)
    >>> z.mean()
    27.5
    >>> z.mean(axis=0)
    masked_array(data = [24.5 25.5 26.5 27.5 28.5 29.5 30.5],
                 mask = [False False False False False False False],
           fill_value = 1e+20)
    >>> z.mean(axis=1)
    masked_array(data = [-- -- -- 24.0 31.0],
                 mask = [ True  True  True False False],
           fill_value = 1e+20)
    >>> y[b].mean()
    27.5
    >>> y[b[:,5]].mean(axis=0)
    array([ 24.5,  25.5,  26.5,  27.5,  28.5,  29.5,  30.5])
    >>> y[b[:,5]].mean(axis=1)
    array([ 24.,  31.])
Bruce
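The masked-array approach above could be packaged so that it takes truncation limits plus the usual `axis` and `dtype` arguments. This is a sketch only: `masked_truncated_mean`, `low` and `high` are hypothetical names, not numpy or scipy functions. The mean method of masked arrays is also documented as accepting `out`, which is left out here to keep the example short.

```python
import numpy as np
from numpy import ma

def masked_truncated_mean(a, low=None, high=None, axis=None, dtype=None):
    """Mean over the elements of `a` within [low, high], via a masked array.

    Hypothetical helper for illustration.
    """
    a = np.asarray(a)
    drop = np.zeros(a.shape, dtype=bool)
    if low is not None:
        drop |= a < low
    if high is not None:
        drop |= a > high
    # Masked elements are ignored by the masked-array mean.
    return ma.masked_array(a, mask=drop).mean(axis=axis, dtype=dtype)

y = np.arange(35).reshape(5, 7)
rows = masked_truncated_mean(y, low=21, axis=1)
cols = masked_truncated_mean(y, low=21, axis=0)
```

Unlike a sum-and-count version that returns nan, rows with no surviving elements come back masked, which matches the `z.mean(axis=1)` output shown in the session.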
participants:
- Bruce Southey
- Keith Goodman
- Sergio Pascual