Extending numpy statistics functions (like mean)
Hi list.
For my application, I would like to implement some new statistics functions over numpy arrays, such as truncated mean. Ideally, the new function should take the same arguments as numpy.mean: axis, dtype and out. Is there a way of writing this function that doesn't involve writing it in C from scratch?
I have read the documentation, but as far as I can see, ufuncs convert an N-dimensional array into another one, and generalized ufuncs require fixed dimensions. numpy.mean converts an N-dimensional array into either a number or an (N-1)-dimensional array.
Regards, Sergio
On Mon, Apr 11, 2011 at 2:36 PM, Sergio Pascual sergio.pasra@gmail.com wrote:
> I would like to implement some new statistics functions over numpy arrays, such as truncated mean.
Here's a slow, brute force method:
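>>> import numpy as np
>>> a = np.arange(9).reshape(3, 3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = a > 6    # elements to exclude from the mean
>>> b = a.copy()
>>> b[idx] = 0     # zero them so they do not contribute to the sum
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 0, 0]])
>>> 1.0 * b.sum(axis=0) / (~idx).sum(axis=0)    # sum of kept values / number kept
array([ 3. ,  2.5,  3.5])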
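The brute-force idea above can be wrapped into a function that takes the same axis, dtype and out arguments as numpy.mean. This is only a minimal sketch: the name truncated_mean and the upper parameter are made up for illustration, and it assumes that "truncated" means discarding values above a threshold, as in the session above.

```python
import numpy as np

def truncated_mean(a, upper, axis=None, dtype=None, out=None):
    """Mean of the elements of `a` that are <= `upper` (hypothetical helper)."""
    a = np.asarray(a)
    keep = a <= upper                       # elements that take part in the mean
    # Replace discarded elements by 0 so they do not contribute to the sum.
    total = np.where(keep, a, 0).sum(axis=axis, dtype=dtype)
    count = keep.sum(axis=axis)             # number of kept elements per slice
    # true_divide always performs floating-point division and accepts `out`.
    return np.true_divide(total, count, out=out)
```

With a = np.arange(9).reshape(3, 3), truncated_mean(a, 6, axis=0) reproduces the result of the session above. A slice in which every value is discarded divides zero by zero, so it yields nan together with a runtime warning.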
What I have are some C++ functions that implement statistical functions. What I need is some kind of ufunc where I can "plug in" my functions. But there doesn't seem to be a ufunc that operates on an N-dimensional array and turns it into a number.
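One pure-NumPy way to plug an existing 1-D routine into an axis-wise reduction is np.apply_along_axis, which calls a function on every 1-D slice along an axis and, when that function returns a scalar, produces an array with one dimension fewer. The sketch below rests on assumptions: trimmed_mean_1d is a hypothetical Python stand-in for a compiled C++ routine, and reduce_along_axis is a made-up helper name. The loop over slices runs in Python, so this is convenient rather than fast, and it does not provide the dtype or out arguments.

```python
import numpy as np

def trimmed_mean_1d(values, lower, upper):
    """Hypothetical stand-in for a compiled 1-D routine:
    mean of the values that lie in [lower, upper]."""
    values = np.asarray(values, dtype=float)
    kept = values[(values >= lower) & (values <= upper)]
    return kept.mean() if kept.size else np.nan

def reduce_along_axis(func1d, a, axis=None, **kwargs):
    """Apply a 1-D reduction to the whole array or along one axis."""
    a = np.asarray(a)
    if axis is None:
        # Like numpy.mean with axis=None: reduce the flattened array to a number.
        return func1d(a.ravel(), **kwargs)
    # One call of func1d per 1-D slice; the reduced axis is dropped.
    return np.apply_along_axis(func1d, axis, a, **kwargs)
```

For example, reduce_along_axis(trimmed_mean_1d, a, axis=0, lower=0, upper=6) returns one value per column of a 2-D array a.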
2011/4/12 Keith Goodman kwgoodman@gmail.com:
> Here's a slow, brute force method:
NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On 04/11/2011 05:03 PM, Keith Goodman wrote:
> Here's a slow, brute force method:
The truncated functions are easily handled by masked arrays and, with somewhat more effort, by indexing (as seen below). There is limited functionality in scipy.stats as well, so first check scipy.stats to see if the functions you need are there. Otherwise, please post a list of possible functions to the scipy-dev list, because that is the most likely home for them.
>>> import numpy as np
>>> from numpy import ma
>>> y = np.arange(35).reshape(5, 7)
>>> b = y > 20
>>> z = ma.masked_where(y <= 20, y)
>>> z.mean()
27.5
>>> z.mean(axis=0)
masked_array(data = [24.5 25.5 26.5 27.5 28.5 29.5 30.5],
             mask = [False False False False False False False],
       fill_value = 1e+20)
>>> z.mean(axis=1)
masked_array(data = [-- -- -- 24.0 31.0],
             mask = [ True  True  True False False],
       fill_value = 1e+20)
>>> y[b].mean()
27.5
>>> y[b[:,5]].mean(axis=0)
array([ 24.5,  25.5,  26.5,  27.5,  28.5,  29.5,  30.5])
>>> y[b[:,5]].mean(axis=1)
array([ 24.,  31.])
Bruce
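The masked-array approach above could be packaged as a small function along the following lines. This is a sketch under assumptions, not an existing NumPy or SciPy API: the name masked_truncated_mean and the lower and upper parameters are invented for illustration, and only axis and dtype are forwarded to the masked mean.

```python
import numpy as np
from numpy import ma

def masked_truncated_mean(a, lower=None, upper=None, axis=None, dtype=None):
    """Mean of the elements of `a` inside [lower, upper] (hypothetical helper).

    Elements outside the limits are masked, so they are ignored by the mean.
    """
    a = np.asarray(a)
    mask = np.zeros(a.shape, dtype=bool)
    if lower is not None:
        mask |= a < lower      # mask values below the lower limit
    if upper is not None:
        mask |= a > upper      # mask values above the upper limit
    return ma.masked_array(a, mask=mask).mean(axis=axis, dtype=dtype)
```

With y = np.arange(35).reshape(5, 7), calling masked_truncated_mean(y, lower=21, axis=1) gives a masked array whose first three entries are masked, because those rows contain no values inside the limits. Unlike the plain indexing variant, the result keeps one entry per row or column of the input.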
Participants: Bruce Southey, Keith Goodman, Sergio Pascual