help in improving data analysis code

Hi, I am a newbie to Numeric/numarray programming and would appreciate your help in improving the code below (which I'm sure is quite ugly to an experienced numarray programmer). An analysis we are carrying out requires the following:

1. evaluate the mean of a set of data;
2. eliminate the data point farthest from the mean;
3. repeat steps 1 and 2 until a specified fraction of the points has been eliminated.

Since this analysis will have to be performed (probably repeatedly) on approximately ten thousand data sets, each containing 100-500 points, I would like the code to be as fast as possible.

Thanks for your help.

-g

====
from numarray import add, array, asarray, absolute, argsort, floor, take, size

def mean(m,axis=0):
    # arithmetic mean of m along the given axis
    m = asarray(m)
    return add.reduce(m,axis)/float(m.shape[axis])

def eliminate_outliers(dat,frac):
    # dat is an (n, 1) column array; frac is the fraction of points to drop
    num_to_eliminate = int(floor(size(dat,0)*frac))
    for i in range(num_to_eliminate):
        # sort by distance from the current mean, then drop the farthest point
        ind = argsort(absolute(dat-mean(dat)),0)
        sdat = take(dat,ind,0)[:,0]
        dat = sdat[:-1]
    return dat

#--------------------------------------------------------------------
if __name__ == "__main__":
    from MLab import rand
    # test data: values in [0, 1), with 10 points scaled to [0, 20) and 10 to (-20, 0]
    sz = 100
    nn = rand(sz,1)
    nn[:10] = 20*rand(10,1)
    nn[sz-10:] = -20*rand(10,1)
    print eliminate_outliers(nn,0.10)
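The three steps in the question can be sketched in plain Python 3 (no numarray; both function names are hypothetical, chosen for illustration). The second function relies on a property of one-dimensional data: the point farthest from the mean is always either the current minimum or the current maximum. So the data only needs to be sorted once, after which each step compares the two ends and updates a running sum, instead of re-sorting the whole array on every iteration as the loop above does.

```python
import random


def eliminate_outliers_naive(data, frac):
    """Literal version of the three steps: recompute the mean, drop the
    single point farthest from it, repeat until int(n * frac) are gone."""
    dat = list(data)
    for _ in range(int(len(dat) * frac)):
        m = sum(dat) / len(dat)
        # index of the point with the largest |x - mean|
        worst = max(range(len(dat)), key=lambda i: abs(dat[i] - m))
        del dat[worst]
    return dat


def eliminate_outliers_sorted(data, frac):
    """Same procedure, but sorting only once.

    In one dimension |x - m| is largest at the smallest or largest x,
    so each step only compares the two ends of the sorted data. A running
    sum replaces the full mean recomputation (it can differ from a fresh
    sum() in the last few bits, which only matters for near-exact ties).
    """
    s = sorted(data)
    lo, hi = 0, len(s) - 1          # current ends of the surviving slice
    total, count = sum(s), len(s)
    for _ in range(int(len(s) * frac)):
        m = total / count
        if m - s[lo] > s[hi] - m:   # low end is farther from the mean
            total -= s[lo]
            lo += 1
        else:                       # high end is farther (or a tie)
            total -= s[hi]
            hi -= 1
        count -= 1
    return s[lo:hi + 1]             # survivors, in ascending order


if __name__ == "__main__":
    # Test data shaped like the question's: mostly values in [0, 1),
    # with 10 points stretched to [0, 20) and 10 to (-20, 0].
    random.seed(1)
    n = 100
    data = [random.random() for _ in range(n)]
    data[:10] = [20 * random.random() for _ in range(10)]
    data[-10:] = [-20 * random.random() for _ in range(10)]
    print(eliminate_outliers_sorted(data, 0.10))
```

The first function keeps the surviving points in their original order and the second returns them sorted, so compare their results after sorting. The per-step work in the second version is constant, so for 100-500 points the single sort dominates its cost.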

On Friday 25 November 2005 15:24, gf wrote:
For sz=100, the following line of code is 10x faster on my machine (more if sz is bigger):

print nn[argsort(abs(nn_c-nn_c.mean()),0)][:-int(sz*0.10),0]

I haven't checked it very carefully, so you should double check it.

BTW, you will need to use the numarray MLab interface:

from numarray.mlab import rand

Cheers,

--
Francesc Altet    http://www.carabos.com/
Cárabos Coop. V.  Enjoy Data

On Friday 25 November 2005 16:27, Francesc Altet wrote:
> print nn[argsort(abs(nn_c-nn_c.mean()),0)][:-int(sz*0.10),0]
Oops, I got confused. This should work better ;-)

print nn[argsort(abs(nn-nn.mean()),0)][:-int(sz*0.10),0]

--
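Since the one-liner comes with a request to double check it: it computes the mean once and drops the k points farthest from that single mean, whereas the loop in the question recomputes the mean after every removal. When one large outlier dominates the first mean, the two can disagree. The plain-Python sketch below (function names hypothetical) mimics both approaches on a small made-up data set. It also slices with len - k instead of -k, because a slice ending at -0 is empty in Python; that cannot happen with 100-500 points and frac = 0.10, but would for smaller data sets or fractions.

```python
def trim_iterative(data, frac):
    """The procedure from the original question: recompute the mean
    after every single removal."""
    dat = list(data)
    for _ in range(int(len(dat) * frac)):
        m = sum(dat) / len(dat)
        # remove the value farthest from the current mean
        dat.remove(max(dat, key=lambda x: abs(x - m)))
    return dat


def trim_fixed_mean(data, frac):
    """Plain-Python analogue of the one-liner: rank every point by its
    distance from the mean of the full data set, then drop the k
    farthest in one go."""
    k = int(len(data) * frac)
    m = sum(data) / len(data)
    ranked = sorted(data, key=lambda x: abs(x - m))  # nearest first
    # Slice with len - k rather than -k: ranked[:-0] would be empty.
    return ranked[:len(ranked) - k]


data = [0, 0, 0, 0, 0, 0, 0, 0, 5, 100]
print(sorted(trim_iterative(data, 0.2)))   # [0, 0, 0, 0, 0, 0, 0, 0]
print(sorted(trim_fixed_mean(data, 0.2)))  # [0, 0, 0, 0, 0, 0, 0, 5]
```

Here the first mean is 10.5, so a 0 (distance 10.5) looks farther away than the 5 (distance 5.5); once 100 is gone, the mean drops to about 0.56 and the 5 becomes the outlier. Whether that difference matters depends on the analysis.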