help in improving data analysis code
Hi,

I am a newbie to Numeric/numarray programming and would appreciate your help in improving the code below (which I'm sure is quite ugly to an experienced numarray programmer). An analysis we are carrying out requires the following:

1. evaluate the mean of a set of data;
2. eliminate the data point farthest from the mean;
3. repeat steps 1 and 2 until a certain specified fraction of points has been eliminated.

Since this analysis will have to be performed (probably repeatedly) on approximately ten thousand data sets, each of which contains 100-500 points, I would like the code to be as fast as possible.

Thanks for your help.

-g

```python
from numarray import add, array, asarray, absolute, argsort, floor, take, size

def mean(m, axis=0):
    m = asarray(m)
    return add.reduce(m, axis)/float(m.shape[axis])

def eliminate_outliers(dat, frac):
    num_to_eliminate = int(floor(size(dat, 0)*frac))
    for i in range(num_to_eliminate):
        ind = argsort(absolute(dat - mean(dat)), 0)
        sdat = take(dat, ind, 0)[:, 0]
        dat = sdat[:-1]
    return dat

#--------------------------------------------------------------------
if __name__ == "__main__":
    from MLab import rand
    sz = 100
    nn = rand(sz, 1)
    nn[:10] = 20*rand(10, 1)
    nn[sz-10:] = -20*rand(10, 1)
    print eliminate_outliers(nn, 0.10)
```
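For comparison, here is a minimal sketch of the same three-step procedure written against modern NumPy rather than numarray (an assumption; `eliminate_outliers_sorted` is a hypothetical name). It relies on the fact that, for 1-D data, the point farthest from the mean is always the current minimum or maximum. The data can therefore be sorted once and trimmed from either end, with a running sum keeping the mean up to date, so the loop costs O(n log n) overall instead of one full sort per eliminated point.

```python
import numpy as np

def eliminate_outliers_sorted(dat, frac):
    """Repeatedly drop the point farthest from the current mean.

    Sketch assuming NumPy and 1-D data; survivors are returned in sorted order.
    """
    a = np.sort(np.asarray(dat, dtype=float).ravel())
    n = a.size
    num_to_eliminate = int(np.floor(n * frac))
    lo, hi = 0, n - 1        # surviving points are a[lo:hi + 1]
    total = a.sum()          # running sum of the surviving points
    for _ in range(num_to_eliminate):
        m = total / (hi - lo + 1)   # mean of the points still present
        # In 1-D the farthest point from m is the current min or max,
        # so only the two ends of the sorted array need to be compared.
        if m - a[lo] > a[hi] - m:
            total -= a[lo]
            lo += 1
        else:                       # ties drop the maximum (arbitrary choice)
            total -= a[hi]
            hi -= 1
    return a[lo:hi + 1]


if __name__ == "__main__":
    print(eliminate_outliers_sorted([0.1, 0.2, 0.3, 0.4, 10.0], 0.2))
```

Note that the survivors come back sorted rather than in their original order, and ties between the two ends are broken by dropping the maximum; both are arbitrary choices that may differ from the numarray version above.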
On Friday 25 November 2005 15:24, gf wrote:
For sz=100, the next line of code is 10x faster on my machine (more if sz is bigger):

```python
print nn[argsort(abs(nn_c-nn_c.mean()),0)][:-int(sz*0.10),0]
```

I haven't checked it very carefully, so you should double check it.

BTW, you will need to use the numarray MLab interface:

```python
from numarray.mlab import rand
```

Cheers,
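The one-liner above computes the mean only once and drops the `int(sz*0.10)` points farthest from it, which is not quite the same as re-evaluating the mean after each elimination; as noted, it should be double-checked. Below is a hedged NumPy sketch of that one-pass variant, with `trim_once` as a hypothetical name and assuming `nn_c` in the one-liner refers to the data array `nn`. It slices with `a.size - k` instead of `-k`, because `[:-0]` selects nothing when the fraction rounds down to zero points.

```python
import numpy as np

def trim_once(dat, frac):
    """One-pass trim: drop the k points farthest from the mean of the full data.

    Sketch of the reply's approach, assuming NumPy. The mean is computed only
    once, so the result can differ from re-evaluating it after each removal.
    """
    a = np.asarray(dat, dtype=float).ravel()
    k = int(np.floor(a.size * frac))           # number of points to drop
    order = np.argsort(np.abs(a - a.mean()))   # indices, nearest to mean first
    # Keep the first a.size - k indices; slicing with [:-k] would return
    # an empty array when k == 0.
    return a[order[:a.size - k]]


if __name__ == "__main__":
    data = [0.0] * 8 + [10.0, 100.0]
    print(np.sort(trim_once(data, 0.2)))
```

On the data `[0]*8 + [10, 100]` with `frac=0.2`, this one-pass version keeps seven zeros and the 10. The iterative procedure from the original question instead first drops 100, recomputes the mean (about 1.11), and then drops 10, keeping all eight zeros. The one-pass version is cheaper (a single argsort), so whether the difference matters depends on the analysis.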
participants (2)

- Francesc Altet
- gf