
Hi,

I am a newbie to Numeric/numarray programming and would appreciate your help in improving the code below (which I'm sure is quite ugly to an experienced numarray programmer).

An analysis we are carrying out requires the following:

1. evaluate the mean of a set of data
2. eliminate the data point farthest from the mean
3. repeat steps 1 and 2 until a certain specified fraction of points has been eliminated.

Since this analysis will have to be performed (probably repeatedly) on approximately ten thousand data sets, each of which contains 100-500 points, I would like the code to be as fast as possible.

Thanks for your help.

-g

====

from numarray import add, array, asarray, absolute, argsort, floor, take, size

def mean(m,axis=0):
    m = asarray(m)
    return add.reduce(m,axis)/float(m.shape[axis])

def eliminate_outliers(dat,frac):
    num_to_eliminate = int(floor(size(dat,0)*frac))
    for i in range(num_to_eliminate):
        ind = argsort(absolute(dat-mean(dat)),0)
        sdat = take(dat,ind,0)[:,0]
        dat = sdat[:-1]
    return dat

#--------------------------------------------------------------------
if __name__ == "__main__":
    from MLab import rand
    sz = 100
    nn = rand(sz,1)
    nn[:10] = 20*rand(10,1)
    nn[sz-10:] = -20*rand(10,1)
    print eliminate_outliers(nn,0.10)
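
For comparison, here is a minimal sketch of one possible way to cut the cost of steps 1-3. It is not the code from the post. It rests on one observation: for one-dimensional data, the point farthest from the mean is always the current minimum or the current maximum. So the data can be sorted once and trimmed from either end while a running sum keeps the mean up to date, rather than re-sorting on every pass. The sketch uses plain Python instead of numarray (an assumption made so that it runs without extra packages). The function name is reused from the post purely for illustration, and the check on `frac` is an added assumption.

```python
from math import floor


def eliminate_outliers(data, frac):
    """Remove floor(len(data) * frac) points, one at a time,
    each time dropping the point farthest from the current mean.

    Returns the surviving points in ascending order (the code in the
    post returns them ordered by distance from the mean instead).
    """
    if not 0.0 <= frac <= 1.0:
        # Assumption: the fraction is meant to lie between 0 and 1.
        raise ValueError("frac must be between 0 and 1")

    values = sorted(float(x) for x in data)  # sort once, up front
    num_to_eliminate = int(floor(len(values) * frac))

    lo, hi = 0, len(values)  # surviving points are values[lo:hi]
    total = sum(values)      # running sum of the surviving points

    for _ in range(num_to_eliminate):
        mean = total / (hi - lo)
        # Only the two ends of the sorted range can be farthest
        # from the mean, so compare just those two distances.
        if mean - values[lo] > values[hi - 1] - mean:
            total -= values[lo]
            lo += 1
        else:
            # Ties fall through to here and drop the largest point.
            hi -= 1
            total -= values[hi]

    return values[lo:hi]
```

With this layout each data set costs one sort plus one constant-time step per eliminated point. When the two end points are exactly equidistant from the mean, the sketch drops the larger one; the original code leaves that choice to `argsort`, so results may differ in such tie cases.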