# [Numpy-discussion] help in improving data analysis code

gf gyromagnetic at gmail.com
Fri Nov 25 06:25:03 EST 2005

```
Hi,
I am a newbie to Numeric/numarray programming and would appreciate
your help in improving the code below (which I'm sure is quite ugly to
an experienced numarray programmer).
An analysis we are carrying out requires the following:
1. evaluate the mean of a set of data
2. eliminate the data point farthest from the mean
3. repeat steps 1 and 2 until a certain specified fraction of points
has been eliminated.

Since this analysis will have to be performed (probably repeatedly) on
approximately ten thousand data sets, each of which contains 100-500
points, I would like the code to be as fast as possible.

-g

====

from numarray import add, array, asarray, absolute, argsort, floor, take, size

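def mean(m,axis=0):
    m = asarray(m)
    # Average along `axis`; assumed body, matching the otherwise unused `add` import
    return add.reduce(m,axis)/float(m.shape[axis])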

def eliminate_outliers(dat,frac):
    num_to_eliminate = int(floor(size(dat,0)*frac))
    for i in range(num_to_eliminate):
        # sort the points by absolute distance from the current mean
        ind = argsort(absolute(dat-mean(dat)),0)
        sdat = take(dat,ind,0)[:,0]
        # drop the last (farthest) point
        dat = sdat[:-1]
    return dat

#--------------------------------------------------------------------

if __name__ == "__main__":
    from MLab import rand
    sz = 100
    nn = rand(sz,1)
    nn[:10] = 20*rand(10,1)
    nn[sz-10:] = -20*rand(10,1)
    print eliminate_outliers(nn,0.10)

```
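For comparison, the three steps described in the post (compute the mean, drop the farthest point, repeat) could be sketched with present-day NumPy roughly as follows. This is only an illustration and not the poster's code. It assumes NumPy in place of numarray and treats the data as a flat 1-D array. It also keeps the surviving points in their original order, where the posted code returns them sorted by deviation. Each pass uses `argmax` to find the single farthest point instead of a full `argsort`.

```python
import numpy as np

def eliminate_outliers(dat, frac):
    """Repeatedly drop the point farthest from the current mean.

    Assumes `dat` can be flattened to 1-D; `frac` is the fraction of
    points to remove (rounded down to a whole number of points).
    """
    dat = np.asarray(dat, dtype=float).ravel()
    num_to_eliminate = int(np.floor(dat.size * frac))
    for _ in range(num_to_eliminate):
        # Index of the point with the largest absolute deviation
        # from the mean of the points that remain.
        worst = np.argmax(np.abs(dat - dat.mean()))
        dat = np.delete(dat, worst)
    return dat
```

With 100-500 points per data set, each pass is a handful of vectorized operations, so the Python-level loop runs only as many times as there are points to remove.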