Finding subsets for a robust regression
tkpmep at hotmail.com
Mon Sep 29 14:59:27 EDT 2008
I have coded a robust (Theil-Sen) regression routine which takes as
inputs two lists of numbers, x and y, and returns robust estimates of
the slope and intercept of the best straight-line fit.
In a pre-processing phase, I create two new lists, x1 and y1; x1 has
only the unique values in x, and for each unique value in x1, y1 has
the median of all the y values paired with that x. My code follows,
and it seems a bit clumsy - is there a cleaner way to do it? By the
way, I'd be more than happy to share the code for the entire
algorithm - just let me know and I will post it here.
Thanks in advance
Thomas Philips
d = {}  # group the y values by each unique value of x
for xx, yy in zip(x, y):
    if xx in d:
        d[xx].append(yy)
    else:
        d[xx] = [yy]

x1 = []  # unique values of x
y1 = []  # median(y) for each unique value of x
for xx, yy in d.items():
    x1.append(xx)
    l = len(yy)
    if l == 1:
        y1.append(yy[0])
    else:
        yy.sort()
        y1.append( (yy[l//2-1] + yy[l//2])/2.0 if l % 2 == 0 else
                   yy[l//2] )
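
One possibly cleaner arrangement, offered as a sketch and not a definitive answer: let collections.defaultdict do the grouping and move the median into a small helper. The names median and collapse_duplicates are made up for this illustration. The sketch also returns x1 in ascending order and leaves the input lists unmodified, which the code above does not promise.

```python
from collections import defaultdict


def median(values):
    """Median of a non-empty sequence (hypothetical helper).

    Averages the two middle values when the length is even.
    """
    s = sorted(values)          # sorted copy; the caller's list is untouched
    n = len(s)
    mid = n // 2
    if n % 2:                   # odd length: single middle element
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2.0


def collapse_duplicates(x, y):
    """Return (x1, y1): unique x values, ascending, and the median y of each.

    Hypothetical name; assumes x and y have equal length and that the
    x values are hashable and mutually comparable.
    """
    groups = defaultdict(list)  # x value -> list of the y values paired with it
    for xx, yy in zip(x, y):
        groups[xx].append(yy)
    x1 = sorted(groups)
    y1 = [median(groups[xx]) for xx in x1]
    return x1, y1


x = [1, 2, 2, 3, 3, 3]
y = [10, 20, 30, 5, 7, 6]
x1, y1 = collapse_duplicates(x, y)  # x1 == [1, 2, 3], y1 == [10, 25.0, 6]
```

On Python 3.4 or later, statistics.median from the standard library could replace the hand-written helper.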