# Finding subsets for a robust regression

tkpmep at hotmail.com tkpmep at hotmail.com
Mon Sep 29 20:59:27 CEST 2008

```I have coded a robust (Theil-Sen) regression routine which takes as
inputs two lists of numbers, x and y, and returns a robust estimate of
the slope and intercept of the best robust straight line fit.

In a pre-processing phase, I create two new lists, x1 and y1; x1 has
only the unique values in x, and for each unique value in x1, y1 has
the median of all such values in x. My code follows, and it seems a
bit clumsy - is there a cleaner way to do it? By the way, I'd be more
than happy to share the code for the entire algorithm - just let me
know and I will post it here.

Thomas Philips

d = {}                  #identify unique instances of x and y
for xx,yy in zip(x,y):
if xx in d:
d[xx].append(yy)
else:
d[xx] = [yy]

x1 = []                 #unique instances of x and y
y1 = []                 #median(y) for each unique value of x
for xx,yy in d.iteritems():
x1.append(xx)
l = len(yy)
if l == 1:
y1.append(yy)
else:
yy.sort()
y1.append( (yy[l//2-1] + yy[l//2])/2.0 if l % 2 == 0 else
yy[l//2] )

```