Finding subsets for a robust regression

tkpmep at hotmail.com
Mon Sep 29 20:59:27 CEST 2008


I have coded a robust (Theil-Sen) regression routine which takes as
inputs two lists of numbers, x and y, and returns robust estimates of
the slope and intercept of the best-fit straight line.
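
For readers who have not met the method: a Theil-Sen fit takes the
median of the slopes through all pairs of points, and one common choice
of intercept is the median of y - slope*x. The sketch below only
illustrates that idea and is not the routine referred to above. The
name theil_sen is hypothetical, and it assumes Python 3.4 or later for
the statistics module.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) of a Theil-Sen line (illustrative sketch)."""
    # Slopes through every pair of points that have distinct x values.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x1 != x2
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    # Assumed intercept rule: the median of the offsets y - slope*x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept
```

This brute-force version looks at every pair, so it is O(n^2) in the
number of points and is only meant for small inputs.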

In a pre-processing phase, I create two new lists, x1 and y1; x1 has
only the unique values in x, and for each unique value in x1, y1 has
the median of all the y values that share that x. My code follows, and
it seems a bit clumsy - is there a cleaner way to do it? By the way,
I'd be more than happy to share the code for the entire algorithm -
just let me know and I will post it here.

Thanks in advance

Thomas Philips

    d = {}                  # group the y values by their x value
    for xx, yy in zip(x, y):
        if xx in d:
            d[xx].append(yy)
        else:
            d[xx] = [yy]

    x1 = []                 # unique values of x
    y1 = []                 # median(y) for each unique value of x
    for xx, yy in d.items():
        x1.append(xx)
        n = len(yy)
        if n == 1:
            y1.append(yy[0])
        else:
            yy.sort()
            if n % 2 == 0:  # even count: average the two middle values
                y1.append((yy[n//2 - 1] + yy[n//2]) / 2.0)
            else:           # odd count: take the middle value
                y1.append(yy[n//2])
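
One possible tidier form of the same pre-processing, offered as a
sketch rather than a definitive answer: group the y values with
collections.defaultdict and let statistics.median do the odd/even
handling. It assumes Python 3.4 or later (the statistics module did not
exist when this was posted), and collapse_duplicates is a hypothetical
name. Unlike the loop above, it returns x1 in sorted order.

```python
from collections import defaultdict
from statistics import median


def collapse_duplicates(x, y):
    """Return (x1, y1): the unique x values and the median y for each."""
    groups = defaultdict(list)
    for xx, yy in zip(x, y):
        groups[xx].append(yy)          # collect every y seen for this x
    x1 = sorted(groups)                # unique x values, ascending
    y1 = [median(groups[xx]) for xx in x1]
    return x1, y1
```

Called as x1, y1 = collapse_duplicates(x, y), it gives the two lists the
regression step needs.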


