[Tutor] Traversing lists or getting the element you want.

Peter Otten __peter__ at web.de
Mon Feb 3 14:35:42 CET 2014


Kipton Moravec wrote:

> I am new to Python, and I do not know how to traverse lists like I
> traverse arrays in C. This is my first program other than "Hello World".
> I have a Raspberry Pi and they say Python is the language of choice for
> that little machine. So I am going to try to learn it.
> 
> 
> I have data in the form of x, y pairs where y = f(x) and is non linear.
> It comes from a .csv file.
> 
> In this case x is an integer from 165 to 660 so I have 495 data sets.
> 
> I need to find the optimal locations of three values of x to piecewise
> linear estimate the function.

import itertools

def boundaries(n, segments):
    max_n = n-1
    for inner in itertools.combinations(range(n), segments-2):
        yield (0,) + inner + (max_n,)
 
def minimized(x, y, segments):
    def total_error(b):
        return sum(error(x, y, start, end) for start, end in zip(b, b[1:]))
    return min(boundaries(len(x), segments), key=total_error)

def error(x, y, istart, iend):
    errorsq = 0
    for m in range(istart, iend):
        lin_estimate = y[istart] + ((y[iend] - y[istart]) * 
                       ((x[m] - x[istart]) / (x[iend] - x[istart])))
        d = lin_estimate - y[m]
        errorsq += d*d
    return errorsq

if __name__ == "__main__":
    # generate dataset with N random (x, y) pairs
    import random
    N = 100
    random.seed(42) # want reproducible result
    data = [
        (random.random()*10, random.random()*100-50.)
        for _ in range(N)]


    print minimized([x for x, y in data],
                    [y for x, y in data], 4)


As this is mostly number crunching the Python version is probably a lot 
slower than the C code. I tried to move the inner loop into numpy with
import numpy

[...]

def error(x, y, istart, iend):
    lin_estimate = y[istart] + ((y[iend] - y[istart]) *
                   ((x[istart:iend] - x[istart]) / (x[iend] - x[istart])))
    delta = lin_estimate - y[istart:iend]
    return (delta*delta).sum()

[...]

    print minimized(numpy.array([x for x, y in data]),
                    numpy.array([y for x, y in data]), 4)

but that had no noticeable effect. There may be potential gains for someone 
willing to put in more work or with better knowledge of numpy.


By the way, your method of calculating the error 

> double error(int istart, int iend)
> {
> 
> // linear interpolation I can optimize but left
> // it not optimized for clarity of the function
> 
>     int m;
>     double lin_estimate;
>     double errorsq;
>     errorsq = 0;
> 
>     for (m=istart; m<iend; m++)
>     {
>         lin_estimate = y[istart] + ((y[iend] – y[istart]) *
>                        ((x[m] – x[istart]) / (x[iend] – x[istart])));
> 
>         errorsq += (lin_estimate – y[m]) * (lin_estimate – y[m]);
>     }
>     return (errorsq);
> }

(which I adopted) looks odd. With a tried and tested method like "least 
square fit" which does not require the line (segments) to go through any 
point of the dataset you should get better fits.

PS: Caveat emptor! I did not even plot a graph with the result to check 
plausibility; there may be embarassing errors.



More information about the Tutor mailing list