[Tutor] Traversing lists or getting the element you want.
Peter Otten
__peter__ at web.de
Mon Feb 3 14:35:42 CET 2014
Kipton Moravec wrote:
> I am new to Python, and I do not know how to traverse lists like I
> traverse arrays in C. This is my first program other than "Hello World".
> I have a Raspberry Pi and they say Python is the language of choice for
> that little machine. So I am going to try to learn it.
>
>
> I have data in the form of x, y pairs where y = f(x) and is non linear.
> It comes from a .csv file.
>
> In this case x is an integer from 165 to 660 so I have 495 data sets.
>
> I need to find the optimal locations of three values of x to piecewise
> linear estimate the function.
import itertools
def boundaries(n, segments):
max_n = n-1
for inner in itertools.combinations(range(n), segments-2):
yield (0,) + inner + (max_n,)
def minimized(x, y, segments):
def total_error(b):
return sum(error(x, y, start, end) for start, end in zip(b, b[1:]))
return min(boundaries(len(x), segments), key=total_error)
def error(x, y, istart, iend):
errorsq = 0
for m in range(istart, iend):
lin_estimate = y[istart] + ((y[iend] - y[istart]) *
((x[m] - x[istart]) / (x[iend] - x[istart])))
d = lin_estimate - y[m]
errorsq += d*d
return errorsq
if __name__ == "__main__":
# generate dataset with N random (x, y) pairs
import random
N = 100
random.seed(42) # want reproducible result
data = [
(random.random()*10, random.random()*100-50.)
for _ in range(N)]
print minimized([x for x, y in data],
[y for x, y in data], 4)
As this is mostly number crunching the Python version is probably a lot
slower than the C code. I tried to move the inner loop into numpy with
import numpy
[...]
def error(x, y, istart, iend):
lin_estimate = y[istart] + ((y[iend] - y[istart]) *
((x[istart:iend] - x[istart]) / (x[iend] - x[istart])))
delta = lin_estimate - y[istart:iend]
return (delta*delta).sum()
[...]
print minimized(numpy.array([x for x, y in data]),
numpy.array([y for x, y in data]), 4)
but that had no noticeable effect. There may be potential gains for someone
willing to put in more work or with better knowledge of numpy.
By the way, your method of calculating the error
> double error(int istart, int iend)
> {
>
> // linear interpolation I can optimize but left
> // it not optimized for clarity of the function
>
> int m;
> double lin_estimate;
> double errorsq;
> errorsq = 0;
>
> for (m=istart; m<iend; m++)
> {
> lin_estimate = y[istart] + ((y[iend] – y[istart]) *
> ((x[m] – x[istart]) / (x[iend] – x[istart])));
>
> errorsq += (lin_estimate – y[m]) * (lin_estimate – y[m]);
> }
> return (errorsq);
> }
(which I adopted) looks odd. With a tried and tested method like "least
square fit" which does not require the line (segments) to go through any
point of the dataset you should get better fits.
PS: Caveat emptor! I did not even plot a graph with the result to check
plausibility; there may be embarassing errors.
More information about the Tutor
mailing list