Colleen Glaeser songbird42371 at gmail.com
Fri Oct 15 06:09:57 CEST 2010

```Dear tutors,

I am in a beginning-level computer science class in college and am running
into problems with an assignment.

The assignment is as follows:

Statisticians are fond of drawing regression lines.  In statistics and other
fields where people analyze lots of data, one of the most commonly used
regression lines is called the “least squares line.” This is the line that
is supposed to best fit the set of data points, in the sense that it
minimizes the squared vertical distances between the points and the line.  Why
this should be a good fit is beyond the scope of this assignment.

Presume that you have a collection of n two-dimensional data points.  I’ll
give it as a list of lists, where each of the lists inside represents one
data point.

Data: [[x1, y1], [x2, y2], [x3, y3], …, [xn, yn]]

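Compute the following:

    X = sum of the x values
    Y = sum of the y values
    P = sum of the products x*y
    Q = sum of the squares x**2

The regression line is then given by

    y = m*x + b

where m and b may be obtained by

    m = (n*P - X*Y) / (n*Q - X**2)

and

    b = (Y*Q - P*X) / (n*Q - X**2)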

Your task is to compute the m and b (slope and intercept, respectively) for
a set of data.  You have to analyze the data as given, not count or add
anything yourself.  Your program should do everything, even figure out how
many data points there are.

First set:  [ [3, 1], [4, 3], [6, 4], [7, 6], [8, 8], [9, 8] ]

Second set:  [ [63, 11], [22, 7.5], [63, 11], [28, 10], [151, 12], [108,
10], [18, 8], [115, 10], [31,7], [44, 9] ]

Find m and b, then calculate an estimate for x = 5 using the first data set.
That is, plug in 5 for x and see what y you get.  For the second set, try x
= 95.

Turn in:  code, m, b, and the estimates for both data sets.

***********************************************************************************************************************

There’s an easy way to walk through the data and extract the values you
need.  Use a for loop.  Try this:

for item in data:
    [x, y] = item
    print(x)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For extra credit:  draw a scatter plot of the data, and draw in the least
squares line.  Scale the window to fit, making it a bit wider and higher
than the data requires, so that some of the points are near but not on the
edges of the window.  Then sketch in the regression line.  Note that you
should calculate the window size based on the data – don’t set them
yourself; find the max and min values for x and y.  You can print the
scatter plot, or point me toward your web page.  In any case, show me the
code.

So far, my program is as follows:

Data = [[3,1],[4,3],[6, 4],[7, 6],[8, 8],[9, 8]]

def X():
    accX = 0
    for item in Data:
        [x,y] = item
        accX = accX + x
    print (accX)

def Y():
    accY = 0
    for item in Data:
        [x,y] = item
        accY = accY + y
    print (accY)

def P():
    accXY = 0
    for item in Data:
        [x,y] = item
        accXY = accXY + (y*x)
    print (accXY)

def Q():
    accX2 = 0
    for item in Data:
        [x,y] = item
        accX2 = accX2 + (x**2)
    print (accX2)

X()
Y()
P()
Q()

def B():
    ((Y() * Q()) - (P() * X())) / ((6 * Q()) - (X()**2))

def M():
    ((Y() * Q()) - (P() * X())) / (X() * Q())

B()
M()

Now, my functions for X, Y, P, and Q are correct, but I have a couple of
problems when it comes to continuing.  First of all, despite what my teacher
has told me, my method of multiplying the results of X, Y, P, and Q inside the
functions for B and M is not working.  I'm not sure whether there is a way to
make functions into variables, or how else to solve this problem.

Second, I am confused about what my teacher wants me to do when it comes to
inputting different values of x.

Find m and b, then calculate an estimate for x = 5 using the first data set.
That is, plug in 5 for x and see what y you get.  For the second set, try x
= 95.

Turn in:  code, m, b, and the estimates for both data sets.

I mean, I know I need to calculate the line of best fit for the data sets
using B and M, but what in the world is x supposed to do and where does it
go?  How do I program this?  This is especially hard since I've never
taken a proper stat class before.

Thank you all so much!

--
Colleen Glaeser
songbird42371 at gmail.com
636.357.8519
```
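
A minimal sketch of how the pieces described in the post could fit together, assuming Python 3 and the standard least-squares formulas m = (n*P - X*Y) / (n*Q - X**2) and b = (Y*Q - P*X) / (n*Q - X**2). The function names (`sums`, `least_squares`, `estimate`, `padded_bounds`) and the 10% margin are illustrative assumptions, not part of the assignment. The main difference from the posted code is that each helper returns its value instead of printing it, so the result can be used in later arithmetic.

```python
def sums(data):
    """Return (n, X, Y, P, Q) for a list of [x, y] pairs.

    X = sum of x, Y = sum of y, P = sum of x*y, Q = sum of x**2.
    The count n is taken from the data rather than hard-coded.
    """
    n = 0
    sum_x = sum_y = sum_xy = sum_x2 = 0
    for x, y in data:
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x ** 2
    return n, sum_x, sum_y, sum_xy, sum_x2


def least_squares(data):
    """Return (m, b), the slope and intercept of the least squares line."""
    n, sum_x, sum_y, sum_xy, sum_x2 = sums(data)
    # The denominator is zero when every x value is identical;
    # this sketch does not handle that case.
    denom = n * sum_x2 - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_x2 - sum_xy * sum_x) / denom
    return m, b


def estimate(m, b, x):
    """Plug x into the line y = m*x + b."""
    return m * x + b


def padded_bounds(data, margin=0.1):
    """Return (x_min, x_max, y_min, y_max), widened by `margin` of each range.

    The margin value is an assumption; the assignment only asks for a
    window "a bit wider and higher" than the data.
    """
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    return min(xs) - pad_x, max(xs) + pad_x, min(ys) - pad_y, max(ys) + pad_y


first = [[3, 1], [4, 3], [6, 4], [7, 6], [8, 8], [9, 8]]
second = [[63, 11], [22, 7.5], [63, 11], [28, 10], [151, 12],
          [108, 10], [18, 8], [115, 10], [31, 7], [44, 9]]

if __name__ == "__main__":
    for data, x in ((first, 5), (second, 95)):
        m, b = least_squares(data)
        print("m =", m, "b =", b, "estimate at x =", x, "is", estimate(m, b, x))
```

"Plugging in" x here just means evaluating `m * x + b` once m and b are known, which is what `estimate` does for x = 5 and x = 95.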