Linear regression in NumPy
robert.kern at gmail.com
Fri Mar 17 22:07:57 CET 2006
> I'm a little bit stuck with NumPy here, and neither the docs nor
> trial&error seems to lead me anywhere:
> I've got a set of data points (x/y-coordinates) and want to fit a
> straight line through them, using LMSE linear regression. Simple
> enough. I thought instead of looking up the formulas I'd just see if
> there isn't a NumPy function that does exactly this. What I found was
> "linear_least_squares", but I can't figure out what kind of parameters
> it expects: I tried passing it my array of X-coordinates and the array
> of Y-coordinates, but it complains that the first parameter should be
> two-dimensional. But well, my data is 1d. I guess I could pack the X/Y
> coordinates into one 2d-array, but then, what do I do with the second
> parameter?
> More generally: Is there any kind of documentation that tells me what
> the functions in NumPy do, and what parameters they expect, how to call
> them, etc.? All I found was:
> "This function returns the least-squares solution of an overdetermined
> system of linear equations. An optional third argument indicates the
> cutoff for the range of singular values (defaults to 10^-10). There are
> four return values: the least-squares solution itself, the sum of the
> squared residuals (i.e. the quantity minimized by the solution), the
> rank of the matrix a, and the singular values of a in descending
> order."
> It doesn't even mention what the parameters "a" and "b" are for...
Look at the docstring. (Note: I am using the current version of numpy from SVN;
you may be using an older version of Numeric. http://numeric.scipy.org/)
In : numpy.linalg.lstsq?
Base Class: <type 'function'>
String Form: <function linear_least_squares at 0x1677630>
Definition: numpy.linalg.lstsq(a, b, rcond=1e-10)
returns x,resids,rank,s
where x minimizes 2-norm(|b - Ax|)
resids is the sum square residuals
rank is the rank of A
s is the singular values of A in descending order
If b is a matrix then x is also a matrix with corresponding columns.
If the rank of A is less than the number of columns of A or greater than
the number of rows, then residuals will be returned as an empty array
otherwise resids = sum((b-dot(A,x))**2).
Singular values less than s[0]*rcond are treated as zero.
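
For the straight-line case in the question, the 2-D first argument is the design matrix A, with one row per data point, and the second argument b is simply the 1-D array of y values. What follows is only a minimal sketch: the sample data is made up for illustration, and it assumes a recent NumPy release, where rcond=None is accepted (the docstring above shows an older default of 1e-10).

```python
import numpy as np

# Hypothetical sample data: points lying roughly on y = 2*x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# lstsq wants a 2-D matrix A with one row per data point.
# For y = slope*x + intercept, the columns are x and a column of ones.
A = np.column_stack([x, np.ones_like(x)])

# b is the 1-D array of y values. rcond=None asks for the library default
# cutoff (an assumption about the installed NumPy version).
coeffs, resids, rank, s = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

print("slope:", slope, "intercept:", intercept)
print("sum of squared residuals:", resids)
print("rank:", rank, "singular values:", s)
```

Appending the column of ones is what lets the fit include an intercept; with only the x column, the line would be forced through the origin.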
robert.kern at gmail.com
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco