[Numpy-discussion] polyfit on multiple data points

josef.pktd at gmail.com
Mon Apr 13 22:06:45 EDT 2009


On Mon, Apr 13, 2009 at 5:59 PM, Mathew Yeates <myeates at jpl.nasa.gov> wrote:
> Hi,
> I understand how to fit  the points (x1,y1) (x2,y2),(x3,y3) with a line
> using polyfit. But, what if I want to perform this task on every row of
> an array?
> For instance
>
> [[x1,x2,x3],
>  [s1,s2,s3]]
>
> [[y1,y2,y3,],
>  [r1,r2,r3]]
>
> and I want the results to be the coefficients  [a,b,c]  and [d,e,f] where
> [a,b,c] fits the points (x1,y1) (x2,y2),(x3,y3) and
> [d,e,f] fits the points (s1,r1) (s2,r2),(s3,r3)
>
> I realize I could use "apply_along_axis" but I'm afraid of the
> performance penalty. Is there a way to do this without resorting to a
> function call for each row?
>
> Mathew

Which order of polynomial are you fitting to each 3 points?

If you want independent polynomials, the only other way I can see is
building a block design matrix, essentially stacking all the independent
regressions into one big least squares problem. The performance will
depend on the number of points you have, but I doubt building the design
matrix and solving the large least squares problem will be faster than
the loop.
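
A minimal sketch of that stacked approach, with illustrative data and
scipy.linalg.block_diag (not from your problem, just to show the shape
of the idea):

import numpy as np
from scipy.linalg import block_diag

# illustrative data: n independent cases, 3 points each
x = np.array([[1., 2., 3.],
              [0., 1., 4.]])
y = np.array([[2., 3., 5.],
              [1., 2., 9.]])
n, npoints = x.shape

# per-case design matrix for a line: columns [x, 1]
blocks = [np.column_stack((xi, np.ones(npoints))) for xi in x]
X = block_diag(*blocks)        # (n*npoints, 2*n) block design matrix
Y = y.ravel()                  # stack all dependent variables

coefs = np.linalg.lstsq(X, Y)[0].reshape(n, 2)  # row i holds [slope_i, const_i]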

For a low order polynomial it might be possible to get an explicit
solution; that is easy for the linear case. I never tried it for OLS
with two regressors, but maybe sympy would help.

If you really only have 3 points, then with two regressors and a
constant, you would already have an exact solution with a second order
polynomial, 3 equations in 3 unknowns. In this case, you can write an
explicit solution where it might be possible to use only vectorized
array manipulation.
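
For that 3-point quadratic case, each row is a 3x3 Vandermonde system;
newer numpy versions let np.linalg.solve operate on a stack of matrices,
so all rows can be solved in one call. A rough sketch with illustrative
data:

import numpy as np

x = np.array([[1., 2., 3.],
              [0., 1., 4.]])
y = np.array([[2., 5., 10.],
              [1., 3., 21.]])

# stacked 3x3 Vandermonde matrices, one per row: columns [x**2, x, 1]
V = np.stack([x**2, x, np.ones_like(x)], axis=-1)      # shape (n, 3, 3)

# solve all n exact 3x3 systems at once; coefs[i] is [a, b, c] for row i
coefs = np.linalg.solve(V, y[..., np.newaxis])[..., 0]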

If you need the loop and you have a fixed axis, doing the loop yourself
should be faster than "apply_along_axis".
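
For reference, the explicit loop is just one polyfit call per row
(illustrative data again):

import numpy as np

x = np.array([[1., 2., 3.],
              [0., 1., 4.]])
y = np.array([[2., 3., 5.],
              [1., 2., 9.]])

deg = 1   # fit a line; use 2 for a quadratic
# coefs[i] holds the coefficients for row i, highest power first
coefs = np.array([np.polyfit(xi, yi, deg) for xi, yi in zip(x, y)])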

I haven't tried any of this myself, but that's where I would be
looking, if runtime is more expensive than developer time and you
don't need a solution for the general case with more points and higher
order polynomials.

------
Rereading your email, I take it you want a line, i.e. a 1st order
polynomial, and the predicted values of the dependent variable. That's
just a linear regression, so you can use the closed-form expressions for
the slope and constant directly; see for example stats.linregress.

Something like the following: assume x and y are (n,3) arrays, with your
3 points in the columns and a different case in each row, and let xm and
ym be their row-wise mean-corrected versions. Then the slope coefficients
should be

b = (xm*ym).sum(1) / (xm*xm).sum(1)

the constants are something like:  a = y.mean(axis=1) - b * x.mean(1)
and the predicted points  ypred = a[:,np.newaxis] + b[:,np.newaxis]*x
(should be an (n,3) array)

warning: not tested, written from memory
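
Spelled out as a runnable sketch, with the mean correction explicit and
illustrative data:

import numpy as np

x = np.array([[1., 2., 3.],
              [0., 1., 4.]])
y = np.array([[2., 3., 5.],
              [1., 2., 9.]])

xm = x - x.mean(axis=1)[:, np.newaxis]    # row-wise mean corrected
ym = y - y.mean(axis=1)[:, np.newaxis]

b = (xm * ym).sum(axis=1) / (xm * xm).sum(axis=1)   # slopes, shape (n,)
a = y.mean(axis=1) - b * x.mean(axis=1)             # constants, shape (n,)
ypred = a[:, np.newaxis] + b[:, np.newaxis] * x     # predicted values, (n, 3)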

Josef


