[SciPy-User] How to fit data with errorbars

Nathaniel Smith njs at pobox.com
Tue Feb 16 22:24:59 EST 2010


On Tue, Feb 16, 2010 at 10:08 AM, Jeremy Conlin <jlconlin at gmail.com> wrote:
> I have some data with some errobars that I need to fit to a line.  Is
> there anyway to use scipy.polyfit with the error associated with the
> data?  If not, how can I make a fit routine work with my data?

If the error is just in the y (dependent) variable, then this may be
quite simple. If your y values:
  -- have a normally distributed error
  -- whose standard deviation varies from point to point
  -- but the standard deviations are known (up to a multiplicative constant)
then it's a classic problem -- perhaps google "heterskedasticity
correction" or "weighted least squares" or "generalized least
squares". But basically, just make a model matrix X that has one row
for each observation, and one column for each predictor. If you want
to do, say, third-order polynomial fitting and you have two vectors of
independent variables x1 and x2 and you want to include an intercept,
then it's something like (all untested):

X = np.column_stack([np.ones(len(x1)), x1, x1 ** 2, x1 ** 3, x2, x2 **
2, x2 ** 3])

Now a normal least squares fit, ignoring the errorbars, would be:

beta = np.linalg.lstsq(X, y)[0]

with predicted values np.dot(X, beta).

If you have the standard deviation for each measurement in a vector
stddevs, then the proper heteroskedasticity-corrected fit is just:

beta = np.linalg.lstsq((1 / stddevs) * X, (1 / stddevs) * y)[0]

-- Nathaniel



More information about the SciPy-User mailing list