Re: [SciPy-Dev] On the leastsq/curve_fit method

Sept. 27, 2011

      On Tue, Sep 27, 2011 at 9:20 AM, <josef.pktd@gmail.com> wrote:
...
On Tue, Sep 27, 2011 at 5:17 AM, Gianfranco Durin <g.durin@inrim.it>wrote:
...
...
where func can be both "_general_function" and
"_weighted_general_function", is not correct.
M = σ 2 I ok unit weights
M = W^(-1) your case, W has the estimates of the error covariance
M = σ 2 W^(-1) I think this is what curve_fit uses, and what is in
(my) textbooks defined as weighted least squares
Do we use definition 2 or 3 by default? both are reasonable
3 is what I expected when I looked at curve_fit
2 might be more useful for two stage estimation, but I didn't have
time to check the details
Ehmm, no, curve_fit does not use def 3, as the errors would scale with W,
but they don't. By the way, it does not have the correct units.
http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node31.html
'''
The minimization of [image: $ \chi^{2}_{}$] above is sometimes called *weighted
least  squares* in which case the inverse quantities 1/*e*2 are called the
weights. Clearly this is simply a different word for the same thing, but in
practice the use of these words sometimes means that the interpretation of
*e*2 as variances or squared errors is not straightforward. The word
weight often implies that only the relative weights are known (``point two
is twice as important as point one'') in which case there is apparently an
unknown overall normalization factor. Unfortunately the parameter errors
coming out of such a fit will be proportional to this factor, and the user
must be aware of this in the formulation of his problem.
'''
(I don't quite understand the last sentence.)
M = σ^2 W^(-1), where σ^2 is estimated by residual sum of squares from
weighted regression.
W only specifies relative errors, the assumption is that the covariance
matrix of the errors is *proportional* to W. The scaling is arbitrary. If
the scale of W changes, then the estimated residual sum of squares from
weighted regression will compensate for it. So, rescaling W has no effect on
the covariance of the parameter estimates.
I checked in Greene: Econometric Analysis, and briefly looked at the SAS
description. It looks like weighted least squares is always with automatic
scaling, W is defined only as relative weights.
All I seem to be able to find is weighted least squares with automatic
scaling (except for maybe some two-step estimators).
...
Curve_fit calculates
M = W \sigma^2 W^(-1) = \sigma^2
If I remember correctly (calculated from the transformed model) it should
be:
the cov of the parameter estimates is s^2 (X'WX)^(-1)
error estimates should be s^2 * W
where W = diag(1/curvefit_sigma**2)   unfortunate terminology for
curve_fit's sigma or intentional ? (as I mentioned in the other thread)
Josef
...
so it gives exactly the same results of case 1, irrespective the W's. This
is why the errors do not scale with W.
Gianfranco
_______________________________________________
Gianfranco,
Can you please provide some Python and R (or SAS) code to show what you
mean?

In the linear example below, I get agreement between SAS and curve_fit with
and without defining a weight. I do know that the immediate result from
leastsq does not agree with values from SAS. Thus, with my limited
knowledge, I consider that the documentation is correct.

import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import leastsq
x=np.array([200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300],
dtype=float)
y=np.array([28, 75, 37, 53, 22, 58, 40, 96, 34, 52, 30, 69], dtype=float)
w=np.array([0.001168822, 0.000205547, 0.000408122, 0.000205547, 0.001168822,
0.000408122, 0.000408122, 0.000205547, 0.001168822, 0.000205547,
0.001168822, 0.000408122])
sw=1/np.sqrt(w)

def linfunc(x, a, b):
    return a + b*x
popt, pcov = curve_fit(linfunc, x, y)
wopt, wcov = curve_fit(linfunc, x, y, sigma=sw)

Output:
No weighted form usage of curve_fit which matches SAS:
a= -11.25 with SE= 15.88585587
b=0.2025 with SE=0.05109428

Weighted form if I understand the difference between how weights need to be
specified so the results match SAS:
a= -12.91438  with SE= 11.09325
b=0.20831 with SE= 0.04342

Bruce

Re: [SciPy-Dev] On the leastsq/curve_fit method

Bruce Southey