[SciPy-user] incorrect variance in optimize.curvefit and leastsq
josef.pktd at gmail.com
josef.pktd at gmail.com
Thu Feb 12 23:57:58 EST 2009
On Thu, Feb 12, 2009 at 11:09 PM, Travis E. Oliphant
<oliphant at enthought.com> wrote:
> josef.pktd at gmail.com wrote:
>> I just saw the new optimize.curvefit which provides a wrapper around
>> optimize.leastsq
>>
>> optimize.leastsq provides the raw covariance matrix (cov_x). As I
>> mentioned once on the mailing list, this is not the covariance matrix
>> of the parameter estimates. To get those, the raw covariance matrix
>> has to be multiplied by the standard error of the residual. So, the
>> docstring in optimize.curvefit doesn't correspond to the actual
>> calculation.
>>
> Thank you for the clarification. I had forgotten your earlier valid
> concerns. Help fixing the docstring is appreciated. If you can
> figure out how to improve the code, that is even better. I think it is
> good to at least report the cov, but the docstring should not mislead.
>>
the standard deviation of the error can be calculated and the
corrected (this is written for the use from outside of curvefit):
yhat = func(x,popt[0], popt[1]) # get predicted observations
SSE = np.sum((y-yhat)**2)
sig2 = SSE/(len(y)-len(popt))
ecov = sig2*pcov # this is the variance-covariance matrix of
the parameter estimates
inside curvefit, this work (before the return):
err = func(popt, *args)
SSE = np.sum((err)**2)
sig2 = SSE / (len(ydata) - len(popt))
pcov = sig2 * pcov
>> The first parameter is not exactly high precision.
>>
>> The second problem is that, in weighted least squares, the calculation
>> of the standard deviation of the parameter estimates has to take the
>> weights into account. (But I don't have the formulas right now.)
>>
>> I was looking at this to provide a general non-linear least squares
>> class in stats. But for several calculation, the Jacobian would be
>> necessary. optimize.leastsq only provides cov_x, but I was wondering
>> whether the Jacobian can be calculated from the return of the minpack
>> functions in optimize.leastsq, but I didn't have time to figure this
>> out.
>>
> I'm not sure, but it might be. I would love to spend time on this,
> but don't have it. If somebody else can pick up, that would be great.
>
> -Travis
>
Below are two versions of the test function, the first is against
curvefit with corrected pcov, the second is a test against an
uncorrected curvefit. It uses only decimal=5 so the tests don't fail
Josef
----------
test against corrected version
----------
def test_curvefit():
'''test against NIST certified case at
http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml'''
data = array([[ 10.07, 77.6 ],
[ 14.73, 114.9 ],
[ 17.94, 141.1 ],
[ 23.93, 190.8 ],
[ 29.61, 239.9 ],
[ 35.18, 289. ],
[ 40.02, 332.8 ],
[ 44.82, 378.4 ],
[ 50.76, 434.8 ],
[ 55.05, 477.3 ],
[ 61.01, 536.8 ],
[ 66.4 , 593.1 ],
[ 75.47, 689.1 ],
[ 81.78, 760. ]])
pstd_c = [3.1643950207E+00, 4.2547321834E-06]
popt_c = [3.3799746163E+02, 3.9039091287E-04]
SSE_c = 7.5464681533E-02
Rstd_c = 7.9301471998E-02
decimal = 5 #accuracy of parameter estimate and standard deviation
def funct(x, b1, b2):
return b1 * (1-(1+b2*x/2)**(-2))
start1 = [500, 0.0001]
start2 = [300, 0.0002]
for start in [start1, start2]:
popt, pcov = curve_fit(funct, x, y, p0=start)
pstd = np.sqrt(np.diag(pcov))
assert_almost_equal(popt, popt_c, decimal=decimal)
assert_almost_equal(pstd, pstd_c, decimal=decimal)
-------------------
test against current version:
--------------------------------------
from numpy.testing import assert_almost_equal
def test_curvefit_old():
'''test against NIST certified case at
http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml'''
data = array([[ 10.07, 77.6 ],
[ 14.73, 114.9 ],
[ 17.94, 141.1 ],
[ 23.93, 190.8 ],
[ 29.61, 239.9 ],
[ 35.18, 289. ],
[ 40.02, 332.8 ],
[ 44.82, 378.4 ],
[ 50.76, 434.8 ],
[ 55.05, 477.3 ],
[ 61.01, 536.8 ],
[ 66.4 , 593.1 ],
[ 75.47, 689.1 ],
[ 81.78, 760. ]])
pstd_c = [3.1643950207E+00, 4.2547321834E-06]
popt_c = [3.3799746163E+02, 3.9039091287E-04]
SSE_c = 7.5464681533E-02
Rstd_c = 7.9301471998E-02
decimal = 5 #accuracy of parameter estimate and standard deviation
def funct(x, b1, b2):
return b1 * (1-(1+b2*x/2)**(-2))
start1 = [500, 0.0001]
start2 = [300, 0.0002]
for start in [start1, start2]:
popt, pcov = curve_fit(funct, x, y, p0=start)
yest = funct(x,popt[0], popt[1])
SSE = np.sum((y-yest)**2)
dof = len(y)-len(popt)
#Residual standard error
Rstd = np.sqrt(SSE/(len(y)-len(popt)))
#parameter standard error
pstd = np.sqrt(SSE/(len(y)-len(popt))*np.diag(pcov))
assert_almost_equal(popt, popt_c, decimal=decimal)
assert_almost_equal(pstd, pstd_c, decimal=decimal)
assert_almost_equal(SSE, SSE_c)
assert_almost_equal(Rstd, Rstd_c)
More information about the SciPy-User
mailing list