[SciPy-user] scipy.optimize.leastsq and covariance matrix meaning

Wed Nov 12 09:51:11 EST 2008

Robert Kern wrote:
> On Tue, Nov 11, 2008 at 06:15, massimo sandal <massimo.sandal at unibo.it> wrote:
>   
>> Robert Kern wrote:
>>     
>>> On Mon, Nov 10, 2008 at 05:13, massimo sandal <massimo.sandal at unibo.it>
>>> wrote:
>>>       
>>>> massimo sandal wrote:
>>>>
>>>>         
>>>>> I'll try to sketch up a script reproducing the core of the problem with
>>>>> actual data.
>>>>>           
>>>> Here it is. Can anyone give it a look to help me understand if and how to
>>>> make sense of the covariance matrix?
>>>>         
>>> The covariance matrix does need some scaling before it can be
>>> interpreted statistically. Basically, if you are doing nonlinear least
>>> squares as a statistical procedure, rather than a purely numerical
>>> one, the residuals need to be scaled so that they are in units of
>>> standard deviations of the measurement error for each individual
>>> measurement. If you don't know what that is, then you can estimate it
>>> from the fitted residuals. The parameter estimate is unchanged, but
>>> you will need to rescale the covariance matrix of the estimate by
>>> multiplying it by the residual variance.
>>>
>>> scipy.odr does most of this for you. Attached is a version of your
>>> code using scipy.odr. Here is the text output:
>>>
>>> Fitted parameters: [  4.90666526e+06   4.78090340e+09]
>>> Covariance: [[  1.72438988e+31  -1.64258997e+35]
>>>  [ -1.64258997e+35   1.57791262e+39]]
>>> Residual variance: 2.83606592894e-22
>>> Scaled error bars: [  6.99319913e+04   6.68959208e+08]
>>> Scaled covariance: [[  4.89048340e+09  -4.65849344e+13]
>>>  [ -4.65849344e+13   4.47506422e+17]]
>>>       
>> Thanks a lot! What I need are the scaled error bars, isn't it?
>>     
>
> You have a high anti-correlation (-0.996), so it is very much worth
> reporting the entire (scaled) covariance matrix. At least report the
> scaled error bars and the correlation factor.
>   
You also have a very bad fit as the standard errors are huge relative to 
the estimate meaning your parameters are not statistically different 
from zero. Physics is one thing but the data tell a very different story 
perhaps due to measurement errors. The linear regression of x on y gives 
a residual variance of  2.86121E-22 and R-squared is 73% (approx 73% for 
the model above). I don't see any evidence for a nonlinear fit 
especially if you bother to plot the data. As I previously said, you 
probably would get a better fit using splines or similar because of the 
variability present.
>   
>> (By the way: any good tutorial reference/book on this kind of numerical
>> things? I am a molecular biologist now doing biophysics, and while enjoying
>> it a lot, I feel behind on a lot of technical stuff)
>>     
>
> Do you mean the statistical aspects of curve fitting? The book I
> learned this kind of stuff from is quite old, _Data Reduction and
> Error Analysis for the Physical Sciences_ by Bevington and Robinson,
> but it covers the practical basics of curve fitting and error analysis
> pretty well. Many statistics books cover similar ground, but speak a
> different language.
>
>   
There are many books on nonlinear modeling but these are usually focused 
to the authors experience and which specific aspect of nonlinear models. 
Bates and Watts 'Nonlinear Regression Analysis and Its Applications' is one.

Bruce