----- Original Message -----
On Thu, Sep 22, 2011 at 1:23 PM, Gianfranco Durin <g.durin@inrim.it> wrote:
Dear all, I wanted to briefly show you the results of an analysis I did on the performance of the optimize.leastsq method for data fitting. I presented these results at the last Python in Physics workshop. You can download the pdf here: http://emma.inrim.it:8080/gdurin/talks.
1. The main concern is about the use of cov_x to estimate the error bars of the fitting parameters. The docs state that "this matrix must be multiplied by the residual standard deviation to get the covariance of the parameter estimates -- see curve_fit."
Unfortunately, this is not correct, or rather, it is only partially correct. It is correct only if no error bars on the input data are provided (the sigma argument of curve_fit is None). But if they are provided, they are used "as weights in least-squares problem" (curve_fit doc), and cov_x directly gives the covariance of the parameter estimates (i.e. the diagonal terms are the variances of the parameters). See for instance here: http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of....
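To make the GSL convention concrete, here is a minimal sketch (the linear model, the numbers, and all variable names are just assumptions for illustration) showing that with weights W = diag(1/sigma^2) the matrix (J^T W J)^-1 already carries the right scale:

    import numpy as np

    # Illustrative straight-line data with known per-point errors
    x = np.linspace(0.0, 10.0, 50)
    sigma = 0.5 * np.ones_like(x)          # known measurement errors
    rng = np.random.default_rng(0)
    y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

    # Jacobian of f(x; a, b) = a*x + b with respect to (a, b)
    J = np.column_stack([x, np.ones_like(x)])

    # Weighted covariance: cov = (J^T W J)^{-1}, with W = diag(1/sigma^2)
    W = np.diag(1.0 / sigma**2)
    cov = np.linalg.inv(J.T @ W @ J)
    print(np.sqrt(np.diag(cov)))           # parameter errors, no extra s_sq factor needed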
This means that not only the doc needs fixing, but also the curve_fit code: with the current code, the estimated parameter errors are INDEPENDENT of the size of the data errors whenever those errors are constant, which is clearly wrong. I have never provided a patch, but the fix should be quite simple; please just give me an indication of how to do that.
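A quick way to see the symptom (a hypothetical check; the model and data are made up) is to fit the same data twice with constant error bars that differ by a factor of 10 and compare the reported parameter errors:

    import numpy as np
    from scipy.optimize import curve_fit

    def line(x, a, b):
        return a * x + b

    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 * x + 1.0 + np.random.normal(0.0, 0.5, x.size)

    # Same data, but error bars differing by a factor of 10
    popt1, pcov1 = curve_fit(line, x, y, sigma=np.full(x.size, 0.5))
    popt2, pcov2 = curve_fit(line, x, y, sigma=np.full(x.size, 5.0))

    # With the unconditional rescaling by s_sq, both fits report
    # identical parameter errors despite the 10x different sigmas
    print(np.sqrt(np.diag(pcov1)))
    print(np.sqrt(np.diag(pcov2)))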
Since this depends on what you define as weight or sigma, both are correct but use different definitions.
Of course, but this is not the point. Let me explain.

In our routines, cov_x is calculated as (J^T J)^-1, where J is the Jacobian. The Jacobian, unless provided directly, is calculated from the definition of the residuals, which in the curve_fit method are "_general_function" and "_weighted_general_function". The latter explicitly uses the weights, so in that case cov_x is more precisely (J^T W J)^-1, where W is the matrix of the weights and J is just the matrix of the first derivatives. Thus in this case the diagonal elements of cov_x already give the variances of the parameters; there is no need to multiply by the residual standard deviation.

In the case W == 1, i.e. when no errors in the data are provided, as reported here http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#Weighted... one uses the variance of the observations as an estimate, i.e.

    s_sq = (func(popt, *args)**2).sum() / (len(ydata) - len(p0))

as done in curve_fit. BUT we cannot multiply the cov_x obtained with _weighted_general_function by s_sq: as I said, the weights are already taken into account in the definition of the residuals. Thus the current code,

    if (len(ydata) > len(p0)) and pcov is not None:
        s_sq = (func(popt, *args)**2).sum() / (len(ydata) - len(p0))
        pcov = pcov * s_sq
    else:
        pcov = inf

where func can be either "_general_function" or "_weighted_general_function", is not correct.
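In other words, the rescaling should only be applied when no sigma was supplied. A sketch of the intended fix, following the variable names in the snippet above (an outline under that assumption, not a tested patch):

    if (len(ydata) > len(p0)) and pcov is not None:
        if sigma is None:
            # No data errors given: estimate the residual variance
            # and rescale (J^T J)^{-1} by it.
            s_sq = (func(popt, *args)**2).sum() / (len(ydata) - len(p0))
            pcov = pcov * s_sq
        # else: sigma was given, so pcov is already (J^T W J)^{-1}
        # and its diagonal holds the parameter variances; leave it.
    else:
        pcov = inf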
Since we just had this discussion, I'm not arguing again. I just want to have clear definitions that the "average" user can use by default. I don't really care which it is, if the majority of users are engineers who can tell what their error variances are before doing any estimation.
Oh, interesting. Where did you have this discussion? On this list? I could not find it... Here the problem is not to decide an "average" behaviour, but to correctly calculate the parameters' errors when the user does or does not provide the errors in the data.

Hope this helps,
Gianfranco