On the leastsq/curve_fit method
Dear all,

I wanted to briefly show you the results of an analysis I did on the performance of the optimize.leastsq method for data fitting. I presented these results at the last Python in Physics workshop. You can download the pdf here: http://emma.inrim.it:8080/gdurin/talks.

1. The main concern is about the use of cov_x to estimate the error bars of the fitting parameters. In the docs, it is stated that "this matrix must be multiplied by the residual standard deviation to get the covariance of the parameter estimates -- see curve_fits." Unfortunately, this is not correct, or rather it is only partially correct. It is correct if there are no error bars on the input data (the sigma of curve_fit is None). But if they are provided, they are used "as weights in least-squares problem" (curve_fit doc), and cov_x gives directly the covariance of the parameter estimates (i.e. the diagonal terms are the errors in the parameters). See for instance here: http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of....

This means that not only the doc needs fixing, but also the curve_fit code, because the estimation of the parameters' errors is INDEPENDENT of the values of the data errors when they are constant, which is clearly wrong. I have never provided a patch, but the fix should be quite simple; please give me some indication of how to do that.

2. The convergence of the fit in the most difficult cases (see page 15 of my presentation) can require up to about 3000 iterations, reduced to 800 when using analytical derivatives. For quite a long time I did not realize that the fit needed more iterations than the number set by maxfev, and thus I started to think that leastsq was not good enough for 'hard' data. As a matter of fact,

    maxfev : int
        The maximum number of calls to the function. If zero, then 100*(N+1) is the
        maximum where N is the number of elements in x0.

is pretty low, so I suggest increasing the prefactor 100 to 1000. A relatively large number is not a problem, by the way, because if the system is sloppy, i.e. one parameter does not move much, the routine stops and complains with "Both actual and predicted relative reductions in the sum of squares are at most 0.000000 and the relative error between two consecutive iterates is at most 0.000000", as in the case of boxBod at pg. 15. By the way, we should also advise that in the case of analytical derivatives this number is halved, even if I personally would keep the same number for both cases.

This is all for the moment. Many thanks for your attention, and sorry for the long mail.

Gianfranco
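As a concrete illustration of point 2: maxfev can simply be raised per call while the default stays as it is. A minimal sketch (made-up exponential model, not from the mail; it assumes curve_fit forwards extra keyword arguments such as maxfev on to leastsq through its **kw, as the curve_fit of this era does):

    import numpy as np
    from scipy.optimize import curve_fit, leastsq

    def model(x, a, b, c):
        # a strongly nonlinear model can need many more function evaluations
        return a * np.exp(-b * x) + c

    x = np.linspace(0, 4, 50)
    y = model(x, 2.5, 1.3, 0.5) + 0.05 * np.random.randn(50)

    # raise the evaluation budget explicitly instead of relying on 100*(N+1)
    popt, pcov = curve_fit(model, x, y, p0=(1.0, 1.0, 1.0), maxfev=5000)

    # the same keyword works when calling leastsq directly
    def residuals(p):
        return y - model(x, *p)

    p, cov_x, infodict, mesg, ier = leastsq(residuals, (1.0, 1.0, 1.0),
                                            maxfev=5000, full_output=True)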
On Thu, Sep 22, 2011 at 1:23 PM, Gianfranco Durin <g.durin@inrim.it> wrote:
<snip>
Since this depends on what you define as weight or sigma, both are correct but use different definitions. Since we just had this discussion, I'm not arguing again. I just want to have clear definitions that the "average" user can use by default. I don't really care which it is if the majority of users are engineers who can tell what their error variances are before doing any estimation.

Josef
----- Original Message -----
<snip>
Since this depends on what you define as weight or sigma, both are correct but use different definitions.
Of course, but this is not the point. Let me explain.

In our routines, cov_x is calculated as (J^T J)^-1, where J is the Jacobian. The Jacobian, unless provided directly, is calculated from the definition of the residuals, which in the curve_fit method are "_general_function" and "_weighted_general_function". The latter function explicitly uses the weights, so the cov_x is more precisely (J^T W J)^-1, where W is the matrix of the weights and J is just the matrix of the first derivatives. Thus in this case the diagonal elements of cov_x give the variance of the parameters. There is no need to multiply by the residual standard deviation.

In case W == 1, i.e. no errors in the data are provided, as reported here http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#Weighted... one uses the variance of the observations, i.e. uses

    s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))

as an estimate of the variance, as done in curve_fit.

BUT, we cannot multiply the cov_x obtained with _weighted_general_function by s_sq. As I said, we have already taken it into account in the definition of the residuals. Thus:

    if (len(ydata) > len(p0)) and pcov is not None:
        s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
        pcov = pcov * s_sq
    else:
        pcov = inf

where func can be both _general_function and _weighted_general_function, is not correct.
Since we just had this discussion, I'm not arguing again. I just want to have clear definitions that the "average" user can use by default. I don't really care which it is if the majority of users are engineers who can tell what their error variances are before doing any estimation.
Oh, interesting. Where did you have this discussion? On this list? I could not find it...

Here the problem is not to decide an 'average' behaviour, but to correctly calculate the parameters' errors when the user does or does not provide the errors in the data.

Hope this helps,
Gianfranco
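A quick way to see the behaviour Gianfranco describes is to fit the same data twice, once with sigma and once with 10*sigma, and compare the reported parameter errors. A minimal sketch (made-up exponential model, not from the mails; which outcome you get depends on the curve_fit version installed):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * np.exp(-b * x)

    x = np.linspace(0, 5, 40)
    sigma = 0.1 * np.ones_like(x)                 # constant measurement errors
    y = model(x, 2.0, 0.7) + sigma * np.random.randn(x.size)

    p1, cov1 = curve_fit(model, x, y, p0=(1.0, 1.0), sigma=sigma)
    p2, cov2 = curve_fit(model, x, y, p0=(1.0, 1.0), sigma=10 * sigma)

    print(np.sqrt(np.diag(cov1)))   # parameter errors with sigma
    print(np.sqrt(np.diag(cov2)))   # parameter errors with 10*sigma
    # if the two rows are identical, the returned pcov ignores the absolute size
    # of the supplied errors (Gianfranco's point); if the second row is 10x the
    # first, sigma is being treated as true measurement errors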
I am just a simple user, but perhaps this gives some idea of what we users experience: I got so fed up with leastsq not having good documentation, not being able to set limits on the parameters to be fit, and not handling errors in input measurements in a transparent way, that I was very happy to replace it with the pkfit routine based on Craig Markwardt's IDL code. I am happy now, but I wasted a lot of time because of these leastsq issues.

Anyway - I am happy now.

Paul Kuin

On Mon, Sep 26, 2011 at 2:57 PM, Gianfranco Durin <g.durin@inrim.it> wrote:
<snip>
On Mon, Sep 26, 2011 at 10:31 AM, Paul Kuin <npkuin@gmail.com> wrote:
<snip>
I'm a pretty happy user of scipy.optimize.leastsq because it is just a low level optimizer that I can use for many different models and because it is not a curve fitting function. Josef
On Mon, Sep 26, 2011 at 9:57 AM, Gianfranco Durin <g.durin@inrim.it> wrote:
<snip>
where func can be both "_general_function" and "_weighted_general_function", is not correct.
1. M = σ² I -- ok, unit weights
2. M = W^(-1) -- your case, W has the estimates of the error covariance
3. M = σ² W^(-1) -- I think this is what curve_fit uses, and what is in (my) textbooks defined as weighted least squares

Do we use definition 2 or 3 by default? Both are reasonable. 3 is what I expected when I looked at curve_fit; 2 might be more useful for two-stage estimation, but I didn't have time to check the details.
Since we just had this discussion, I'm not arguing again. I just want to have clear definitions that the "average" user can use by default. I don't really care which it is if the majority of users are engineers who can tell what their error variances are before doing any estimation.
Oh, interesting. Where did you have this discussion? On this list? I could not find it...
Sorry, I needed to find it http://groups.google.com/group/scipy-user/browse_thread/thread/55497657b2f11...
Here the problem is not to decide an 'average' behaviour, but to correctly calculate the parameters' error when the user does or does not provide the errors in the data.
Correctness depends on the definitions, and the definitions should be made clearer in the documentation, but it looks like the default interpretation (for users who don't read the small print) will depend on the user's background.

Josef
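To make the three conventions above concrete, they can be written out for a toy linear model. The sketch below uses my own notation (not from the mails): J is the design matrix, W = diag(1/sigma**2), and each candidate parameter covariance is evaluated from the same fit:

    import numpy as np

    # toy linear model y = a + b*x with made-up data and per-point errors
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
    sigma = np.array([0.1, 0.1, 0.2, 0.2, 0.3])

    J = np.column_stack([np.ones_like(x), x])   # Jacobian = design matrix
    W = np.diag(1.0 / sigma**2)

    # weighted least-squares estimate and the two residual variances
    beta = np.linalg.solve(J.T @ W @ J, J.T @ W @ y)
    resid = y - J @ beta
    dof = len(y) - len(beta)
    s2_unw = (resid**2).sum() / dof             # unweighted residual variance
    s2_w = ((resid / sigma)**2).sum() / dof     # weighted residual variance

    # definition 1: M = s^2 I (unit weights; normally paired with an unweighted fit)
    cov1 = s2_unw * np.linalg.inv(J.T @ J)
    # definition 2: M = W^(-1), i.e. sigma taken as the true error covariance
    cov2 = np.linalg.inv(J.T @ W @ J)
    # definition 3: M = s^2 W^(-1), i.e. sigma only fixes the relative weights
    cov3 = s2_w * np.linalg.inv(J.T @ W @ J)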
where func can be both "_general_function" and "_weighted_general_function", is not correct.
1. M = σ² I -- ok, unit weights
2. M = W^(-1) -- your case, W has the estimates of the error covariance
3. M = σ² W^(-1) -- I think this is what curve_fit uses, and what is in (my) textbooks defined as weighted least squares
Do we use definition 2 or 3 by default? Both are reasonable.
3 is what I expected when I looked at curve_fit; 2 might be more useful for two-stage estimation, but I didn't have time to check the details.
Ehmm, no, curve_fit does not use def 3, otherwise the errors would scale with W, but they don't. By the way, it does not have the correct units. Curve_fit calculates

    M = W σ² W^(-1) = σ²

so it gives exactly the same results as case 1, irrespective of the W's. This is why the errors do not scale with W.

Gianfranco
On Tue, Sep 27, 2011 at 5:17 AM, Gianfranco Durin <g.durin@inrim.it> wrote:
<snip>
Ehmm, no, curve_fit does not use def 3, otherwise the errors would scale with W, but they don't. By the way, it does not have the correct units.
http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node31.html

'''
The minimization of χ² above is sometimes called weighted least squares, in which case the inverse quantities 1/e² are called the weights. Clearly this is simply a different word for the same thing, but in practice the use of these words sometimes means that the interpretation of e² as variances or squared errors is not straightforward. The word weight often implies that only the relative weights are known ("point two is twice as important as point one") in which case there is apparently an unknown overall normalization factor. Unfortunately the parameter errors coming out of such a fit will be proportional to this factor, and the user must be aware of this in the formulation of his problem.
'''
(I don't quite understand the last sentence.)

M = σ² W^(-1), where σ² is estimated by the residual sum of squares from the weighted regression.

W only specifies relative errors; the assumption is that the covariance matrix of the errors is *proportional* to W. The scaling is arbitrary. If the scale of W changes, then the estimated residual sum of squares from the weighted regression will compensate for it. So, rescaling W has no effect on the covariance of the parameter estimates.

I checked in Greene: Econometric Analysis, and briefly looked at the SAS description. It looks like weighted least squares is always with automatic scaling, and W is defined only as relative weights. All I seem to be able to find is weighted least squares with automatic scaling (except for maybe some two-step estimators).
Curve_fit calculates
M = W \sigma^2 W^(-1) = \sigma^2
If I remember correctly (calculated from the transformed model) it should be: the covariance of the parameter estimates is s^2 (X'WX)^(-1), and the error estimates should be s^2 * W, where W = diag(1/curvefit_sigma**2).

Unfortunate terminology for curve_fit's sigma, or intentional? (as I mentioned in the other thread)

Josef
so it gives exactly the same results as case 1, irrespective of the W's. This is why the errors do not scale with W.
Gianfranco
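Josef's "transformed model" can be made explicit: dividing the residuals by sigma turns the weighted problem into an ordinary fit, after which the s^2 (X'WX)^(-1) formula applies. A sketch under those assumptions (linear model, made-up data, my variable names):

    import numpy as np
    from scipy.optimize import leastsq

    # a linear model y = a + b*x with per-point errors sigma
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
    sigma = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.3])

    # transformed model: divide the residuals by sigma, then fit as usual
    def wresid(p):
        a, b = p
        return (y - (a + b * x)) / sigma

    p, cov_x, infodict, mesg, ier = leastsq(wresid, (1.0, 1.0), full_output=True)

    # s^2 from the residuals of the transformed (weighted) model
    s2 = (wresid(p)**2).sum() / (len(y) - len(p))

    # the formula built explicitly: X is the design matrix, W = diag(1/sigma**2)
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(1.0 / sigma**2)
    cov_formula = s2 * np.linalg.inv(X.T @ W @ X)

    # for the transformed model, leastsq's cov_x is (X'WX)^(-1), so this should match
    print(np.allclose(cov_formula, s2 * cov_x))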
On Tue, Sep 27, 2011 at 9:20 AM, <josef.pktd@gmail.com> wrote:
<snip>
Gianfranco,

Can you please provide some Python and R (or SAS) code to show what you mean? In the linear example below, I get agreement between SAS and curve_fit with and without defining a weight. I do know that the immediate result from leastsq does not agree with the values from SAS. Thus, with my limited knowledge, I consider that the documentation is correct.

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.optimize import leastsq

    x = np.array([200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300], dtype=float)
    y = np.array([28, 75, 37, 53, 22, 58, 40, 96, 34, 52, 30, 69], dtype=float)
    w = np.array([0.001168822, 0.000205547, 0.000408122, 0.000205547,
                  0.001168822, 0.000408122, 0.000408122, 0.000205547,
                  0.001168822, 0.000205547, 0.001168822, 0.000408122])
    sw = 1/np.sqrt(w)

    def linfunc(x, a, b):
        return a + b*x

    popt, pcov = curve_fit(linfunc, x, y)
    wopt, wcov = curve_fit(linfunc, x, y, sigma=sw)

Output:

Unweighted usage of curve_fit, which matches SAS:
a = -11.25 with SE = 15.88585587
b = 0.2025 with SE = 0.05109428

Weighted form, if I understand the difference in how the weights need to be specified so the results match SAS:
a = -12.91438 with SE = 11.09325
b = 0.20831 with SE = 0.04342

Bruce
Hello everybody,

I just saw that there is some discussion about leastsq, and guessed that this is a good time to add some ideas. I have been working on an all-Python fitter, which is just the minpack fitter translated to Python. This is why I was bugging you all for the qr_multiply stuff, for those who remember. Having the fitter in Python is quite useful, because calling Python from FORTRAN was sometimes ugly: most importantly, exceptions often didn't find their way through FORTRAN.

I added some new stuff as well: if the fitted function realizes that its parameters are running away, it may raise an InvalidParameter exception, and the fitting algorithm will choose a smaller step size on this parameter.

You can find the code at https://github.com/scipy/scipy/pull/88

Greetings

Martin
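For comparison (this is not Martin's mechanism, whose InvalidParameter exception lives only in the pull request above): with the existing leastsq, one common workaround for runaway parameters is to return inflated residuals from unphysical regions, as in this minimal sketch with an assumed exponential model:

    import numpy as np
    from scipy.optimize import leastsq

    x = np.linspace(0, 4, 30)
    y = 2.0 * np.exp(-1.3 * x) + 0.02 * np.random.randn(x.size)

    def residuals(p):
        a, b = p
        if b <= 0:
            # unphysical decay rate: return large residuals so the optimizer
            # is pushed back instead of evaluating the model with a bad parameter
            return 1e6 * np.ones_like(y)
        return y - a * np.exp(-b * x)

    p, ier = leastsq(residuals, (1.0, 1.0))

The flat, huge residuals give the finite-difference Jacobian nothing useful to work with, which is exactly the weakness that an exception-based step-size reduction is meant to handle more gracefully.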
<snip>
Gianfranco, Can you please provide some Python and R (or SAS) code to show what you mean? ... Bruce
Bruce, Josef and the others,

After reading a few books, searching over many websites, and trying many different software packages (and with the help of many colleagues), I came to a conclusion which should settle the question:

Weights and data errors (or uncertainties) CAN be different concepts, as written above. It depends on the user... In particular:

1. If he/she treats the sigma JUST as weights and NOT as the standard deviation of ydata, cov_x MUST be multiplied by the residual standard deviation.
2. If the user treats the sigma as the standard deviation of ydata (i.e. measurement errors), which are by the way ALSO good for weighting the data themselves, then cov_x DOES NOT NEED to be multiplied by the residual standard deviation.

I tried a few packages, and found they assume one of the two options by default, without 'asking' the user (or making the user aware of it). In particular (checked using the very same data and errors):

Option 1: SAS, R, gnuplot, octave...
Option 2: Profit, Origin, ...

And Mathematica? In the HOWTO "Fit Models with Measurement Errors" (see below), Mathematica makes the difference between weights and measurement errors explicit, so the user can decide how to use his/her sigma.

I think we should make this distinction explicit also in our curve_fit, and report it in the leastsq doc.

Gianfranco

============================================================
From Mathematica:

"Particularly in the physical sciences, it is common to use measurement errors as weights to incorporate measured variation into the fitting. Weights have a relative effect on the parameter estimates, but an error variance still needs to be estimated in weighted regression, and this impacts error estimates for results."

1. When using Weights alone, the variance scale is estimated using the default method [i.e. our s_sq]. Error estimates will depend on both the weights and the estimated variance scale. However, if the weights are from measurement errors, you would want error estimates to depend solely on the weights. It is important to note that weights do not change the fitting or error estimates. For example, multiplying all weights by a constant increases the estimated variance, but does not change the parameter estimates or standard errors. (Ps. This is what I meant saying that the parameters' errors ....

2. For measurement errors, you want standard errors to be computed only from the weights.... While weights have an impact on parameter estimates, the variance estimate itself does not."
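In code, the difference between the two options is a single rescaling of the covariance matrix. A sketch using leastsq directly (made-up linear data, not from the mails), so that the choice is explicit rather than hidden inside curve_fit:

    import numpy as np
    from scipy.optimize import leastsq

    def model(x, a, b):
        return a * x + b

    x = np.linspace(0.0, 10.0, 30)
    sigma = 0.5 * np.ones_like(x)                 # measurement errors on y
    y = model(x, 1.5, 0.3) + sigma * np.random.randn(x.size)

    def wresid(p):
        return (y - model(x, *p)) / sigma

    p, cov_x, infodict, mesg, ier = leastsq(wresid, (1.0, 0.0), full_output=True)
    s_sq = (wresid(p)**2).sum() / (len(y) - len(p))

    # option 2: sigma are true standard deviations of ydata -> use cov_x as is
    pcov_measurement_errors = cov_x
    # option 1: sigma are only relative weights -> rescale by the residual variance
    pcov_weights_only = s_sq * cov_x

    print(np.sqrt(np.diag(pcov_measurement_errors)))
    print(np.sqrt(np.diag(pcov_weights_only)))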
On 10/03/2011 10:06 AM, Gianfranco Durin wrote:
<snip>
Bruce, Josef and the others, after reading a few books, searching over many websites, and trying many different software packages (and with the help of many colleagues), I came to a conclusion which should settle the question:
Weights and data errors (or uncertainties) CAN be different concepts, as written above. It depends on the user... In particular:

1. If he/she treats the sigma JUST as weights and NOT as the standard deviation of ydata, cov_x MUST be multiplied by the residual standard deviation.
2. If the user treats the sigma as the standard deviation of ydata (i.e. measurement errors), which are by the way ALSO good for weighting the data themselves, then cov_x DOES NOT NEED to be multiplied by the residual standard deviation.

It is very simple to know if the weights include 'sigma', because the estimated sigma must be one. So it should not make any difference, but if it does then there is probably a bigger issue than that! (See the quick check after this message.)

I tried a few packages, and found they assume one of the two options by default, without 'asking' the user (or making the user aware of it). In particular (checked using the very same data and errors):

Option 1: SAS, R, gnuplot, octave...
Option 2: Profit, Origin, ...
And Mathematica? In the HOWTO "Fit Models with Measurement Errors" (see below), Mathematica makes the difference between weights and measurement errors explicit, so the user can decide how to use his/her sigma.

I think we should make this distinction explicit also in our curve_fit, and report it in the leastsq doc.
Gianfranco

Some packages (R and SAS) allow fitting this by fixing certain parameters, or by using equivalent models to avoid estimating those parameters. I am not knowledgeable enough to know if you can do that with the underlying leastsq function, perhaps with some equivalent model and function.
============================================================
From Mathematica: "Particularly in the physical sciences, it is common to use measurement errors as weights to incorporate measured variation into the fitting. Weights have a relative effect on the parameter estimates, but an error variance still needs to be estimated in weighted regression, and this impacts error estimates for results."
1. When using Weights alone, the variance scale is estimated using the default method [i.e. our s_sq]. Error estimates will depend on both the weights and the estimated variance scale. However, if the weights are from measurement errors, you would want error estimates to depend solely on the weights. It is important to note that weights do not change the fitting or error estimates. For example, multiplying all weights by a constant increases the estimated variance, but does not change the parameter estimates or standard errors. (Ps. This is what I meant saying that the parameters' errors ....

(That is, well, obvious!)

2. For measurement errors, you want standard errors to be computed only from the weights.... While weights have an impact on parameter estimates, the variance estimate itself does not."

That is because the weights are imposing a known variance-covariance structure on the data, while still assuming the errors are identically and independently distributed.
Bruce
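Bruce's remark above, that the estimated sigma must be one when the weights already are the measurement errors, can be turned into a quick sanity check: compute the reduced chi-square of the weighted fit and see whether it is close to 1. A sketch with made-up data (not from the mails):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * x + b

    x = np.linspace(0.0, 10.0, 25)
    sigma = 0.4 * np.ones_like(x)
    y = model(x, 2.0, -1.0) + sigma * np.random.randn(x.size)

    popt, pcov = curve_fit(model, x, y, sigma=sigma)

    # reduced chi-square: close to 1 means sigma really describes the scatter,
    # so no extra rescaling of the covariance by the residual variance is needed
    chi2_red = (((y - model(x, *popt)) / sigma)**2).sum() / (len(y) - len(popt))
    print(chi2_red)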
Participants (5):
- Bruce Southey
- Gianfranco Durin
- josef.pktd@gmail.com
- Martin Teichmann
- Paul Kuin