Question about errors (uncertainties) in non-linear least squares fitting
Hi, I'm fitting some data using a wrapper around the scipy.optimize.leastsq method which can be found under http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leasts... it allows for putting bounds on the fitted parameters, which is very important for me. I'm using the covariance matrix, returned by leastsq() function to estimate the errors of the fitted parameters. The fitting is done using real measurement uncertainties (which are ridiculously small, by the way), so I would expect the resulting parameter error to be reasonable. What don't understand, is that I'm getting extremely small errors on the fitted parameters (I calculate the errors as perr = sqrt(diag(fitres[1])), where fitres[1] is the covariance matrix returned by leastsq() function). For example, a parameter which has a fitted value of ~100 gets an error of ~1e-6. At the same time, when I calculate the reduced chi squared of the fit I'm getting an extremely large number (of the order of 1e8). I can understand the large chi^2 value - the data variance is extremely small and the model curve is not perfect, so even slight deviations of the fitted model from the data will blow up chi^2 into space. But how can the fitted parameter variance be so small, while at the same time the fit is garbage according to chi^2? I guess this requires a better understanding of how the covariance matrix is calculated. Some suggestions anyone? Cheers, Paweł
Pawel, First off you may want to use a more up to date version of leastsqbound which can be found at https://github.com/jjhelmus/leastsqbound-scipy Second, when you perform a constrained optimization using internal parameters like leastsqbound does, if one of more of the parameters is close to a bound, the values in the covariance matrix can take on meaningless values. Section 1.3 of the The Minuit User's Guide [1] gives a good overview of this, especially look at the discussion on page 5. For best results an unconstrained optimization should be performed, often times you can rewrite your model in such a way that the constraints are automatically imposed (this is what is done internally in leastsqbound, but transforming back to the original model can introduce large errors if a parameter is close to the bounds). Third, since you have measurement uncertainties make sure you include them in the chi^2 calculation. I find the discussion by P.H. Richter [2] to be quite good. Cheers, - Jonathan Helmus [1] http://seal.cern.ch/documents/minuit/mnusersguide.pdf [2] Estimating Errors in Least-Squares Fitting, P.H. Richter TDA Progress Report 42-122 http://tmo.jpl.nasa.gov/progress_report/42-122/122E.pdf On 08/07/2012 09:16 AM, Pawe? Kwas'niewski wrote:
Hi,
I'm fitting some data using a wrapper around the scipy.optimize.leastsq method which can be found under http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leasts... Basically it allows for putting bounds on the fitted parameters, which is very important for me.
I'm using the covariance matrix, returned by leastsq() function to estimate the errors of the fitted parameters. The fitting is done using real measurement uncertainties (which are ridiculously small, by the way), so I would expect the resulting parameter error to be reasonable. What don't understand, is that I'm getting extremely small errors on the fitted parameters (I calculate the errors as perr = sqrt(diag(fitres[1])), where fitres[1] is the covariance matrix returned by leastsq() function). For example, a parameter which has a fitted value of ~100 gets an error of ~1e-6. At the same time, when I calculate the reduced chi squared of the fit I'm getting an extremely large number (of the order of 1e8). I can understand the large chi^2 value - the data variance is extremely small and the model curve is not perfect, so even slight deviations of the fitted model from the data will blow up chi^2 into space. But how can the fitted parameter variance be so small, while at the same time the fit is garbage according to chi^2?
I guess this requires a better understanding of how the covariance matrix is calculated. Some suggestions anyone?
Cheers,
Pawe?
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
Dear Jonathan, Thank you for the quick answer and pointing me to the updated code and the literature. I'll certainly have a look into it. Just for the record, I wanted to use constrained fitting because the unconstrained fit sometimes gave me unphysical parameters. It's interesting what you wrote about how the constraint influences the parameter error estimate. Just to give it a shot I tried an unconstrained fit on one data set. It did converge, but still - I got enormous chi^2 (reduced, and yes - I'm using the experimental errors to calculate it) with ridiculously small parameter errors... I guess I need to understand better what's going on. By the way, the constrained fitting function is great, but I'd like to add another feature - the possibility to fix a parameter. This would largely facilitate testing of the fitting function, or can even be useful in real applications, where one of the parameters is known from a previous fit of a slightly different model or a different measurement or whatever... Anyway - maybe you have any ideas how to implement that? I already tried setting the upper and lower bound to the same value - it doesn't work. The method you use to apply the bounds throws division by zero error. ** Cheers, Paweł 2012/8/7 Jonathan Helmus <jjhelmus@gmail.com>
Pawel,
First off you may want to use a more up to date version of leastsqbound which can be found at https://github.com/jjhelmus/leastsqbound-scipy
Second, when you perform a constrained optimization using internal parameters like leastsqbound does, if one of more of the parameters is close to a bound, the values in the covariance matrix can take on meaningless values. Section 1.3 of the The Minuit User's Guide [1] gives a good overview of this, especially look at the discussion on page 5. For best results an unconstrained optimization should be performed, often times you can rewrite your model in such a way that the constraints are automatically imposed (this is what is done internally in leastsqbound, but transforming back to the original model can introduce large errors if a parameter is close to the bounds).
Third, since you have measurement uncertainties make sure you include them in the chi^2 calculation. I find the discussion by P.H. Richter [2] to be quite good.
Cheers,
- Jonathan Helmus
[1] http://seal.cern.ch/documents/minuit/mnusersguide.pdf [2] Estimating Errors in Least-Squares Fitting, P.H. Richter TDA Progress Report 42-122 http://tmo.jpl.nasa.gov/progress_report/42-122/122E.pdf
On 08/07/2012 09:16 AM, Paweł Kwaśniewski wrote:
Hi,
I'm fitting some data using a wrapper around the scipy.optimize.leastsq method which can be found under http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leasts... it allows for putting bounds on the fitted parameters, which is very important for me.
I'm using the covariance matrix, returned by leastsq() function to estimate the errors of the fitted parameters. The fitting is done using real measurement uncertainties (which are ridiculously small, by the way), so I would expect the resulting parameter error to be reasonable. What don't understand, is that I'm getting extremely small errors on the fitted parameters (I calculate the errors as perr = sqrt(diag(fitres[1])), where fitres[1] is the covariance matrix returned by leastsq() function). For example, a parameter which has a fitted value of ~100 gets an error of ~1e-6. At the same time, when I calculate the reduced chi squared of the fit I'm getting an extremely large number (of the order of 1e8). I can understand the large chi^2 value - the data variance is extremely small and the model curve is not perfect, so even slight deviations of the fitted model from the data will blow up chi^2 into space. But how can the fitted parameter variance be so small, while at the same time the fit is garbage according to chi^2?
I guess this requires a better understanding of how the covariance matrix is calculated. Some suggestions anyone?
Cheers,
Paweł
_______________________________________________ SciPy-User mailing listSciPy-User@scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
Pawell, I do some fitting with leastq and fix parameters by just giving them as part of the 'args' keyword. I would assume you could do the same. On Aug 7, 2012 4:32 PM, "Paweł Kwaśniewski" <pawel.kw@gmail.com> wrote:
Dear Jonathan,
Thank you for the quick answer and pointing me to the updated code and the literature. I'll certainly have a look into it. Just for the record, I wanted to use constrained fitting because the unconstrained fit sometimes gave me unphysical parameters.
It's interesting what you wrote about how the constraint influences the parameter error estimate. Just to give it a shot I tried an unconstrained fit on one data set. It did converge, but still - I got enormous chi^2 (reduced, and yes - I'm using the experimental errors to calculate it) with ridiculously small parameter errors... I guess I need to understand better what's going on.
By the way, the constrained fitting function is great, but I'd like to add another feature - the possibility to fix a parameter. This would largely facilitate testing of the fitting function, or can even be useful in real applications, where one of the parameters is known from a previous fit of a slightly different model or a different measurement or whatever... Anyway - maybe you have any ideas how to implement that? I already tried setting the upper and lower bound to the same value - it doesn't work. The method you use to apply the bounds throws division by zero error. **
Cheers,
Paweł
2012/8/7 Jonathan Helmus <jjhelmus@gmail.com>
Pawel,
First off you may want to use a more up to date version of leastsqbound which can be found at https://github.com/jjhelmus/leastsqbound-scipy
Second, when you perform a constrained optimization using internal parameters like leastsqbound does, if one of more of the parameters is close to a bound, the values in the covariance matrix can take on meaningless values. Section 1.3 of the The Minuit User's Guide [1] gives a good overview of this, especially look at the discussion on page 5. For best results an unconstrained optimization should be performed, often times you can rewrite your model in such a way that the constraints are automatically imposed (this is what is done internally in leastsqbound, but transforming back to the original model can introduce large errors if a parameter is close to the bounds).
Third, since you have measurement uncertainties make sure you include them in the chi^2 calculation. I find the discussion by P.H. Richter [2] to be quite good.
Cheers,
- Jonathan Helmus
[1] http://seal.cern.ch/documents/minuit/mnusersguide.pdf [2] Estimating Errors in Least-Squares Fitting, P.H. Richter TDA Progress Report 42-122 http://tmo.jpl.nasa.gov/progress_report/42-122/122E.pdf
On 08/07/2012 09:16 AM, Paweł Kwaśniewski wrote:
Hi,
I'm fitting some data using a wrapper around the scipy.optimize.leastsq method which can be found under http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leasts... it allows for putting bounds on the fitted parameters, which is very important for me.
I'm using the covariance matrix, returned by leastsq() function to estimate the errors of the fitted parameters. The fitting is done using real measurement uncertainties (which are ridiculously small, by the way), so I would expect the resulting parameter error to be reasonable. What don't understand, is that I'm getting extremely small errors on the fitted parameters (I calculate the errors as perr = sqrt(diag(fitres[1])), where fitres[1] is the covariance matrix returned by leastsq() function). For example, a parameter which has a fitted value of ~100 gets an error of ~1e-6. At the same time, when I calculate the reduced chi squared of the fit I'm getting an extremely large number (of the order of 1e8). I can understand the large chi^2 value - the data variance is extremely small and the model curve is not perfect, so even slight deviations of the fitted model from the data will blow up chi^2 into space. But how can the fitted parameter variance be so small, while at the same time the fit is garbage according to chi^2?
I guess this requires a better understanding of how the covariance matrix is calculated. Some suggestions anyone?
Cheers,
Paweł
_______________________________________________ SciPy-User mailing listSciPy-User@scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
Pawel, leastsqbound cannot fix a parameter. If you want a fixed parameter in the fit you can either rewrite your fitting/error function to take the fixed parameters as an extra argument, or use a more "full featured" constrained least squares fitting package like lmfit (http://newville.github.com/lmfit-py/) whose class based Parameters allow for additional control. For any aspiring developers out there, fixed parameter could be added to leastsqbound (although it wouldn't be easy) but I'm not planning on add it in the near term. I would accept a patch which added this. - Jonathan Helmus On 08/07/2012 05:31 PM, Pawe? Kwas'niewski wrote:
Dear Jonathan,
Thank you for the quick answer and pointing me to the updated code and the literature. I'll certainly have a look into it. Just for the record, I wanted to use constrained fitting because the unconstrained fit sometimes gave me unphysical parameters.
It's interesting what you wrote about how the constraint influences the parameter error estimate. Just to give it a shot I tried an unconstrained fit on one data set. It did converge, but still - I got enormous chi^2 (reduced, and yes - I'm using the experimental errors to calculate it) with ridiculously small parameter errors... I guess I need to understand better what's going on.
By the way, the constrained fitting function is great, but I'd like to add another feature - the possibility to fix a parameter. This would largely facilitate testing of the fitting function, or can even be useful in real applications, where one of the parameters is known from a previous fit of a slightly different model or a different measurement or whatever... Anyway - maybe you have any ideas how to implement that? I already tried setting the upper and lower bound to the same value - it doesn't work. The method you use to apply the bounds throws division by zero error.
Cheers,
Pawe?
2012/8/7 Jonathan Helmus <jjhelmus@gmail.com <mailto:jjhelmus@gmail.com>>
Pawel,
First off you may want to use a more up to date version of leastsqbound which can be found at https://github.com/jjhelmus/leastsqbound-scipy
Second, when you perform a constrained optimization using internal parameters like leastsqbound does, if one of more of the parameters is close to a bound, the values in the covariance matrix can take on meaningless values. Section 1.3 of the The Minuit User's Guide [1] gives a good overview of this, especially look at the discussion on page 5. For best results an unconstrained optimization should be performed, often times you can rewrite your model in such a way that the constraints are automatically imposed (this is what is done internally in leastsqbound, but transforming back to the original model can introduce large errors if a parameter is close to the bounds).
Third, since you have measurement uncertainties make sure you include them in the chi^2 calculation. I find the discussion by P.H. Richter [2] to be quite good.
Cheers,
- Jonathan Helmus
[1] http://seal.cern.ch/documents/minuit/mnusersguide.pdf [2] Estimating Errors in Least-Squares Fitting, P.H. Richter TDA Progress Report 42-122 http://tmo.jpl.nasa.gov/progress_report/42-122/122E.pdf
On 08/07/2012 09:16 AM, Pawe? Kwas'niewski wrote:
Hi,
I'm fitting some data using a wrapper around the scipy.optimize.leastsq method which can be found under http://code.google.com/p/nmrglue/source/browse/trunk/nmrglue/analysis/leasts... Basically it allows for putting bounds on the fitted parameters, which is very important for me.
I'm using the covariance matrix, returned by leastsq() function to estimate the errors of the fitted parameters. The fitting is done using real measurement uncertainties (which are ridiculously small, by the way), so I would expect the resulting parameter error to be reasonable. What don't understand, is that I'm getting extremely small errors on the fitted parameters (I calculate the errors as perr = sqrt(diag(fitres[1])), where fitres[1] is the covariance matrix returned by leastsq() function). For example, a parameter which has a fitted value of ~100 gets an error of ~1e-6. At the same time, when I calculate the reduced chi squared of the fit I'm getting an extremely large number (of the order of 1e8). I can understand the large chi^2 value - the data variance is extremely small and the model curve is not perfect, so even slight deviations of the fitted model from the data will blow up chi^2 into space. But how can the fitted parameter variance be so small, while at the same time the fit is garbage according to chi^2?
I guess this requires a better understanding of how the covariance matrix is calculated. Some suggestions anyone?
Cheers,
Pawe?
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org <mailto:SciPy-User@scipy.org> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org <mailto:SciPy-User@scipy.org> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
participants (3)
-
Jonathan Helmus -
Kevin Gullikson -
Paweł Kwaśniewski