Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit
Hi, I hope this is the appropriate place to ask something like this, otherwise please let me know (or feel free to ignore this). Also I hope that I do not misunderstood something or did some silly mistake. If so, please let me know as well! TLDR: When scaling the covariance matrix based on the residuals, scipy.optimize.curve_fit uses a factor of chisq(popt)/(MN) (with M=number of point, N=number of parameters) and numpy.polyfit uses chisq(popt)/(MN2). I am wondering which is correct. I am somewhat confused about different results I am getting for the covariance matrix of a simple linear fit, when comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I am aware, that `curve_fit` solves the more general nonlinear problem numerically, while `polyfit` finds an analytical solution to the linear problem. However, both converge to the same solution, so I suspect that this difference is not important here. The difference, I am curious about is not in the returned parameters but in the estimate of the corresponding covariance matrix. As I understand, there are two different ways to estimate it, based either on the absolute values of the provided uncertainties or by interpreting those only as weights and then scaling the matrix to produce an appropriate reduced chisq. To that end, curve_fit has the parameter: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.cur ve_fit.html: "absolute_sigma : bool, optional If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Mathematically, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(MN)" https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html on the other hand, does not say anything about how the covariance matrix is estimated. To my understanding, its default should correspond to `absolute_sigma=False` for `curve_fit`. As `polyfit` has a weight parameter instead of an uncertainty parameter, I guess the difference in default behavior is not that surprising. However, even when specifying `absolute_sigma=False`, `curve_fit` and `polyfit` produce different covariance matrices as the applied scaling factors are chisq(popt)/(MN2) for `polyfit` (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec82 21348b/numpy/lib/polynomial.py#L598L605) and chisq(popt)/(MN) for `curve_fit` (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a 3ccd5b/scipy/optimize/minpack.py#L781L782). The argument given in a comment to the scaling `polyfit` is: "Some literature ignores the extra 2.0 factor in the denominator, but it is included here because the covariance of Multivariate StudentT (which is implied by a Bayesian uncertainty analysis) includes it. Plus, it gives a slightly more conservative estimate of uncertainty.", but honestly, in a quick search, I was not able to find any literature not ignoring the extra "factor". But obviously, I could very well be misunderstanding something. Nonetheless, as `curve_fit` ignores it as well, I was wondering whether those two shouldn't give consistent results and if so, which would be the correct solution. Best, Jonathan
On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert < jotasi_numpy_scipy@posteo.de> wrote:
Hi,
I hope this is the appropriate place to ask something like this, otherwise please let me know (or feel free to ignore this). Also I hope that I do not misunderstood something or did some silly mistake. If so, please let me know as well!
TLDR: When scaling the covariance matrix based on the residuals, scipy.optimize.curve_fit uses a factor of chisq(popt)/(MN) (with M=number of point, N=number of parameters) and numpy.polyfit uses chisq(popt)/(MN2). I am wondering which is correct.
I am somewhat confused about different results I am getting for the covariance matrix of a simple linear fit, when comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I am aware, that `curve_fit` solves the more general nonlinear problem numerically, while `polyfit` finds an analytical solution to the linear problem. However, both converge to the same solution, so I suspect that this difference is not important here. The difference, I am curious about is not in the returned parameters but in the estimate of the corresponding covariance matrix. As I understand, there are two different ways to estimate it, based either on the absolute values of the provided uncertainties or by interpreting those only as weights and then scaling the matrix to produce an appropriate reduced chisq. To that end, curve_fit has the parameter: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.cur ve_fit.html: "absolute_sigma : bool, optional If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Mathematically, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(MN)" https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html on the other hand, does not say anything about how the covariance matrix is estimated. To my understanding, its default should correspond to `absolute_sigma=False` for `curve_fit`. As `polyfit` has a weight parameter instead of an uncertainty parameter, I guess the difference in default behavior is not that surprising. However, even when specifying `absolute_sigma=False`, `curve_fit` and `polyfit` produce different covariance matrices as the applied scaling factors are chisq(popt)/(MN2) for `polyfit` (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec82 21348b/numpy/lib/polynomial.py#L598L605) and chisq(popt)/(MN) for `curve_fit` (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a 3ccd5b/scipy/optimize/minpack.py#L781L782). The argument given in a comment to the scaling `polyfit` is: "Some literature ignores the extra 2.0 factor in the denominator, but it is included here because the covariance of Multivariate StudentT (which is implied by a Bayesian uncertainty analysis) includes it. Plus, it gives a slightly more conservative estimate of uncertainty.", but honestly, in a quick search, I was not able to find any literature not ignoring the extra "factor". But obviously, I could very well be misunderstanding something. Nonetheless, as `curve_fit` ignores it as well, I was wondering whether those two shouldn't give consistent results and if so, which would be the correct solution.
I've never seen the 2 in any literature, and there is no reference in the code comment. (I would remove it as a bugfix. Even if there is some Bayesian interpretation, it is not what users would expect.) There was a similar thread in 2013 https://mail.scipy.org/pipermail/numpydiscussion/2013February/065664.html Josef
Best,
Jonathan _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
On Tue, 20180529 at 10:47 0400, josef.pktd@gmail.com wrote:
On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert < jotasi_numpy_scipy@posteo.de> wrote:
Hi,
I hope this is the appropriate place to ask something like this, otherwise please let me know (or feel free to ignore this). Also I hope that I do not misunderstood something or did some silly mistake. If so, please let me know as well!
TLDR: When scaling the covariance matrix based on the residuals, scipy.optimize.curve_fit uses a factor of chisq(popt)/(MN) (with M=number of point, N=number of parameters) and numpy.polyfit uses chisq(popt)/(MN2). I am wondering which is correct.
I am somewhat confused about different results I am getting for the covariance matrix of a simple linear fit, when comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I am aware, that `curve_fit` solves the more general nonlinear problem numerically, while `polyfit` finds an analytical solution to the linear problem. However, both converge to the same solution, so I suspect that this difference is not important here. The difference, I am curious about is not in the returned parameters but in the estimate of the corresponding covariance matrix. As I understand, there are two different ways to estimate it, based either on the absolute values of the provided uncertainties or by interpreting those only as weights and then scaling the matrix to produce an appropriate reduced chisq. To that end, curve_fit has the parameter: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize .cur ve_fit.html: "absolute_sigma : bool, optional If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Mathematically, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(MN)" https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit. html on the other hand, does not say anything about how the covariance matrix is estimated. To my understanding, its default should correspond to `absolute_sigma=False` for `curve_fit`. As `polyfit` has a weight parameter instead of an uncertainty parameter, I guess the difference in default behavior is not that surprising. However, even when specifying `absolute_sigma=False`, `curve_fit` and `polyfit` produce different covariance matrices as the applied scaling factors are chisq(popt)/(MN2) for `polyfit` (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2 ec82 21348b/numpy/lib/polynomial.py#L598L605) and chisq(popt)/(MN) for `curve_fit` (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994 137a 3ccd5b/scipy/optimize/minpack.py#L781L782). The argument given in a comment to the scaling `polyfit` is: "Some literature ignores the extra 2.0 factor in the denominator, but it is included here because the covariance of Multivariate StudentT (which is implied by a Bayesian uncertainty analysis) includes it. Plus, it gives a slightly more conservative estimate of uncertainty.", but honestly, in a quick search, I was not able to find any literature not ignoring the extra "factor". But obviously, I could very well be misunderstanding something. Nonetheless, as `curve_fit` ignores it as well, I was wondering whether those two shouldn't give consistent results and if so, which would be the correct solution.
I've never seen the 2 in any literature, and there is no reference in the code comment. (I would remove it as a bugfix. Even if there is some Bayesian interpretation, it is not what users would expect.)
That would be my preferred fix as well. If there aren't any objections, I'll open a corresponding issue and PR.
There was a similar thread in 2013 https://mail.scipy.org/pipermail/numpydiscussion/2013 February/065664.html
Thanks for the link. I must've somehow missed that earlier discussion. Would it be appropriate to also add an additional parameter along the lines of curve_fit's `absolute_sigma` with default `False` to keep it consistent? I already felt that something like this was missing for cases where proper standard errors are known for the data points and it was apparently already discussed in 2013. As far as I can see, the main reason against that is the fact that `polyfit` accepts `w` (weights>`1/sigma`) as a parameter and not `sigma`, which would make the documentation somewhat less intuitive than in the case of `curve_fit`. Jonathan
Josef
Best,
Jonathan _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
_______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
On Tue, May 29, 2018 at 2:21 PM, Jonathan Tammo Siebert < jotasi_numpy_scipy@posteo.de> wrote:
On Tue, 20180529 at 10:47 0400, josef.pktd@gmail.com wrote:
On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert < jotasi_numpy_scipy@posteo.de> wrote:
Hi,
I hope this is the appropriate place to ask something like this, otherwise please let me know (or feel free to ignore this). Also I hope that I do not misunderstood something or did some silly mistake. If so, please let me know as well!
TLDR: When scaling the covariance matrix based on the residuals, scipy.optimize.curve_fit uses a factor of chisq(popt)/(MN) (with M=number of point, N=number of parameters) and numpy.polyfit uses chisq(popt)/(MN2). I am wondering which is correct.
I am somewhat confused about different results I am getting for the covariance matrix of a simple linear fit, when comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I am aware, that `curve_fit` solves the more general nonlinear problem numerically, while `polyfit` finds an analytical solution to the linear problem. However, both converge to the same solution, so I suspect that this difference is not important here. The difference, I am curious about is not in the returned parameters but in the estimate of the corresponding covariance matrix. As I understand, there are two different ways to estimate it, based either on the absolute values of the provided uncertainties or by interpreting those only as weights and then scaling the matrix to produce an appropriate reduced chisq. To that end, curve_fit has the parameter: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize .cur ve_fit.html: "absolute_sigma : bool, optional If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Mathematically, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(MN)" https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit. html on the other hand, does not say anything about how the covariance matrix is estimated. To my understanding, its default should correspond to `absolute_sigma=False` for `curve_fit`. As `polyfit` has a weight parameter instead of an uncertainty parameter, I guess the difference in default behavior is not that surprising. However, even when specifying `absolute_sigma=False`, `curve_fit` and `polyfit` produce different covariance matrices as the applied scaling factors are chisq(popt)/(MN2) for `polyfit` (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2 ec82 21348b/numpy/lib/polynomial.py#L598L605) and chisq(popt)/(MN) for `curve_fit` (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994 137a 3ccd5b/scipy/optimize/minpack.py#L781L782). The argument given in a comment to the scaling `polyfit` is: "Some literature ignores the extra 2.0 factor in the denominator, but it is included here because the covariance of Multivariate StudentT (which is implied by a Bayesian uncertainty analysis) includes it. Plus, it gives a slightly more conservative estimate of uncertainty.", but honestly, in a quick search, I was not able to find any literature not ignoring the extra "factor". But obviously, I could very well be misunderstanding something. Nonetheless, as `curve_fit` ignores it as well, I was wondering whether those two shouldn't give consistent results and if so, which would be the correct solution.
I've never seen the 2 in any literature, and there is no reference in the code comment. (I would remove it as a bugfix. Even if there is some Bayesian interpretation, it is not what users would expect.)
That would be my preferred fix as well. If there aren't any objections, I'll open a corresponding issue and PR.
There was a similar thread in 2013 https://mail.scipy.org/pipermail/numpydiscussion/2013 February/065664.html
Thanks for the link. I must've somehow missed that earlier discussion. Would it be appropriate to also add an additional parameter along the lines of curve_fit's `absolute_sigma` with default `False` to keep it consistent? I already felt that something like this was missing for cases where proper standard errors are known for the data points and it was apparently already discussed in 2013. As far as I can see, the main reason against that is the fact that `polyfit` accepts `w` (weights>`1/sigma`) as a parameter and not `sigma`, which would make the documentation somewhat less intuitive than in the case of `curve_fit`.
It would work with "absolute_weights" After the long discussions for scaling in curve_fit, I think it's fine to add it. asides: I still don't really understand why users would want it for the covariance of the parameter estimates. However, I also added an option to statsmodels OLS and WLS to keep the scale fixed instead of using the estimated scale. There is a reason that polyfit might have different, larger standard errors than curve_fit, if we assume that curve_fit has a correctly specified mean function and polyfit is just a low order approximation to an arbitrary nonlinear function. That is polyfit combines a functional approximation error with the stochastic error from the random observations. (which is also ignore if scale is fixed.) (But I never tried to figure out those non or semiparametric approximation details.) aside further away: In Poisson regression, we go the opposite way and use an estimated residual scale instead of a fixed scale = 1 so we can correct the standard errors for the parameters when there is overdispersion. (I just ran a Monte Carlo example where a hypothesis test under Poisson assumption is very wrong because of the over dispersion. I.e. if the chi2 is far away from 1, then the standard errors for the parameters are "useless" if scale is assumed to be 1.) Josef
Jonathan
Josef
Best,
Jonathan _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
_______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
_______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@python.org https://mail.python.org/mailman/listinfo/numpydiscussion
participants (2)

Jonathan Tammo Siebert

josef.pktd＠gmail.com