[Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

Tue May 29 14:21:25 EDT 2018

On Tue, 2018-05-29 at 10:47 -0400, josef.pktd at gmail.com wrote:
> On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert <
> jotasi_numpy_scipy at posteo.de> wrote:
> 
> > Hi,
> > 
> > I hope this is the appropriate place to ask something like
> > this, otherwise please let me know (or feel free to ignore
> > this). Also I hope that I do not misunderstood something or
> > did some silly mistake. If so, please let me know as well!
> > 
> > TLDR:
> > When scaling the covariance matrix based on the residuals,
> > scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N)
> > (with M=number of point, N=number of parameters) and
> > numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which
> > is correct.
> > 
> > I am somewhat confused about different results I am getting
> > for the covariance matrix of a simple linear fit, when
> > comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I
> > am aware, that `curve_fit` solves the more general non-linear
> > problem numerically, while `polyfit` finds an analytical
> > solution to the linear problem. However, both converge to the
> > same solution, so I suspect that this difference is not
> > important here. The difference, I am curious about is not in
> > the returned parameters but in the estimate of the
> > corresponding covariance matrix. As I understand, there are
> > two different ways to estimate it, based either on the
> > absolute values of the provided uncertainties or by
> > interpreting those only as weights and then scaling the
> > matrix to produce an appropriate reduced chisq. To that end,
> > curve_fit has the parameter:
> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize
> > .cur
> > ve_fit.html:
> > "absolute_sigma : bool, optional
> > If True, sigma is used in an absolute sense and the
> > estimated parameter covariance pcov reflects these absolute
> > values.
> > If False, only the relative magnitudes of the sigma values
> > matter. The returned parameter covariance matrix pcov is
> > based on scaling sigma by a constant factor. This constant
> > is set by demanding that the reduced chisq for the optimal
> > parameters popt when using the scaled sigma equals unity. In
> > other words, sigma is scaled to match the sample variance of
> > the residuals after the fit. Mathematically,
> > pcov(absolute_sigma=False) = pcov(absolute_sigma=True) *
> > chisq(popt)/(M-N)"
> > https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.
> > html
> > on the other hand, does not say anything about how the
> > covariance matrix is estimated. To my understanding, its
> > default should correspond to `absolute_sigma=False` for
> > `curve_fit`. As `polyfit` has a weight parameter instead of
> > an uncertainty parameter, I guess the difference in default
> > behavior is not that surprising.
> > However, even when specifying `absolute_sigma=False`,
> > `curve_fit` and `polyfit` produce different covariance
> > matrices as the applied scaling factors are
> > chisq(popt)/(M-N-2) for `polyfit`
> > (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2
> > ec82
> > 21348b/numpy/lib/polynomial.py#L598-L605)
> > and chisq(popt)/(M-N) for `curve_fit`
> > (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994
> > 137a
> > 3ccd5b/scipy/optimize/minpack.py#L781-L782).
> > The argument given in a comment to the scaling `polyfit` is:
> > "Some literature ignores the extra -2.0 factor in the
> > denominator, but it is included here because the covariance
> > of Multivariate Student-T (which is implied by a Bayesian
> > uncertainty analysis) includes it. Plus, it gives a slightly
> > more conservative estimate of uncertainty.",
> > but honestly, in a quick search, I was not able to find any
> > literature not ignoring the extra "factor". But obviously, I
> > could very well be misunderstanding something.
> > Nonetheless, as `curve_fit` ignores it as well, I was
> > wondering whether those two shouldn't give consistent
> > results and if so, which would be the correct solution.
> > 
> 
> 
> I've never seen the -2 in any literature, and there is no reference
> in the
> code comment.
> (I would remove it as a bug-fix. Even if there is some Bayesian
> interpretation, it is not what users would expect.)

That would be my preferred fix as well. If there aren't any objections,
I'll open a corresponding issue and PR.

> There was a similar thread in 2013
> https://mail.scipy.org/pipermail/numpy-discussion/2013-
> February/065664.html

Thanks for the link. I must've somehow missed that earlier discussion.
Would it be appropriate to also add an additional parameter along the
lines of curve_fit's `absolute_sigma` with default `False` to keep it
consistent? I already felt that something like this was missing for
cases where proper standard errors are known for the data points and it
was apparently already discussed in 2013. As far as I can see, the main
reason against that is the fact that `polyfit` accepts `w`
(weights->`1/sigma`) as a parameter and not `sigma`, which would make
the documentation somewhat less intuitive than in the case of
`curve_fit`.

Jonathan

> 
> Josef
> 
> 
> > 
> > 
> > Best,
> > 
> > Jonathan
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion