[Numpy-discussion] polyfit in NumPy v1.7

Wed Feb 27 20:27:10 EST 2013

On Wed, Feb 27, 2013 at 3:01 PM, David Pine <djpine at gmail.com> wrote:
> Pauli, Josef, Chuck,
>
> I read over the discussion on curve_fit.  I believe I now understand what
> people are trying to do when they write about scaling the weighting and/or
> covariance matrix.  And I agree that what polyfit does in its current form
> is estimate the absolute errors in the data from the data and the fit.
>
> Unfortunately, in the case that you want to supply the absolute uncertainty
> estimates, polyfit doesn't leave the option of not making this "correction".
> This is a mistake,
>
> In my opinion, polyfit and curve_fit should be organized with the following
> arguments:

I'm fine with the structure (as long as estimating the error scale
remains the default).
I'm not a big fan of the names, but I'm not a developer or main user
of polyfit or curve_fit, so whatever...

>
> ydata_err  :  array_like or float or None

I still prefer weights or sigma because then I know what it means.

>
> absolute_err  :  bool (True or False)
>
> ------------------------
>
> If ydata_err is an array, then it has the same length as xdata and ydata and
> gives the uncertainty (absolute or relative) of ydata.

uncertainty is vague, either standard deviation or "weights" multiplying y and x

>
> If ydata_err is a float (a single value), it gives the uncertainty (absolute
> or relative) of ydata.  That is, all ydata points have the same uncertainty

fine, with the single value case, although it won't have any effect if
absolute_err is False

>
> If ydata_err is None, the ydata_err is set equal to 1 internally.
>
> If absolute_err is True, then no correction is made to the covariance matrix
>
> If absolute_err is False, the a correction is made to the covariance matrix
> based on the square of residuals and the value(s) of ydata_err so that it
> gives useful estimates of the uncertainties in the fitting parameters.
>
> Finally, I have a quibble with the use of the extra factor of 2 subtracted
> in the denominator of  fac (line 596).  This is highly non-standard.  Most
> of the literature does not use this factor, and for good reason.  It leads
> to complete nonsense when there are only 3 or 4 data points, for one thing.
> In supplying software for general use, the most common standard should be
> used.

I've never heard of the -2 correction
        592 	# Some literature ignores the extra -2.0 factor in the
denominator, but
	593 	# it is included here because the covariance of Multivariate Student-T
	594 	# (which is implied by a Bayesian uncertainty analysis) includes it.
	595 	# Plus, it gives a slightly more conservative estimate of uncertainty.
	596 	fac = resids / (len(x) - order - 2.0)

???

Josef

>
> Finally, the documentation needs to be improved so that it is clear.  I
> would be happy to contribute both to improving the documentation and
> software.
>
> David
>
> If
>
>
>
> On Wed, Feb 27, 2013 at 12:47 PM, Pauli Virtanen <pav at iki.fi> wrote:
>>
>> 27.02.2013 16:40, David Pine kirjoitti:
>> [clip]
>> > 2.  I am sorry but I don't understand your response.  The matrix Vbase
>> > in the code is already the covariance matrix, _before_ it is scaled by
>> > fac.  Scaling it by fac and returning  Vbase*fac as the covariance
>> > matrix is wrong, at least according to the references I know, including
>> > "Numerical Recipes", by Press et al, "Data Reduction and Error Analysis
>> > for the Physical Sciences" by Bevington, both standard works.
>>
>> The covariance matrix is (A^T A)^-1 only if the data is weighed by its
>> standard errors prior to lstsq. Polyfit estimates the standard errors
>> from the fit itself, which results in the `fac` multiplication.
>>
>> This is apparently what some people expect. The way the weight
>> parameters work is however confusing, as they are
>> w[i]=sigma_estimate/sigma[i], rather than being absolute errors.
>>
>> Anyway, as Josef noted, it's the same problem that curve_fit in Scipy
>> had and probably the same fix needs to be done here.
>>
>> --
>> Pauli Virtanen
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>