[SciPy-User] adding linear fitting routine
David Pine
djpine at gmail.com
Wed Dec 4 08:58:28 EST 2013
Daniele
On Dec 4, 2013, at 1:55 PM, Daniele Nicolodi <daniele at grinta.net> wrote:
> On 04/12/2013 13:43, David J Pine wrote:
>> linfit does not call a matrix inversion routine. Instead it calculates
>> the best fit slope and y-intercept directly. By contrast, polyfit does
>> call a matrix inversion routine (numpy.linalg.lstsq), which has a
>> certain amount of overhead that linfit avoids. This may be why polyfit
>> is slower than linfit.
>
> A least squares fit is a matrix inversion. What you do is a matrix
> inversion, except that the notation you use does not make this clear.
> What you can discuss is the method you use for the inversion. I would
> have to have a closer look to the test...
I assure you that I understand the mathematics. Specifically, I understand
that you can view the mathematics used in linfit as implementing matrix
inversion. That is not the point. The point is that polyfit calls a matrix
inversion routine, which invokes computational machinery that is slow
compared to just doing the algebra directly without calling a matrix
inversion routine. I hope this is clear.
>
>> 4. relsigma. Other than using no weighting at all, there are basically
>> two ways that people weight data in a least squares fit.
>> (a) provide explicit absolute estimates of the errors
>> (uncertainties) for each data point. This is what physical scientists
>> often do. Setting relsigma=False tells linfit to use this method of
>> weighting the data. If the error estimates are accurate, then the
>> covariance matrix provides estimates of the uncertainties in the fitting
>> parameters (the slope & y-intercept).
>> (b) provide relative estimates of the errors (uncertainties) for
>> each data point (it's assumed that the absolute errors are not known but
>> relative uncertainties between difference data points is known). This
>> is what social scientists often do. When only the relative
>> uncertainties are known, the covariance matrix needs to be rescaled in
>> order to obtain accurate estimates of the uncertainties in the fitting
>> parameters. Setting relsigma=True tells linfit to use this method of
>> weighting the data.
>
> This is not really clear from the docstring (plus it optional but no
> default value is specified in the docstring), and made even less obvious
> by the name of the parameter used to specify the uncertainties.
It's specified in the function definition:
def linfit(x, y, sigmay=None, relsigma=True, cov=False, chisq=False, residuals=False)
which is the way it's always done in the online numpy and scipy documentation.
However, I can additionally specify it in the docstring under the parameter
definition.
>
> I would prefer two independent and mutually exclusive parameters for the
> two cases, 'sigma' and 'relsigma' are one option if you want to be
> compatible with the (ugly, IMHO) parameter name used by curve_fit.
Here I disagree. sigmay is an array of error values. resigma is a boolean that simply
tells linfit whether to treat the sigmay values as relative (relsigma=True, the default)
or as absolute (relsigma=False).
David
More information about the SciPy-User
mailing list