[SciPy-User] adding linear fitting routine

Wed Dec 4 08:58:28 EST 2013

Daniele

On Dec 4, 2013, at 1:55 PM, Daniele Nicolodi <daniele at grinta.net> wrote:

> On 04/12/2013 13:43, David J Pine wrote:
>> linfit does not call a matrix inversion routine.  Instead it calculates
>> the best fit slope and y-intercept directly.  By contrast, polyfit does
>> call a matrix inversion routine (numpy.linalg.lstsq), which has a
>> certain amount of overhead that linfit avoids.  This may be why polyfit
>> is slower than linfit.
> 
> A least squares fit is a matrix inversion. What you do is a matrix
> inversion, except that the notation you use does not make this clear.
> What you can discuss is the method you use for the inversion.  I would
> have to have a closer look to the test...

I assure you that I understand the mathematics.  Specifically, I understand
that you can view the mathematics used in linfit as implementing matrix
inversion.  That is not the point.  The point is that polyfit calls a matrix
inversion routine, which invokes computational machinery that is slow
compared to just doing the algebra directly without calling a matrix
inversion routine.  I hope this is clear.

> 
>> 4. relsigma.  Other than using no weighting at all, there are basically
>> two ways that people weight data in a least squares fit.
>>    (a) provide explicit absolute estimates of the errors
>> (uncertainties) for each data point.  This is what physical scientists
>> often do.  Setting relsigma=False tells linfit to use this method of
>> weighting the data.  If the error estimates are accurate, then the
>> covariance matrix provides estimates of the uncertainties in the fitting
>> parameters (the slope & y-intercept).
>>    (b) provide relative estimates of the errors (uncertainties) for
>> each data point (it's assumed that the absolute errors are not known but
>> relative uncertainties between difference data points is known).  This
>> is what social scientists often do.  When only the relative
>> uncertainties are known, the covariance matrix needs to be rescaled in
>> order to obtain accurate estimates of the uncertainties in the fitting
>> parameters.  Setting relsigma=True tells linfit to use this method of
>> weighting the data.
> 
> This is not really clear from the docstring (plus it optional but no
> default value is specified in the docstring), and made even less obvious
> by the name of the parameter used to specify the uncertainties.

It's specified in the function definition:

def linfit(x, y, sigmay=None, relsigma=True, cov=False, chisq=False, residuals=False)

which is the way it's always done in the online numpy and scipy documentation.
However, I can additionally specify it in the docstring under the parameter
definition.

> 
> I would prefer two independent and mutually exclusive parameters for the
> two cases, 'sigma' and 'relsigma' are one option if you want to be
> compatible with the (ugly, IMHO) parameter name used by curve_fit.

Here I disagree.  sigmay is an array of error values.  resigma is a boolean that simply
tells linfit whether to treat the sigmay values as relative (relsigma=True, the default)
or as absolute (relsigma=False).

David