[SciPy-User] adding linear fitting routine

Matt Newville newville at cars.uchicago.edu
Wed Dec 4 15:15:30 EST 2013


Hi David,

On Wed, Dec 4, 2013 at 1:13 PM, David J Pine <djpine at gmail.com> wrote:
> I guess my preference would be to have linfit() be as similar to
> curve_fit() in outputs (and inputs in so far as it makes sense), and then if
> we decide we prefer another way of doing either the inputs or the outputs,
> then to do them in concert.  I think there is real value in making the user
> interfaces of linfit() and curve_fit() consistent--it makes the user's
> experience so much less confusing. As of right now, I am agnostic about
> whether or not the function returns a dictionary of results--although I am
> unsure of what you have in mind.  How would you structure a dictionary of
> results?
>

Using return (pbest, covar) seems reasonable.  But if you returned a
dictionary, you could also include a chi-square statistic and a
residuals array.
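
As a rough sketch of what such a dictionary could look like (this
linfit() is purely hypothetical, the key names are made up for
illustration, and the covariance assumes unit uncertainties on y):

```python
def linfit(x, y):
    """Hypothetical unweighted straight-line fit, y = slope*x + intercept.

    Returns a dict rather than a tuple. The key names are illustrative only.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Residuals and the (unweighted) sum of squared residuals.
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    chisqr = sum(r * r for r in residuals)

    # Covariance of (slope, intercept), assuming sigma = 1 for every y value.
    covar = [[1.0 / sxx, -xbar / sxx],
             [-xbar / sxx, 1.0 / n + xbar ** 2 / sxx]]

    return {'best_values': (slope, intercept), 'covar': covar,
            'chisqr': chisqr, 'residuals': residuals}


fit = linfit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
slope, intercept = fit['best_values']
```

A caller who only wants the slope and intercept reads fit['best_values']
and can ignore the rest.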

scipy.optimize.leastsq() returns 5 items when called with
full_output=True: (pbest, covar, infodict, mesg, ier), with infodict
being a dict with items 'nfev', 'fvec', 'fjac', 'ipvt', and 'qtf'.
I think it's too late to change it, but it would have been nicer
(IMHO) if it had returned a single dict instead:

   return {'best_values': pbest, 'covar': covar,
           'nfev': infodict['nfev'], 'fvec': infodict['fvec'],
           'fjac': infodict['fjac'], 'ipvt': infodict['ipvt'],
           'qtf': infodict['qtf'], 'mesg': mesg, 'ier': ier}
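
You can get this behavior today with a thin wrapper.  A minimal
sketch, assuming numpy and scipy are installed; leastsq_as_dict() and
residual() are hypothetical names, not part of scipy:

```python
import numpy as np
from scipy.optimize import leastsq


def leastsq_as_dict(func, x0, args=()):
    """Hypothetical wrapper: call leastsq() and flatten its output into one dict."""
    pbest, covar, infodict, mesg, ier = leastsq(func, x0, args=args,
                                                full_output=True)
    result = {'best_values': pbest, 'covar': covar, 'mesg': mesg, 'ier': ier}
    # infodict carries 'nfev', 'fvec', 'fjac', 'ipvt' and 'qtf'.
    result.update(infodict)
    return result


def residual(params, x, y):
    """Residual of a straight-line model, y - (slope*x + intercept)."""
    slope, intercept = params
    return y - (slope * x + intercept)


# Noise-free synthetic data, just to exercise the wrapper.
x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0
out = leastsq_as_dict(residual, [1.0, 0.0], args=(x, y))
print(out['best_values'], out['nfev'])
```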

Similarly, linregress() returns a 5-element tuple.  The problem with
these is that you end up with long assignments:
    slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(xdata, ydata)

In fact, you sort of have to do this, even for a quick and dirty
result when slope and intercept are all that would be used later on.
The central problem is that these 5 returned values are now in your
local namespace, but they are not really independent values.  Instead,
you could think about
     regression = scipy.stats.linregress(xdata, ydata)

and then get at whichever of the computed regression values you want.
In short, if you
had linfit() return a dictionary of values, you could put many
statistics in it, and people who wanted to ignore some of them would
be able to do so.

FWIW, a named tuple would be a fine alternative.  I don't know if
backward compatibility would prevent that in scipy.   Anyway, it's
just a suggestion....
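
To illustrate why a named tuple is attractive here: it still unpacks
like a plain tuple, so existing callers keep working, but it also
gives attribute access.  A small sketch using only the standard
library; RegressionResult and fake_linregress() are hypothetical
stand-ins, and the numbers are placeholders, not real output:

```python
from collections import namedtuple

# Hypothetical result type; the field names mirror the 5 values discussed above.
RegressionResult = namedtuple(
    'RegressionResult',
    ['slope', 'intercept', 'r_value', 'p_value', 'stderr'])


def fake_linregress():
    """Stand-in for a regression routine; returns placeholder numbers."""
    return RegressionResult(slope=2.0, intercept=1.0, r_value=0.99,
                            p_value=0.001, stderr=0.05)


# Old-style callers can still unpack all five values...
slope, intercept, r_value, p_value, stderr = fake_linregress()

# ...while new callers keep one object and pick what they need.
regression = fake_linregress()
print(regression.slope, regression.intercept)
```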

--Matt


