[SciPy-User] adding linear fitting routine
Matt Newville
newville at cars.uchicago.edu
Wed Dec 4 15:15:30 EST 2013
Hi David,
On Wed, Dec 4, 2013 at 1:13 PM, David J Pine <djpine at gmail.com> wrote:
> I guess my preference would be to have linfit() be as similar to
> curve_fit() in outputs (and inputs insofar as it makes sense), and then if
> we decide we prefer another way of doing either the inputs or the outputs,
> then to change them in concert. I think there is real value in making the user
> interfaces of linfit() and curve_fit() consistent--it makes the user's
> experience so much less confusing. As of right now, I am agnostic about
> whether or not the function returns a dictionary of results--although I am
> unsure of what you have in mind. How would you structure a dictionary of
> results?
>
Returning (pbest, covar) seems reasonable. But if you returned a
dictionary, you could also include a chi-square statistic and a
residuals array.
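For illustration, here is a minimal sketch of what a dictionary-returning fit could look like, assuming a weighted straight-line fit y = a + b*x. The name linfit(), its signature, and the keys 'best_values', 'covar', 'chisq' and 'residuals' are hypothetical choices for this sketch, not an existing SciPy API. The covariance treats sigma as absolute uncertainties; it is not rescaled by the reduced chi-square.

```python
def linfit(x, y, sigma=None):
    """Hypothetical straight-line fit y = a + b*x that returns a dict.

    Weighted least squares with weights 1/sigma**2; if sigma is None,
    every point gets unit weight.
    """
    n = len(x)
    if sigma is None:
        sigma = [1.0] * n
    w = [1.0 / s**2 for s in sigma]
    # Weighted sums of the normal equations.
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta   # intercept
    b = (S * Sxy - Sx * Sy) / delta     # slope
    # Covariance of (a, b), treating sigma as absolute uncertainties.
    covar = [[Sxx / delta, -Sx / delta],
             [-Sx / delta, S / delta]]
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    chisq = sum((r / s)**2 for r, s in zip(residuals, sigma))
    return {'best_values': (a, b), 'covar': covar,
            'chisq': chisq, 'residuals': residuals}


fit = linfit([0, 1, 2, 3], [1, 3, 5, 7])
print(fit['best_values'])   # (1.0, 2.0)
print(fit['chisq'])         # 0.0
```

A caller who only wants the parameters reads fit['best_values'] and never has to name the other statistics.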
With full_output=True, scipy.optimize.leastsq() returns 5 items:
(pbest, covar, infodict, mesg, ier), where infodict is a dict with the
keys 'nfev', 'fvec', 'fjac', 'ipvt', and 'qtf'. I think it's too late
to change it, but it would have been nicer (IMHO) if it had returned a
single dict instead:
    return {'best_values': pbest, 'covar': covar,
            'nfev': infodict['nfev'], 'fvec': infodict['fvec'],
            'fjac': infodict['fjac'], 'ipvt': infodict['ipvt'],
            'qtf': infodict['qtf'], 'mesg': mesg, 'ier': ier}
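Without changing leastsq() itself, the same single-dict interface can be layered on top of it. The sketch below assumes a hypothetical wrapper named leastsq_dict(); only the call to scipy.optimize.leastsq with full_output=True and the infodict keys come from SciPy itself.

```python
import numpy as np
from scipy.optimize import leastsq


def leastsq_dict(func, x0, args=()):
    """Hypothetical wrapper: run leastsq() and return one dict of results."""
    pbest, covar, infodict, mesg, ier = leastsq(func, x0, args=args,
                                                full_output=True)
    result = {'best_values': pbest, 'covar': covar,
              'mesg': mesg, 'ier': ier}
    # Copy 'nfev', 'fvec', 'fjac', 'ipvt' and 'qtf' from infodict.
    result.update(infodict)
    return result


def residual(p, x, y):
    # Residuals of a straight-line model y = p[0] + p[1]*x.
    return y - (p[0] + p[1] * x)


x = np.linspace(0.0, 10.0, 11)
y = 1.0 + 2.0 * x
fit = leastsq_dict(residual, [0.0, 0.0], args=(x, y))
print(fit['best_values'], fit['nfev'], fit['ier'])
```

Note that leastsq() returns covar unscaled: its documentation says it must be multiplied by the residual variance to give the covariance of the parameter estimates.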
Similarly, linregress() returns a 5-element tuple. The problem with
these is that you end up with long assignments like

    slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(xdata, ydata)

In fact, you more or less have to do this, even for a quick-and-dirty
result where only the slope and intercept will be used later on.
The central problem is that these 5 returned values now sit in your
local namespace, even though they are not really independent values.
Instead, you could write

    regression = scipy.stats.linregress(xdata, ydata)

and then get at whichever of the computed regression values you want.
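Assuming a reasonably recent SciPy, linregress() already behaves this way: it returns a result object whose fields (slope, intercept, rvalue, pvalue, stderr) can be read by name, while 5-value tuple unpacking still works for older code. The data below are made up for illustration.

```python
import numpy as np
from scipy import stats

xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ydata = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

regression = stats.linregress(xdata, ydata)
# Use only what is needed; the other statistics stay on the result object.
print(regression.slope, regression.intercept)

# Old-style unpacking into five names still works.
slope, intercept, r_value, p_value, stderr = regression
```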
In short, if linfit() returned a dictionary of values, you could put
many statistics in it, and people who wanted to ignore some of them
would be able to do so.
FWIW, a named tuple would be a fine alternative. I don't know if
backward compatibility would prevent that in scipy. Anyway, it's
just a suggestion...
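To make the named-tuple option concrete: a namedtuple is still a tuple, so existing code that unpacks the result keeps working while new code can use field names. The type name LinFitResult and its fields are hypothetical, chosen only for this sketch.

```python
from collections import namedtuple

# Hypothetical result type for a linfit()-style function; the field
# names are illustrative, not an existing SciPy API.
LinFitResult = namedtuple('LinFitResult', ['pbest', 'covar'])

result = LinFitResult(pbest=(1.0, 2.0),
                      covar=[[0.7, -0.3], [-0.3, 0.2]])

# Old-style code that unpacks a 2-tuple keeps working...
pbest, covar = result
# ...while new code can refer to fields by name.
print(result.pbest)                 # (1.0, 2.0)
print(isinstance(result, tuple))    # True
```

One caveat: adding a field later changes the tuple's length, so code that unpacks exactly two values would then break.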
--Matt