[SciPy-User] How to fit data with errorbars

Wed Feb 17 17:25:47 EST 2010

On Wed, 17 Feb 2010 10:04:11 -0800, Nathaniel Smith wrote:

> On Wed, Feb 17, 2010 at 9:12 AM, Joe Harrington <jh at physics.ucf.edu> wrote:
> > The routine below does the job.  I was surprised not to find this
> > routine in numpy or scipy as it's very standard (I'm still convinced I
> > must have missed it, so somebody please embarrass me and point it
> > out).
> 
> Err, it may be very standard, but what is it? :-) It doesn't jump out
> at me as being a least-squares fitter (or if it is then it's
> implemented in a strange way), and I don't have a copy of Numerical
> Recipes handy. (Nor, honestly, does NR have a great reputation when it
> comes to statistical stuff, AFAICT.)

You can get the 2nd ed. of NR (cited in the routine) free here:

http://www.nrbook.com/a/bookcpdf.php

NR is like PERL: the wrong solution for every (well, many) job(s).  It
has a basic introduction to a large number of topics that is
accessible to a general reader.  It is highly available, and older
editions are free.  Specialists in almost every discipline complain
about it, and not without justification.  Of course, they would have
you read several semesters' worth of background material before you
even attempted your first plot in their particular areas, which
generally snuffs out all interest in the beginner.  So, read NR and
get the hang of something, play with things a bit, but then DO read
some specialized sources before you do things "for real".  If you
happen to have a good intro book in a specialized area, of course,
skip NR entirely.

Regarding the routine:

There is an analytic solution to the least-squares line fit (assuming
Gaussian errors, independent measurements, etc. etc.), which is taught
in most calculus curricula.  The routine here is a variant on that
one.
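
For reference, here is a minimal sketch of the textbook closed-form
solution for a straight line y = a + b*x with per-point error bars.  It
is not the routine posted earlier in the thread; the name fit_line and
the sample data are made up for illustration, and it assumes
independent Gaussian errors with known sigma on y only.

```python
import math

def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x.

    Returns (a, b, sigma_a, sigma_b, cov_ab, chi2).
    """
    w = [1.0 / s ** 2 for s in sigma]           # weights 1/sigma_i^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2                   # determinant of the normal equations
    a = (Sxx * Sy - Sx * Sxy) / delta           # intercept
    b = (S * Sxy - Sx * Sy) / delta             # slope
    sigma_a = math.sqrt(Sxx / delta)            # formal 1-sigma error on a
    sigma_b = math.sqrt(S / delta)              # formal 1-sigma error on b
    cov_ab = -Sx / delta                        # covariance between a and b
    chi2 = sum(((yi - a - b * xi) / si) ** 2
               for xi, yi, si in zip(x, y, sigma))
    return a, b, sigma_a, sigma_b, cov_ab, chi2

# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
sigma = [0.2, 0.2, 0.3, 0.3, 0.4]
a, b, sigma_a, sigma_b, cov_ab, chi2 = fit_line(x, y, sigma)
print("a = %.3f +/- %.3f, b = %.3f +/- %.3f" % (a, sigma_a, b, sigma_b))
print("cov(a,b) = %.4f, chi2 = %.2f for %d dof" % (cov_ab, chi2, len(x) - 2))
```

Note that cov_ab is nonzero whenever the weighted mean of x is nonzero,
so the intercept and slope are correlated unless x is centered first.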

For the OP:

Some more-complicated functions also have analytic solutions, but this
is not true in general.  Function minimizers
numerically explore the chi-squared surface in the parameter space of
the function, looking for the minimum, but it is not generally
possible to determine whether the minimum they find is the global
minimum or just a local one.  Some routines are better than others,
faster than others, more easily fooled than others, etc.
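
To make the local-minimum problem concrete, here is a toy sketch that
fits the frequency w of sin(w*x) to noise-free synthetic data.  The
crude step-halving search (local_descent, a made-up name) stands in for
a real minimizer; in practice you would use a library routine such as
those in scipy.optimize, but the failure mode is the same: the answer
depends on the starting guess.

```python
import math

W_TRUE = 3.0      # frequency used to generate the synthetic data
SIGMA = 0.1       # assumed error bar on every point

xs = [0.1 * i for i in range(51)]            # x from 0 to 5
ys = [math.sin(W_TRUE * x) for x in xs]      # noise-free, illustration only

def chi2(w):
    """Chi-squared of the model sin(w*x) against the synthetic data."""
    return sum(((y - math.sin(w * x)) / SIGMA) ** 2 for x, y in zip(xs, ys))

def local_descent(f, w0, step=0.05, tol=1e-6):
    """Toy 1-D local minimizer: move downhill, halve the step when stuck."""
    w, fw = w0, f(w0)
    while step > tol:
        for trial in (w + step, w - step):
            ft = f(trial)
            if ft < fw:                       # accept only strict improvements
                w, fw = trial, ft
                break
        else:
            step /= 2.0                       # neither neighbour is better
    return w, fw

w_good, chi2_good = local_descent(chi2, 2.8)  # starts inside the right basin
w_bad, chi2_bad = local_descent(chi2, 1.0)    # starts in the wrong basin

print("start 2.8 -> w = %.4f, chi2 = %.3g" % (w_good, chi2_good))
print("start 1.0 -> w = %.4f, chi2 = %.3g" % (w_bad, chi2_bad))
```

Both runs report a "minimum"; only the chi-squared values reveal that
the second one is a poor fit.  Trying several starting points and
keeping the lowest chi-squared is a common safeguard, though it still
does not prove the result is the global minimum.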

Also, determining the errors on the fitted function parameters usually
requires a Monte Carlo approach; if there is any correlation among
parameters, you cannot use the diagonal elements of the covariance
matrix that most minimizers report as errors.  Markov-chain Monte
Carlo routines are popular for making such estimates, but they can be
computationally costly, depending on the number of parameters.  There
are also some Monte-Carlo approaches to be wary of, such as bootstrap.
One of the places where NR falls short is that in the 2nd ed. (1992)
they presented a very simplistic view of bootstrap that neither
discussed its limitations nor mentioned Markov chains or other
approaches.  They do now discuss those in the 3rd edition.
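
As a rough illustration of the Monte Carlo idea, the sketch below draws
synthetic data sets around a best-fit line, refits each one, and looks
at the scatter and correlation of the refitted parameters.  This is a
plain parametric resampling, not MCMC and not bootstrap, and it assumes
independent Gaussian errors and that the model is correct.  The data,
seed, trial count, and the helper fit_line are all invented for the
example.

```python
import math
import random
import statistics

def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    return (Sxx * Sy - Sx * Sxy) / delta, (S * Sxy - Sx * Sy) / delta

rng = random.Random(12345)                    # fixed seed for repeatability

# Stand-in "observed" data lying exactly on y = 1 + 2x (illustration only).
x = [float(i) for i in range(1, 11)]
sigma = [0.5] * len(x)
y = [1.0 + 2.0 * xi for xi in x]
a_fit, b_fit = fit_line(x, y, sigma)

# Draw synthetic data sets around the best fit and refit each one.
trials_a, trials_b = [], []
for _ in range(2000):
    y_fake = [a_fit + b_fit * xi + rng.gauss(0.0, si)
              for xi, si in zip(x, sigma)]
    a_t, b_t = fit_line(x, y_fake, sigma)
    trials_a.append(a_t)
    trials_b.append(b_t)

mean_a = statistics.mean(trials_a)
mean_b = statistics.mean(trials_b)
err_a = statistics.stdev(trials_a)            # Monte Carlo error on intercept
err_b = statistics.stdev(trials_b)            # Monte Carlo error on slope
cov_ab = sum((at - mean_a) * (bt - mean_b)
             for at, bt in zip(trials_a, trials_b)) / (len(trials_a) - 1)
corr_ab = cov_ab / (err_a * err_b)            # correlation of a and b

print("a = %.3f +/- %.3f" % (mean_a, err_a))
print("b = %.3f +/- %.3f" % (mean_b, err_b))
print("corr(a, b) = %.3f" % corr_ab)
```

For a straight line the covariance is available analytically, so the
resampling is only there to show the mechanics; the strong correlation
between intercept and slope is the kind of thing the diagonal elements
alone would hide.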

--jh--