[SciPy-User] Least Square fit and goodness of fit
josef.pktd at gmail.com
josef.pktd at gmail.com
Mon May 17 11:20:13 EDT 2010
On Mon, May 17, 2010 at 7:35 AM, <josef.pktd at gmail.com> wrote:
> On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel <briedel at wisc.edu> wrote:
>> Thanks for the clarification. I am still not sure how to get the chi-squared
>> value of my regression though. When I use the formula under "Regression
>> Analysis" here
>>
>> http://en.wikipedia.org/wiki/Goodness_of_fit
>>
>> I get a chi-square somewhere around 19, which seems way to large compared to
>> the value of 3.2 I get for the same data set when I fit it using gnuplot.
>> Where gnuplot supposedly used the weighted sum of squares of residuals. I do
>> not fully this because of the results I get.
>>
>> Here is the python code I used:
>>
>> chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau),
>> 2)/pow(R4errctsdataselect,2)))/dof
>
>
> from some gnuplot help page it looks like what they call chisquare is WSSR/dof
>
> which would be something like
>
> chi2=(sum( ( R4ctsdataselect-fitfunc(pinit, tau)) /
> sqrt(R4errctsdataselect) )**2 )/dof
>
> I'm not sure whether the sqrt is in there or not, because I don't
> remember the normalization that is used, weights or weights squared
(for reference)
gnuplot is pretty vague on the denominator
http://theochem.ki.ku.dk/on_line_docs/gnuplot/gnuplot_21.html#SEC81
a bit better explanation of the terminology
http://www.graphpad.com/faq/viewfaq.cfm?faq=926
any more explicit reference has sigma in the denominator
Josef
> Josef
>
>
>
>
>>
>> Sorry for being so thick headed, statistics is just beyond me at times.
>>
>> Cheers,
>>
>> Ben
>>
>> On Mon, May 17, 2010 at 00:20, <josef.pktd at gmail.com> wrote:
>>>
>>> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel <briedel at wisc.edu>
>>> wrote:
>>> >
>>> >
>>> > On Sun, May 16, 2010 at 22:33, <josef.pktd at gmail.com> wrote:
>>> >>
>>> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel <briedel at wisc.edu>
>>> >> wrote:
>>> >> > What I still do not understand is the fact that curve_fit gives me a
>>> >> > different output then leastsq, even though curve_fit calls leastsq.
>>> >> >
>>> >> > I tried to get the chi-squared because we want to plot contours of
>>> >> > chi-square from the minimum to the maximum. I used following code:
>>> >> >
>>> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x)
>>> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x))
>>> >> > pinit = [20,20.]
>>> >> >
>>> >> > def func(x, a, b):
>>> >> > return a*exp(-x) + b
>>> >> >
>>> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit,
>>> >> > sigma=R4errctsdataselect)
>>> >>
>>> >> this uses weighted least squares
>>> >> sigma : None or N-length sequence
>>> >> If not None, it represents the standard-deviation of ydata. This
>>> >> vector, if given, will be used as weights in the least-squares problem
>>> >>
>>> >> In your initial example with leastsq you don't have any weighting,
>>> >> it's just ordinary least squares
>>> >>
>>> >> maybe that's the difference.
>>> >>
>>> >>
>>> >
>>> > Yeah I guess that will be it.
>>> >
>>> >>
>>> >> > print pfinal
>>> >> > print covar
>>> >> > dof=size(tau)-size(pinit)
>>> >> > print dof
>>> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit,
>>> >> > tau)))/dof
>>> >> > print chi2
>>> >> >
>>> >> > I am not 100% sure I am doing the degrees of freedom calculation
>>> >> > right.
>>> >> > I
>>> >> > got the chi-square formula from the Pearson chi-squared test.
>>> >>
>>> >> I don't recognize your formula for chi2, and I don't see the
>>> >> connection to Pearson chi-squared test .
>>> >>
>>> >> Do you have a reference?
>>> >>
>>> >
>>> > I based my use of the Pearson test from what I read in an Econometrics
>>> > book,
>>> > but wiki has the a pretty good description. I basically based it off the
>>> > example there. Where the expected would be what comes out of the fit and
>>> > what you is the "R4ctsdataselect" for those specific values.
>>> >
>>> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test
>>>
>>> I looked at that, but it's a completely different case, the values in
>>> the formulas are frequencies
>>>
>>> Oi = an observed frequency;
>>> Ei = an expected (theoretical) frequency, asserted by the null
>>> hypothesis;
>>>
>>> not points on a regression curve
>>>
>>> Josef
>>>
>>> >
>>> >
>>> >>
>>> >> Josef
>>> >>
>>> >
>>> > Thanks again
>>> >
>>> > Ben
>>> >
>>> >
>>> >>
>>> >> >
>>> >> > Thank you very much for the help so far.
>>> >> >
>>> >> > Cheers,
>>> >> >
>>> >> > Ben
>>> >> >
>>> >> > On Sun, May 16, 2010 at 05:50, <josef.pktd at gmail.com> wrote:
>>> >> >>
>>> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel <briedel at wisc.edu>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> > On Fri, May 14, 2010 at 14:51, <josef.pktd at gmail.com> wrote:
>>> >> >> >>
>>> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel
>>> >> >> >> <briedel at wisc.edu>
>>> >> >> >> wrote:
>>> >> >> >> > Hey,
>>> >> >> >> >
>>> >> >> >> > I am fairly new Scipy and am trying to do a least square fit to
>>> >> >> >> > a
>>> >> >> >> > set
>>> >> >> >> > of
>>> >> >> >> > data. Currently, I am using following code:
>>> >> >> >> >
>>> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x)
>>> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x))
>>> >> >> >> > pinit = [20,20.]
>>> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect),
>>> >> >> >> > full_output=1)
>>> >> >> >> >
>>> >> >> >> > I am now trying to get the goodness of fit out of this data. I
>>> >> >> >> > am
>>> >> >> >> > sort
>>> >> >> >> > of
>>> >> >> >> > running into a brick wall because I found a lot of conflicting
>>> >> >> >> > ways
>>> >> >> >> > of
>>> >> >> >> > how
>>> >> >> >> > to calculate it.
>>> >> >> >>
>>> >> >> >> For regression the usual is
>>> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination
>>> >> >> >> coefficient of determination is
>>> >> >> >>
>>> >> >> >> R^2 = 1 - {SS_{err} / SS_{tot}}
>>> >> >> >>
>>> >> >> >> Note your fitfunc is linear in parameters and can be better
>>> >> >> >> estimated
>>> >> >> >> by linear least squares, OLS.
>>> >> >> >> linear regression is handled in statsmodels and you can get lot's
>>> >> >> >> of
>>> >> >> >> statistics without worrying about the formulas.
>>> >> >> >> If you only have one slope parameter, then scipy.stats.linregress
>>> >> >> >> also
>>> >> >> >> works
>>> >> >> >>
>>> >> >> >
>>> >> >> > Thanks for the information. I am still note quite sure if this is
>>> >> >> > what
>>> >> >> > my
>>> >> >> > boss wants because there should not be an average y value.
>>> >> >>
>>> >> >> The definition of Rsquared is pretty uncontroversial with the
>>> >> >> y.mean()
>>> >> >> correction, if there is a constant in the regression (although I
>>> >> >> know
>>> >> >> mainly the linear case for this).
>>> >> >>
>>> >> >> If there is no constant in the regression, the definition or
>>> >> >> Rsquared
>>> >> >> is not clear/unambiguous, but usually used without mean correction
>>> >> >> of
>>> >> >> y.
>>> >> >>
>>> >> >> Josef
>>> >> >>
>>> >> >> >
>>> >> >> >>
>>> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the
>>> >> >> >> covariance
>>> >> >> >> of the parameter estimates.
>>> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit
>>> >> >> >
>>> >> >> > I have been trying this out, but the fit just looks horrid
>>> >> >> > compared
>>> >> >> > to
>>> >> >> > using
>>> >> >> > leastsq method even though they call the same function according
>>> >> >> > to
>>> >> >> > the
>>> >> >> > documentation.
>>> >> >> >
>>> >> >> >>
>>> >> >> >> > I am aware of the chisquare function in stats function, but the
>>> >> >> >> > documentation seems a little confusing to me. Any help would be
>>> >> >> >> > greatly
>>> >> >> >> > appreciates.
>>> >> >> >>
>>> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing
>>> >> >> >> the
>>> >> >> >> goodness-of-fit of entire distributions, not for how well a curve
>>> >> >> >> or
>>> >> >> >> line fits the data.
>>> >> >> >>
>>> >> >> >
>>> >> >> > That is what I thought, which brought up my confusion when I asked
>>> >> >> > other
>>> >> >> > people and they told me to use that
>>> >> >> >
>>> >> >> >>
>>> >> >> >> Josef
>>> >> >> >>
>>> >> >> >> >
>>> >> >> >> > Thanks very much in advance.
>>> >> >> >> >
>>> >> >> >> > Cheers,
>>> >> >> >> >
>>> >> >> >> > Ben
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > _______________________________________________
>>> >> >> >> > SciPy-User mailing list
>>> >> >> >> > SciPy-User at scipy.org
>>> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> _______________________________________________
>>> >> >> >> SciPy-User mailing list
>>> >> >> >> SciPy-User at scipy.org
>>> >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Benedikt Riedel
>>> >> >> > Graduate Student University of Wisconsin-Madison
>>> >> >> > Department of Physics
>>> >> >> > Office: 2304 Chamberlin Hall
>>> >> >> > Lab: 6247 Chamberlin Hall
>>> >> >> > Tel: (608) 301-5736
>>> >> >> > Cell: (213) 519-1771
>>> >> >> > Lab: (608) 262-5916
>>> >> >> >
>>> >> >> > _______________________________________________
>>> >> >> > SciPy-User mailing list
>>> >> >> > SciPy-User at scipy.org
>>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >> >
>>> >> >> >
>>> >> >> _______________________________________________
>>> >> >> SciPy-User mailing list
>>> >> >> SciPy-User at scipy.org
>>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Benedikt Riedel
>>> >> > Graduate Student University of Wisconsin-Madison
>>> >> > Department of Physics
>>> >> > Office: 2304 Chamberlin Hall
>>> >> > Lab: 6247 Chamberlin Hall
>>> >> > Tel: (608) 301-5736
>>> >> > Cell: (213) 519-1771
>>> >> > Lab: (608) 262-5916
>>> >> >
>>> >> > _______________________________________________
>>> >> > SciPy-User mailing list
>>> >> > SciPy-User at scipy.org
>>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >
>>> >> >
>>> >> _______________________________________________
>>> >> SciPy-User mailing list
>>> >> SciPy-User at scipy.org
>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >
>>> >
>>> >
>>> > --
>>> > Benedikt Riedel
>>> > Graduate Student University of Wisconsin-Madison
>>> > Department of Physics
>>> > Office: 2304 Chamberlin Hall
>>> > Lab: 6247 Chamberlin Hall
>>> > Tel: (608) 301-5736
>>> > Cell: (213) 519-1771
>>> > Lab: (608) 262-5916
>>> >
>>> > _______________________________________________
>>> > SciPy-User mailing list
>>> > SciPy-User at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >
>>> >
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>>
>> --
>> Benedikt Riedel
>> Graduate Student University of Wisconsin-Madison
>> Department of Physics
>> Office: 2304 Chamberlin Hall
>> Lab: 6247 Chamberlin Hall
>> Tel: (608) 301-5736
>> Cell: (213) 519-1771
>> Lab: (608) 262-5916
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
More information about the SciPy-User
mailing list