[SciPy-User] How to fit a curve/function?

Fri Jun 10 09:39:12 EDT 2011

On Fri, Jun 10, 2011 at 9:22 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
>
> -------- Original-Nachricht --------
>> Datum: Thu, 9 Jun 2011 12:07:47 -0400
>> Von: josef.pktd at gmail.com
>> An: SciPy Users List <scipy-user at scipy.org>
>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>
>> On Thu, Jun 9, 2011 at 11:52 AM, Johannes Radinger <JRadinger at gmx.at>
>> wrote:
>> > Hello again...
>> >
>> > i try no to fit a curve using integrals as conditions.
>> > the scipy manual says that integrations to infinite are possible with
>> Inf,
>> >
>> > I tried following but I fail (saying inf is not defined):
>> >
>> > cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0]
>> >
>> > what causes the problem? Do I use quad/Inf in a wrong way?
>>
>> numpy.inf    inf doesn't exist in python itself
>
> Oh thank you for that! I was just confused because in the manual Inf is used itself in the example...anyway it works now..
>
> just additoinal questions:
>
> 1) How can I fit the curve with other kind of data. Like, if got something
> like a histogram (counts per category of x). I want to fit now the function already mentioned with these data...

if x is continuous and the counts are for bin intervals, then the same
idea as with the simple case works. It has been discussed a few times
on the mailing list, and I have a binned estimator for this in the
statsmodels.sandbox that you could use or use as recipe.
(scikits.statsmodels.distributions.estimators.fitbinned  uses maximum
likelihood estimation with random draws from bins with multinomial,
looking at it again after a long time I'm not quite sure anymore why
the multinomial is in there. There is also a fitbinnedgmm which is
matching moments, but I'm also not sure what the status is, since I
started to rewrite the gmm stuff a long time ago.)

>
> 2) Can I get a value how good the fit is?

I think for binned data, the chisquare test in scipy.stats should work
out of the box (with at least 5 expected counts per bin).
I haven't thought yet about the goodness-of-fit problem for the binned
mle version.

Josef

>
> /Johannes
>
>>
>> Josef
>>
>> >
>> > The error is:
>> > NameError: global name 'Inf' is not defined
>> >
>> > /Johannes
>> >
>> > -------- Original-Nachricht --------
>> >> Datum: Wed, 8 Jun 2011 11:37:52 -0400
>> >> Von: josef.pktd at gmail.com
>> >> An: SciPy Users List <scipy-user at scipy.org>
>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>> >
>> >> On Wed, Jun 8, 2011 at 11:37 AM,  <josef.pktd at gmail.com> wrote:
>> >> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger <JRadinger at gmx.at>
>> >> wrote:
>> >> >>
>> >> >> -------- Original-Nachricht --------
>> >> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400
>> >> >>> Von: josef.pktd at gmail.com
>> >> >>> An: SciPy Users List <scipy-user at scipy.org>
>> >> >>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>> >> >>
>> >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger
>> <JRadinger at gmx.at>
>> >> >>> wrote:
>> >> >>> >
>> >> >>> > -------- Original-Nachricht --------
>> >> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400
>> >> >>> >> Von: josef.pktd at gmail.com
>> >> >>> >> An: SciPy Users List <scipy-user at scipy.org>
>> >> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>> >> >>> >
>> >> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger
>> >> <JRadinger at gmx.at>
>> >> >>> >> wrote:
>> >> >>> >> >
>> >> >>> >> > -------- Original-Nachricht --------
>> >> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400
>> >> >>> >> >> Von: josef.pktd at gmail.com
>> >> >>> >> >> An: SciPy Users List <scipy-user at scipy.org>
>> >> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>> >> >>> >> >
>> >> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger
>> >> <JRadinger at gmx.at>
>> >> >>> >> >> wrote:
>> >> >>> >> >> > Hello,
>> >> >>> >> >> >
>> >> >>> >> >> > I've got following function describing any kind of animal
>> >> >>> dispersal
>> >> >>> >> >> kernel:
>> >> >>> >> >> >
>> >> >>> >> >> > def pdf(x,s1,s2):
>> >> >>> >> >> >    return
>> >> >>> >> >>
>> >> >>> >>
>> >> >>>
>> >>
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >> >>> >> >> >
>> >> >>> >> >> > On the other hand I've got data from literature with which
>> I
>> >> want
>> >> >>> to
>> >> >>> >> fit
>> >> >>> >> >> the function so that I get s1, s2 and x.
>> >> >>> >> >> > Ususally the data in the literature are as follows:
>> >> >>> >> >> >
>> >> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m
>> and
>> >> 90%
>> >> >>> >>  are
>> >> >>> >> >> between -500m and + 500m
>> >> >>> >> >> >
>> >> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are
>> >> between
>> >> >>> >> >> -1000m and +1000m
>> >> >>> >> >> >
>> >> >>> >> >> > So far as I understand an integration of the function is
>> >> needed to
>> >> >>> >> solve
>> >> >>> >> >> for s1 and s2 as all the literature data give percentage
>> (area
>> >> under
>> >> >>> >> the
>> >> >>> >> >> curve) Can that be used to fit the curve or can that create
>> >> ranges
>> >> >>> for
>> >> >>> >> s1
>> >> >>> >> >> and s2.
>> >> >>> >> >>
>> >> >>> >> >> I don't see a way around integration.
>> >> >>> >> >>
>> >> >>> >> >> If you have exactly 2 probabilities, then you can you a
>> solver
>> >> like
>> >> >>> >> >> scipy.optimize.fsolve to match the probabilites
>> >> >>> >> >> eg.
>> >> >>> >> >> 0.5 = integral pdf from -270 to 270
>> >> >>> >> >> 0.9 = integral pdf from -500 to 500
>> >> >>> >> >>
>> >> >>> >> >> If you have more than 2 probabilities, then using
>> optimization
>> >> of a
>> >> >>> >> >> weighted function of the moment conditions would be better.
>> >> >>> >> >>
>> >> >>> >> >> Josef
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Hello again
>> >> >>> >> >
>> >> >>> >> > I tried following, but without success so far. What do I have
>> to
>> >> do
>> >> >>> >> excactly...
>> >> >>> >> >
>> >> >>> >> > import numpy
>> >> >>> >> > from scipy import stats
>> >> >>> >> > from scipy import integrate
>> >> >>> >> > from scipy.optimize import fsolve
>> >> >>> >> > import math
>> >> >>> >> >
>> >> >>> >> > p=0.3
>> >> >>> >> >
>> >> >>> >> > def pdf(x,s1,s2):
>> >> >>> >> >    return
>> >> >>> >>
>> >> >>>
>> >>
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >> >>> >> >
>> >> >>> >> > def equ(s1,s2):
>> >> >>> >> >    0.5==integrate.quad(pdf,-270,270,args=(s1,s2))
>> >> >>> >> >    0.9==integrate.quad(pdf,-500,500,args=(s1,s2))
>> >> >>> >> >
>> >> >>> >> > result=fsolve(equ, 1,500)
>> >> >>> >> >
>> >> >>> >> > print result
>> >> >>> >>
>> >> >>> >> equ needs to return the deviation of the equations (I changed
>> some
>> >> >>> >> details for s1 just to try it)
>> >> >>> >>
>> >> >>> >> import numpy
>> >> >>> >> from scipy import stats
>> >> >>> >> from scipy import integrate
>> >> >>> >> from scipy.optimize import fsolve
>> >> >>> >> import math
>> >> >>> >>
>> >> >>> >> p=0.3
>> >> >>> >>
>> >> >>> >> def pdf(x,s1,s2):
>> >> >>> >>     return
>> >> >>> >>
>> >> >>>
>> >>
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >> >>> >>
>> >> >>> >> def equ(arg):
>> >> >>> >>     s1,s2 = numpy.abs(arg)
>> >> >>> >>     cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0]
>> >> >>> >>     cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0]
>> >> >>> >>     return [cond1, cond2]
>> >> >>> >>
>> >> >>> >> result=fsolve(equ, [200., 1200])
>> >> >>
>> >> >> thank you for your last reply...seems that the parameters of the two
>> >> normals are nearly identical... anyway just two small addtional
>> questions:
>> >> >>
>> >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start
>> >> values so far as I understand...how should these be choosen? what is
>> >> recommended?
>> >> >
>> >> > There is no general solution for choosing starting values, in your
>> >> > case it should be possible to
>> >> >
>> >> >>>> q = np.array([0.5, 0.9])
>> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>> >> >>>> x = [270, 500]
>> >> >>>> q = np.array([0.5, 0.9])
>> >> >>>> x = [270, 500]
>> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>> >> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500,
>> scale=cr[1])
>> >> > 0.89999999999999991
>> >> -------
>> >> I forgot to remove the typos
>> >> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0],
>> >> scale=cr[0])
>> >> > 0.0011545021185267457
>> >> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0],
>> >> scale=cr[0])
>> >> > 0.000996601515122153
>> >> ---------
>> >> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0],
>> >> scale=cr[0])
>> >> > 0.5
>> >> >>>> sol = fsolve(equ, np.sort(cr))
>> >> >
>> >> > there are some numerical problems finding the solution (???)
>> >> >
>> >> >>>> equ(sol)
>> >> > array([-0.05361093,  0.05851309])
>> >> >>>> from pprint import pprint
>> >> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1))
>> >> > (array([ 354.32616549,  354.69918062]),
>> >> >  {'fjac': array([[-0.7373189 , -0.67554484],
>> >> >       [ 0.67554484, -0.7373189 ]]),
>> >> >  'fvec': array([-0.05361093,  0.05851309]),
>> >> >  'nfev': 36,
>> >> >  'qtf': array([  1.40019135e-07,  -7.93593929e-02]),
>> >> >  'r': array([ -5.21390161e-04,  -1.21700831e-03,
>> 3.88274320e-07])},
>> >> >  5,
>> >> >  'The iteration is not making good progress, as measured by the \n
>> >> > improvement from the last ten iterations.')
>> >> >
>> >> >>
>> >> >> 2) How can that be solve if I have I third condition (overfitted)
>> can
>> >> that be used as well or how does the alternative look like?
>> >> >
>> >> > use optimize.leastsq on equ (I never tried this for this case)
>> >> > use fmin on the sum of squared errors
>> >> >
>> >> > if the intervals for the probabilities are non-overlapping (interval
>> >> > data), then there is an optimal weighting matrix, (but my code for
>> >> > that in the statsmodels.sandbox is not verified).
>> >> >
>> >> > Josef
>> >> >
>> >> >
>> >> >>
>> >> >> /johannes
>> >> >>
>> >> >>> >>
>> >> >>> >> print result
>> >> >>> >>
>> >> >>> >> but in the results I get the parameters are very close to each
>> >> other
>> >> >>> >> [-356.5283675   353.82544075]
>> >> >>> >>
>> >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0,
>> >> then
>> >> >>> >> maybe the cdf of norm can be used directly
>> >> >>> >
>> >> >>> >
>> >> >>> > Thank you for that hint... First yes these are 2 superimposed
>> >> normals
>> >> >>> but for other reasons I want to use the original formula instead of
>> >> the
>> >> >>> stats.functions...
>> >> >>> >
>> >> >>> > anyway there is still a thing...the locator s1 and s2 are like
>> the
>> >> scale
>> >> >>> parameter of stats.norm so the are both + and -. For fsolve above
>> it
>> >> seems
>> >> >>> that I get only one parameter (s1 or s2) but for the positive and
>> >> negative
>> >> >>> side of the distribution. So in actually there are four parameters
>> >> -s1,
>> >> >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the
>> fsolve
>> >> to look
>> >> >>> for the two values only in the positive range...
>> >> >>>
>> >> >>> It doesn't really matter, if the scale only shows up in quadratic
>> >> >>> terms, or as in my initial change I added a absolute value, so
>> whether
>> >> >>> it's positive or negative, it's still only one value, and we
>> >> >>> interprete it as postive scale
>> >> >>>
>> >> >>> s1 = sqrt(s1**2)
>> >> >>>
>> >> >>> Josef
>> >> >>>
>> >> >>> >
>> >> >>> > any guesses?
>> >> >>> >
>> >> >>> > /J
>> >> >>> >
>> >> >>> >>
>> >> >>> >> >>> from scipy import stats
>> >> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270,
>> >> scale=350)
>> >> >>> >> 0.55954705470577526
>> >> >>> >> >>>
>> >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270,
>> >> scale=354)
>> >> >>> >> 0.55436474670960978
>> >> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500,
>> >> scale=354)
>> >> >>> >> 0.84217642881921018
>> >> >>> >>
>> >> >>> >> Josef
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > /Johannes
>> >> >>> >> >>
>> >> >>> >> >> >
>> >> >>> >> >> > /Johannes
>> >> >>> >> >> >
>> >> >>> >> >> > --
>> >> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>> >> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> >> >>> >> >> > _______________________________________________
>> >> >>> >> >> > SciPy-User mailing list
>> >> >>> >> >> > SciPy-User at scipy.org
>> >> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>> >> >> >
>> >> >>> >> >> _______________________________________________
>> >> >>> >> >> SciPy-User mailing list
>> >> >>> >> >> SciPy-User at scipy.org
>> >> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>> >> >
>> >> >>> >> > --
>> >> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>> >> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> >> >>> >> > _______________________________________________
>> >> >>> >> > SciPy-User mailing list
>> >> >>> >> > SciPy-User at scipy.org
>> >> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>> >> >
>> >> >>> >> _______________________________________________
>> >> >>> >> SciPy-User mailing list
>> >> >>> >> SciPy-User at scipy.org
>> >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>> >
>> >> >>> > --
>> >> >>> > NEU: FreePhone - kostenlos mobil telefonieren!
>> >> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> >> >>> > _______________________________________________
>> >> >>> > SciPy-User mailing list
>> >> >>> > SciPy-User at scipy.org
>> >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>> >
>> >> >>> _______________________________________________
>> >> >>> SciPy-User mailing list
>> >> >>> SciPy-User at scipy.org
>> >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>
>> >> >> --
>> >> >> NEU: FreePhone - kostenlos mobil telefonieren!
>> >> >> Jetzt informieren: http://www.gmx.net/de/go/freephone
>> >> >> _______________________________________________
>> >> >> SciPy-User mailing list
>> >> >> SciPy-User at scipy.org
>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >>
>> >> >
>> >> _______________________________________________
>> >> SciPy-User mailing list
>> >> SciPy-User at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>> > --
>> > NEU: FreePhone - kostenlos mobil telefonieren!
>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> > _______________________________________________
>> > SciPy-User mailing list
>> > SciPy-User at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> --
> NEU: FreePhone - kostenlos mobil telefonieren!
> Jetzt informieren: http://www.gmx.net/de/go/freephone
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>