[SciPy-User] How to fit a curve/function?

Fri Jun 10 09:40:36 EDT 2011

On Fri, Jun 10, 2011 at 9:39 AM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jun 10, 2011 at 9:22 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
>>
>> -------- Original-Nachricht --------
>>> Datum: Thu, 9 Jun 2011 12:07:47 -0400
>>> Von: josef.pktd at gmail.com
>>> An: SciPy Users List <scipy-user at scipy.org>
>>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>
>>> On Thu, Jun 9, 2011 at 11:52 AM, Johannes Radinger <JRadinger at gmx.at>
>>> wrote:
>>> > Hello again...
>>> >
>>> > i try no to fit a curve using integrals as conditions.
>>> > the scipy manual says that integrations to infinite are possible with
>>> Inf,
>>> >
>>> > I tried following but I fail (saying inf is not defined):
>>> >
>>> > cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0]
>>> >
>>> > what causes the problem? Do I use quad/Inf in a wrong way?
>>>
>>> numpy.inf    inf doesn't exist in python itself
>>
>> Oh thank you for that! I was just confused because in the manual Inf is used itself in the example...anyway it works now..
>>
>> just additoinal questions:
>>
>> 1) How can I fit the curve with other kind of data. Like, if got something
>> like a histogram (counts per category of x). I want to fit now the function already mentioned with these data...
>
> if x is continuous and the counts are for bin intervals, then the same
> idea as with the simple case works. It has been discussed a few times
> on the mailing list, and I have a binned estimator for this in the
> statsmodels.sandbox that you could use or use as recipe.
> (scikits.statsmodels.distributions.estimators.fitbinned  uses maximum
> likelihood estimation with random draws from bins with multinomial,
> looking at it again after a long time I'm not quite sure anymore why
> the multinomial is in there. There is also a fitbinnedgmm which is
> matching moments, but I'm also not sure what the status is, since I
> started to rewrite the gmm stuff a long time ago.)

I forgot to add: gmm stands for generalized method of moments, not
gaussian mixture models.

Josef

>
>>
>> 2) Can I get a value how good the fit is?
>
> I think for binned data, the chisquare test in scipy.stats should work
> out of the box (with at least 5 expected counts per bin).
> I haven't thought yet about the goodness-of-fit problem for the binned
> mle version.
>
> Josef
>
>
>>
>> /Johannes
>>
>>>
>>> Josef
>>>
>>> >
>>> > The error is:
>>> > NameError: global name 'Inf' is not defined
>>> >
>>> > /Johannes
>>> >
>>> > -------- Original-Nachricht --------
>>> >> Datum: Wed, 8 Jun 2011 11:37:52 -0400
>>> >> Von: josef.pktd at gmail.com
>>> >> An: SciPy Users List <scipy-user at scipy.org>
>>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>> >
>>> >> On Wed, Jun 8, 2011 at 11:37 AM,  <josef.pktd at gmail.com> wrote:
>>> >> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger <JRadinger at gmx.at>
>>> >> wrote:
>>> >> >>
>>> >> >> -------- Original-Nachricht --------
>>> >> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400
>>> >> >>> Von: josef.pktd at gmail.com
>>> >> >>> An: SciPy Users List <scipy-user at scipy.org>
>>> >> >>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>> >> >>
>>> >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger
>>> <JRadinger at gmx.at>
>>> >> >>> wrote:
>>> >> >>> >
>>> >> >>> > -------- Original-Nachricht --------
>>> >> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400
>>> >> >>> >> Von: josef.pktd at gmail.com
>>> >> >>> >> An: SciPy Users List <scipy-user at scipy.org>
>>> >> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>> >> >>> >
>>> >> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger
>>> >> <JRadinger at gmx.at>
>>> >> >>> >> wrote:
>>> >> >>> >> >
>>> >> >>> >> > -------- Original-Nachricht --------
>>> >> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400
>>> >> >>> >> >> Von: josef.pktd at gmail.com
>>> >> >>> >> >> An: SciPy Users List <scipy-user at scipy.org>
>>> >> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>> >> >>> >> >
>>> >> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger
>>> >> <JRadinger at gmx.at>
>>> >> >>> >> >> wrote:
>>> >> >>> >> >> > Hello,
>>> >> >>> >> >> >
>>> >> >>> >> >> > I've got following function describing any kind of animal
>>> >> >>> dispersal
>>> >> >>> >> >> kernel:
>>> >> >>> >> >> >
>>> >> >>> >> >> > def pdf(x,s1,s2):
>>> >> >>> >> >> >    return
>>> >> >>> >> >>
>>> >> >>> >>
>>> >> >>>
>>> >>
>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>>> >> >>> >> >> >
>>> >> >>> >> >> > On the other hand I've got data from literature with which
>>> I
>>> >> want
>>> >> >>> to
>>> >> >>> >> fit
>>> >> >>> >> >> the function so that I get s1, s2 and x.
>>> >> >>> >> >> > Ususally the data in the literature are as follows:
>>> >> >>> >> >> >
>>> >> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m
>>> and
>>> >> 90%
>>> >> >>> >>  are
>>> >> >>> >> >> between -500m and + 500m
>>> >> >>> >> >> >
>>> >> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are
>>> >> between
>>> >> >>> >> >> -1000m and +1000m
>>> >> >>> >> >> >
>>> >> >>> >> >> > So far as I understand an integration of the function is
>>> >> needed to
>>> >> >>> >> solve
>>> >> >>> >> >> for s1 and s2 as all the literature data give percentage
>>> (area
>>> >> under
>>> >> >>> >> the
>>> >> >>> >> >> curve) Can that be used to fit the curve or can that create
>>> >> ranges
>>> >> >>> for
>>> >> >>> >> s1
>>> >> >>> >> >> and s2.
>>> >> >>> >> >>
>>> >> >>> >> >> I don't see a way around integration.
>>> >> >>> >> >>
>>> >> >>> >> >> If you have exactly 2 probabilities, then you can you a
>>> solver
>>> >> like
>>> >> >>> >> >> scipy.optimize.fsolve to match the probabilites
>>> >> >>> >> >> eg.
>>> >> >>> >> >> 0.5 = integral pdf from -270 to 270
>>> >> >>> >> >> 0.9 = integral pdf from -500 to 500
>>> >> >>> >> >>
>>> >> >>> >> >> If you have more than 2 probabilities, then using
>>> optimization
>>> >> of a
>>> >> >>> >> >> weighted function of the moment conditions would be better.
>>> >> >>> >> >>
>>> >> >>> >> >> Josef
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > Hello again
>>> >> >>> >> >
>>> >> >>> >> > I tried following, but without success so far. What do I have
>>> to
>>> >> do
>>> >> >>> >> excactly...
>>> >> >>> >> >
>>> >> >>> >> > import numpy
>>> >> >>> >> > from scipy import stats
>>> >> >>> >> > from scipy import integrate
>>> >> >>> >> > from scipy.optimize import fsolve
>>> >> >>> >> > import math
>>> >> >>> >> >
>>> >> >>> >> > p=0.3
>>> >> >>> >> >
>>> >> >>> >> > def pdf(x,s1,s2):
>>> >> >>> >> >    return
>>> >> >>> >>
>>> >> >>>
>>> >>
>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>>> >> >>> >> >
>>> >> >>> >> > def equ(s1,s2):
>>> >> >>> >> >    0.5==integrate.quad(pdf,-270,270,args=(s1,s2))
>>> >> >>> >> >    0.9==integrate.quad(pdf,-500,500,args=(s1,s2))
>>> >> >>> >> >
>>> >> >>> >> > result=fsolve(equ, 1,500)
>>> >> >>> >> >
>>> >> >>> >> > print result
>>> >> >>> >>
>>> >> >>> >> equ needs to return the deviation of the equations (I changed
>>> some
>>> >> >>> >> details for s1 just to try it)
>>> >> >>> >>
>>> >> >>> >> import numpy
>>> >> >>> >> from scipy import stats
>>> >> >>> >> from scipy import integrate
>>> >> >>> >> from scipy.optimize import fsolve
>>> >> >>> >> import math
>>> >> >>> >>
>>> >> >>> >> p=0.3
>>> >> >>> >>
>>> >> >>> >> def pdf(x,s1,s2):
>>> >> >>> >>     return
>>> >> >>> >>
>>> >> >>>
>>> >>
>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>>> >> >>> >>
>>> >> >>> >> def equ(arg):
>>> >> >>> >>     s1,s2 = numpy.abs(arg)
>>> >> >>> >>     cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0]
>>> >> >>> >>     cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0]
>>> >> >>> >>     return [cond1, cond2]
>>> >> >>> >>
>>> >> >>> >> result=fsolve(equ, [200., 1200])
>>> >> >>
>>> >> >> thank you for your last reply...seems that the parameters of the two
>>> >> normals are nearly identical... anyway just two small addtional
>>> questions:
>>> >> >>
>>> >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start
>>> >> values so far as I understand...how should these be choosen? what is
>>> >> recommended?
>>> >> >
>>> >> > There is no general solution for choosing starting values, in your
>>> >> > case it should be possible to
>>> >> >
>>> >> >>>> q = np.array([0.5, 0.9])
>>> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>>> >> >>>> x = [270, 500]
>>> >> >>>> q = np.array([0.5, 0.9])
>>> >> >>>> x = [270, 500]
>>> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>>> >> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500,
>>> scale=cr[1])
>>> >> > 0.89999999999999991
>>> >> -------
>>> >> I forgot to remove the typos
>>> >> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0],
>>> >> scale=cr[0])
>>> >> > 0.0011545021185267457
>>> >> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0],
>>> >> scale=cr[0])
>>> >> > 0.000996601515122153
>>> >> ---------
>>> >> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0],
>>> >> scale=cr[0])
>>> >> > 0.5
>>> >> >>>> sol = fsolve(equ, np.sort(cr))
>>> >> >
>>> >> > there are some numerical problems finding the solution (???)
>>> >> >
>>> >> >>>> equ(sol)
>>> >> > array([-0.05361093,  0.05851309])
>>> >> >>>> from pprint import pprint
>>> >> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1))
>>> >> > (array([ 354.32616549,  354.69918062]),
>>> >> >  {'fjac': array([[-0.7373189 , -0.67554484],
>>> >> >       [ 0.67554484, -0.7373189 ]]),
>>> >> >  'fvec': array([-0.05361093,  0.05851309]),
>>> >> >  'nfev': 36,
>>> >> >  'qtf': array([  1.40019135e-07,  -7.93593929e-02]),
>>> >> >  'r': array([ -5.21390161e-04,  -1.21700831e-03,
>>> 3.88274320e-07])},
>>> >> >  5,
>>> >> >  'The iteration is not making good progress, as measured by the \n
>>> >> > improvement from the last ten iterations.')
>>> >> >
>>> >> >>
>>> >> >> 2) How can that be solve if I have I third condition (overfitted)
>>> can
>>> >> that be used as well or how does the alternative look like?
>>> >> >
>>> >> > use optimize.leastsq on equ (I never tried this for this case)
>>> >> > use fmin on the sum of squared errors
>>> >> >
>>> >> > if the intervals for the probabilities are non-overlapping (interval
>>> >> > data), then there is an optimal weighting matrix, (but my code for
>>> >> > that in the statsmodels.sandbox is not verified).
>>> >> >
>>> >> > Josef
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> /johannes
>>> >> >>
>>> >> >>> >>
>>> >> >>> >> print result
>>> >> >>> >>
>>> >> >>> >> but in the results I get the parameters are very close to each
>>> >> other
>>> >> >>> >> [-356.5283675   353.82544075]
>>> >> >>> >>
>>> >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0,
>>> >> then
>>> >> >>> >> maybe the cdf of norm can be used directly
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > Thank you for that hint... First yes these are 2 superimposed
>>> >> normals
>>> >> >>> but for other reasons I want to use the original formula instead of
>>> >> the
>>> >> >>> stats.functions...
>>> >> >>> >
>>> >> >>> > anyway there is still a thing...the locator s1 and s2 are like
>>> the
>>> >> scale
>>> >> >>> parameter of stats.norm so the are both + and -. For fsolve above
>>> it
>>> >> seems
>>> >> >>> that I get only one parameter (s1 or s2) but for the positive and
>>> >> negative
>>> >> >>> side of the distribution. So in actually there are four parameters
>>> >> -s1,
>>> >> >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the
>>> fsolve
>>> >> to look
>>> >> >>> for the two values only in the positive range...
>>> >> >>>
>>> >> >>> It doesn't really matter, if the scale only shows up in quadratic
>>> >> >>> terms, or as in my initial change I added a absolute value, so
>>> whether
>>> >> >>> it's positive or negative, it's still only one value, and we
>>> >> >>> interprete it as postive scale
>>> >> >>>
>>> >> >>> s1 = sqrt(s1**2)
>>> >> >>>
>>> >> >>> Josef
>>> >> >>>
>>> >> >>> >
>>> >> >>> > any guesses?
>>> >> >>> >
>>> >> >>> > /J
>>> >> >>> >
>>> >> >>> >>
>>> >> >>> >> >>> from scipy import stats
>>> >> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270,
>>> >> scale=350)
>>> >> >>> >> 0.55954705470577526
>>> >> >>> >> >>>
>>> >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270,
>>> >> scale=354)
>>> >> >>> >> 0.55436474670960978
>>> >> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500,
>>> >> scale=354)
>>> >> >>> >> 0.84217642881921018
>>> >> >>> >>
>>> >> >>> >> Josef
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > /Johannes
>>> >> >>> >> >>
>>> >> >>> >> >> >
>>> >> >>> >> >> > /Johannes
>>> >> >>> >> >> >
>>> >> >>> >> >> > --
>>> >> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> >> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> >> >>> >> >> > _______________________________________________
>>> >> >>> >> >> > SciPy-User mailing list
>>> >> >>> >> >> > SciPy-User at scipy.org
>>> >> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>> >> >> >
>>> >> >>> >> >> _______________________________________________
>>> >> >>> >> >> SciPy-User mailing list
>>> >> >>> >> >> SciPy-User at scipy.org
>>> >> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>> >> >
>>> >> >>> >> > --
>>> >> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> >> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> >> >>> >> > _______________________________________________
>>> >> >>> >> > SciPy-User mailing list
>>> >> >>> >> > SciPy-User at scipy.org
>>> >> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>> >> >
>>> >> >>> >> _______________________________________________
>>> >> >>> >> SciPy-User mailing list
>>> >> >>> >> SciPy-User at scipy.org
>>> >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>> >
>>> >> >>> > --
>>> >> >>> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> >> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> >> >>> > _______________________________________________
>>> >> >>> > SciPy-User mailing list
>>> >> >>> > SciPy-User at scipy.org
>>> >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>> >
>>> >> >>> _______________________________________________
>>> >> >>> SciPy-User mailing list
>>> >> >>> SciPy-User at scipy.org
>>> >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>
>>> >> >> --
>>> >> >> NEU: FreePhone - kostenlos mobil telefonieren!
>>> >> >> Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> >> >> _______________________________________________
>>> >> >> SciPy-User mailing list
>>> >> >> SciPy-User at scipy.org
>>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >>
>>> >> >
>>> >> _______________________________________________
>>> >> SciPy-User mailing list
>>> >> SciPy-User at scipy.org
>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >
>>> > --
>>> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> > _______________________________________________
>>> > SciPy-User mailing list
>>> > SciPy-User at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>> --
>> NEU: FreePhone - kostenlos mobil telefonieren!
>> Jetzt informieren: http://www.gmx.net/de/go/freephone
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>