[SciPy-User] How to fit a curve/function?

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Jun 8 11:37:52 EDT 2011


On Wed, Jun 8, 2011 at 11:37 AM,  <josef.pktd at gmail.com> wrote:
> On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
>>
>> -------- Original-Nachricht --------
>>> Datum: Wed, 8 Jun 2011 10:33:45 -0400
>>> Von: josef.pktd at gmail.com
>>> An: SciPy Users List <scipy-user at scipy.org>
>>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>
>>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger <JRadinger at gmx.at>
>>> wrote:
>>> >
>>> > -------- Original-Nachricht --------
>>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400
>>> >> Von: josef.pktd at gmail.com
>>> >> An: SciPy Users List <scipy-user at scipy.org>
>>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>> >
>>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger <JRadinger at gmx.at>
>>> >> wrote:
>>> >> >
>>> >> > -------- Original-Nachricht --------
>>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400
>>> >> >> Von: josef.pktd at gmail.com
>>> >> >> An: SciPy Users List <scipy-user at scipy.org>
>>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>>> >> >
>>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger <JRadinger at gmx.at>
>>> >> >> wrote:
>>> >> >> > Hello,
>>> >> >> >
>>> >> >> > I've got following function describing any kind of animal
>>> dispersal
>>> >> >> kernel:
>>> >> >> >
>>> >> >> > def pdf(x,s1,s2):
>>> >> >> >    return
>>> >> >>
>>> >>
>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>>> >> >> >
>>> >> >> > On the other hand I've got data from literature with which I want
>>> to
>>> >> fit
>>> >> >> the function so that I get s1, s2 and x.
>>> >> >> > Ususally the data in the literature are as follows:
>>> >> >> >
>>> >> >> > Example 1: 50% of the animals are between -270m and +270m and 90%
>>> >>  are
>>> >> >> between -500m and + 500m
>>> >> >> >
>>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between
>>> >> >> -1000m and +1000m
>>> >> >> >
>>> >> >> > So far as I understand an integration of the function is needed to
>>> >> solve
>>> >> >> for s1 and s2 as all the literature data give percentage (area under
>>> >> the
>>> >> >> curve) Can that be used to fit the curve or can that create ranges
>>> for
>>> >> s1
>>> >> >> and s2.
>>> >> >>
>>> >> >> I don't see a way around integration.
>>> >> >>
>>> >> >> If you have exactly 2 probabilities, then you can you a solver like
>>> >> >> scipy.optimize.fsolve to match the probabilites
>>> >> >> eg.
>>> >> >> 0.5 = integral pdf from -270 to 270
>>> >> >> 0.9 = integral pdf from -500 to 500
>>> >> >>
>>> >> >> If you have more than 2 probabilities, then using optimization of a
>>> >> >> weighted function of the moment conditions would be better.
>>> >> >>
>>> >> >> Josef
>>> >> >
>>> >> >
>>> >> >
>>> >> > Hello again
>>> >> >
>>> >> > I tried following, but without success so far. What do I have to do
>>> >> excactly...
>>> >> >
>>> >> > import numpy
>>> >> > from scipy import stats
>>> >> > from scipy import integrate
>>> >> > from scipy.optimize import fsolve
>>> >> > import math
>>> >> >
>>> >> > p=0.3
>>> >> >
>>> >> > def pdf(x,s1,s2):
>>> >> >    return
>>> >>
>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>>> >> >
>>> >> > def equ(s1,s2):
>>> >> >    0.5==integrate.quad(pdf,-270,270,args=(s1,s2))
>>> >> >    0.9==integrate.quad(pdf,-500,500,args=(s1,s2))
>>> >> >
>>> >> > result=fsolve(equ, 1,500)
>>> >> >
>>> >> > print result
>>> >>
>>> >> equ needs to return the deviation of the equations (I changed some
>>> >> details for s1 just to try it)
>>> >>
>>> >> import numpy
>>> >> from scipy import stats
>>> >> from scipy import integrate
>>> >> from scipy.optimize import fsolve
>>> >> import math
>>> >>
>>> >> p=0.3
>>> >>
>>> >> def pdf(x,s1,s2):
>>> >>     return
>>> >>
>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>>> >>
>>> >> def equ(arg):
>>> >>     s1,s2 = numpy.abs(arg)
>>> >>     cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0]
>>> >>     cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0]
>>> >>     return [cond1, cond2]
>>> >>
>>> >> result=fsolve(equ, [200., 1200])
>>
>> thank you for your last reply...seems that the parameters of the two normals are nearly identical... anyway just two small addtional questions:
>>
>> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start values so far as I understand...how should these be choosen? what is recommended?
>
> There is no general solution for choosing starting values, in your
> case it should be possible to
>
>>>> q = np.array([0.5, 0.9])
>>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>>>> x = [270, 500]
>>>> q = np.array([0.5, 0.9])
>>>> x = [270, 500]
>>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1])
> 0.89999999999999991
-------
I forgot to remove the typos
>>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], scale=cr[0])
> 0.0011545021185267457
>>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], scale=cr[0])
> 0.000996601515122153
---------
>>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], scale=cr[0])
> 0.5
>>>> sol = fsolve(equ, np.sort(cr))
>
> there are some numerical problems finding the solution (???)
>
>>>> equ(sol)
> array([-0.05361093,  0.05851309])
>>>> from pprint import pprint
>>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1))
> (array([ 354.32616549,  354.69918062]),
>  {'fjac': array([[-0.7373189 , -0.67554484],
>       [ 0.67554484, -0.7373189 ]]),
>  'fvec': array([-0.05361093,  0.05851309]),
>  'nfev': 36,
>  'qtf': array([  1.40019135e-07,  -7.93593929e-02]),
>  'r': array([ -5.21390161e-04,  -1.21700831e-03,   3.88274320e-07])},
>  5,
>  'The iteration is not making good progress, as measured by the \n
> improvement from the last ten iterations.')
>
>>
>> 2) How can that be solve if I have I third condition (overfitted) can that be used as well or how does the alternative look like?
>
> use optimize.leastsq on equ (I never tried this for this case)
> use fmin on the sum of squared errors
>
> if the intervals for the probabilities are non-overlapping (interval
> data), then there is an optimal weighting matrix, (but my code for
> that in the statsmodels.sandbox is not verified).
>
> Josef
>
>
>>
>> /johannes
>>
>>> >>
>>> >> print result
>>> >>
>>> >> but in the results I get the parameters are very close to each other
>>> >> [-356.5283675   353.82544075]
>>> >>
>>> >> the pdf looks just like a mixture of 2 normals both with loc=0, then
>>> >> maybe the cdf of norm can be used directly
>>> >
>>> >
>>> > Thank you for that hint... First yes these are 2 superimposed normals
>>> but for other reasons I want to use the original formula instead of the
>>> stats.functions...
>>> >
>>> > anyway there is still a thing...the locator s1 and s2 are like the scale
>>> parameter of stats.norm so the are both + and -. For fsolve above it seems
>>> that I get only one parameter (s1 or s2) but for the positive and negative
>>> side of the distribution. So in actually there are four parameters -s1,
>>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look
>>> for the two values only in the positive range...
>>>
>>> It doesn't really matter, if the scale only shows up in quadratic
>>> terms, or as in my initial change I added a absolute value, so whether
>>> it's positive or negative, it's still only one value, and we
>>> interprete it as postive scale
>>>
>>> s1 = sqrt(s1**2)
>>>
>>> Josef
>>>
>>> >
>>> > any guesses?
>>> >
>>> > /J
>>> >
>>> >>
>>> >> >>> from scipy import stats
>>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350)
>>> >> 0.55954705470577526
>>> >> >>>
>>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354)
>>> >> 0.55436474670960978
>>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354)
>>> >> 0.84217642881921018
>>> >>
>>> >> Josef
>>> >> >
>>> >> >
>>> >> > /Johannes
>>> >> >>
>>> >> >> >
>>> >> >> > /Johannes
>>> >> >> >
>>> >> >> > --
>>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> >> >> > _______________________________________________
>>> >> >> > SciPy-User mailing list
>>> >> >> > SciPy-User at scipy.org
>>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >> >
>>> >> >> _______________________________________________
>>> >> >> SciPy-User mailing list
>>> >> >> SciPy-User at scipy.org
>>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >
>>> >> > --
>>> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> >> > _______________________________________________
>>> >> > SciPy-User mailing list
>>> >> > SciPy-User at scipy.org
>>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >> >
>>> >> _______________________________________________
>>> >> SciPy-User mailing list
>>> >> SciPy-User at scipy.org
>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >
>>> > --
>>> > NEU: FreePhone - kostenlos mobil telefonieren!
>>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>>> > _______________________________________________
>>> > SciPy-User mailing list
>>> > SciPy-User at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>> --
>> NEU: FreePhone - kostenlos mobil telefonieren!
>> Jetzt informieren: http://www.gmx.net/de/go/freephone
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>



More information about the SciPy-User mailing list