[SciPy-User] How to fit a curve/function?

Thu Jun 9 11:52:46 EDT 2011

Hello again...

i try no to fit a curve using integrals as conditions.
the scipy manual says that integrations to infinite are possible with Inf,

I tried following but I fail (saying inf is not defined):

cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0]

what causes the problem? Do I use quad/Inf in a wrong way?

The error is:
NameError: global name 'Inf' is not defined

/Johannes

-------- Original-Nachricht --------
> Datum: Wed, 8 Jun 2011 11:37:52 -0400
> Von: josef.pktd at gmail.com
> An: SciPy Users List <scipy-user at scipy.org>
> Betreff: Re: [SciPy-User] How to fit a curve/function?

> On Wed, Jun 8, 2011 at 11:37 AM,  <josef.pktd at gmail.com> wrote:
> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger <JRadinger at gmx.at>
> wrote:
> >>
> >> -------- Original-Nachricht --------
> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400
> >>> Von: josef.pktd at gmail.com
> >>> An: SciPy Users List <scipy-user at scipy.org>
> >>> Betreff: Re: [SciPy-User] How to fit a curve/function?
> >>
> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger <JRadinger at gmx.at>
> >>> wrote:
> >>> >
> >>> > -------- Original-Nachricht --------
> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400
> >>> >> Von: josef.pktd at gmail.com
> >>> >> An: SciPy Users List <scipy-user at scipy.org>
> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
> >>> >
> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger
> <JRadinger at gmx.at>
> >>> >> wrote:
> >>> >> >
> >>> >> > -------- Original-Nachricht --------
> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400
> >>> >> >> Von: josef.pktd at gmail.com
> >>> >> >> An: SciPy Users List <scipy-user at scipy.org>
> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
> >>> >> >
> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger
> <JRadinger at gmx.at>
> >>> >> >> wrote:
> >>> >> >> > Hello,
> >>> >> >> >
> >>> >> >> > I've got following function describing any kind of animal
> >>> dispersal
> >>> >> >> kernel:
> >>> >> >> >
> >>> >> >> > def pdf(x,s1,s2):
> >>> >> >> >    return
> >>> >> >>
> >>> >>
> >>>
> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
> >>> >> >> >
> >>> >> >> > On the other hand I've got data from literature with which I
> want
> >>> to
> >>> >> fit
> >>> >> >> the function so that I get s1, s2 and x.
> >>> >> >> > Ususally the data in the literature are as follows:
> >>> >> >> >
> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m and
> 90%
> >>> >>  are
> >>> >> >> between -500m and + 500m
> >>> >> >> >
> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are
> between
> >>> >> >> -1000m and +1000m
> >>> >> >> >
> >>> >> >> > So far as I understand an integration of the function is
> needed to
> >>> >> solve
> >>> >> >> for s1 and s2 as all the literature data give percentage (area
> under
> >>> >> the
> >>> >> >> curve) Can that be used to fit the curve or can that create
> ranges
> >>> for
> >>> >> s1
> >>> >> >> and s2.
> >>> >> >>
> >>> >> >> I don't see a way around integration.
> >>> >> >>
> >>> >> >> If you have exactly 2 probabilities, then you can you a solver
> like
> >>> >> >> scipy.optimize.fsolve to match the probabilites
> >>> >> >> eg.
> >>> >> >> 0.5 = integral pdf from -270 to 270
> >>> >> >> 0.9 = integral pdf from -500 to 500
> >>> >> >>
> >>> >> >> If you have more than 2 probabilities, then using optimization
> of a
> >>> >> >> weighted function of the moment conditions would be better.
> >>> >> >>
> >>> >> >> Josef
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > Hello again
> >>> >> >
> >>> >> > I tried following, but without success so far. What do I have to
> do
> >>> >> excactly...
> >>> >> >
> >>> >> > import numpy
> >>> >> > from scipy import stats
> >>> >> > from scipy import integrate
> >>> >> > from scipy.optimize import fsolve
> >>> >> > import math
> >>> >> >
> >>> >> > p=0.3
> >>> >> >
> >>> >> > def pdf(x,s1,s2):
> >>> >> >    return
> >>> >>
> >>>
> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
> >>> >> >
> >>> >> > def equ(s1,s2):
> >>> >> >    0.5==integrate.quad(pdf,-270,270,args=(s1,s2))
> >>> >> >    0.9==integrate.quad(pdf,-500,500,args=(s1,s2))
> >>> >> >
> >>> >> > result=fsolve(equ, 1,500)
> >>> >> >
> >>> >> > print result
> >>> >>
> >>> >> equ needs to return the deviation of the equations (I changed some
> >>> >> details for s1 just to try it)
> >>> >>
> >>> >> import numpy
> >>> >> from scipy import stats
> >>> >> from scipy import integrate
> >>> >> from scipy.optimize import fsolve
> >>> >> import math
> >>> >>
> >>> >> p=0.3
> >>> >>
> >>> >> def pdf(x,s1,s2):
> >>> >>     return
> >>> >>
> >>>
> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
> >>> >>
> >>> >> def equ(arg):
> >>> >>     s1,s2 = numpy.abs(arg)
> >>> >>     cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0]
> >>> >>     cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0]
> >>> >>     return [cond1, cond2]
> >>> >>
> >>> >> result=fsolve(equ, [200., 1200])
> >>
> >> thank you for your last reply...seems that the parameters of the two
> normals are nearly identical... anyway just two small addtional questions:
> >>
> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start
> values so far as I understand...how should these be choosen? what is
> recommended?
> >
> > There is no general solution for choosing starting values, in your
> > case it should be possible to
> >
> >>>> q = np.array([0.5, 0.9])
> >>>> cr = x/stats.norm.ppf(0.5 + q/2.)
> >>>> x = [270, 500]
> >>>> q = np.array([0.5, 0.9])
> >>>> x = [270, 500]
> >>>> cr = x/stats.norm.ppf(0.5 + q/2.)
> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1])
> > 0.89999999999999991
> -------
> I forgot to remove the typos
> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0],
> scale=cr[0])
> > 0.0011545021185267457
> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0],
> scale=cr[0])
> > 0.000996601515122153
> ---------
> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0],
> scale=cr[0])
> > 0.5
> >>>> sol = fsolve(equ, np.sort(cr))
> >
> > there are some numerical problems finding the solution (???)
> >
> >>>> equ(sol)
> > array([-0.05361093,  0.05851309])
> >>>> from pprint import pprint
> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1))
> > (array([ 354.32616549,  354.69918062]),
> >  {'fjac': array([[-0.7373189 , -0.67554484],
> >       [ 0.67554484, -0.7373189 ]]),
> >  'fvec': array([-0.05361093,  0.05851309]),
> >  'nfev': 36,
> >  'qtf': array([  1.40019135e-07,  -7.93593929e-02]),
> >  'r': array([ -5.21390161e-04,  -1.21700831e-03,   3.88274320e-07])},
> >  5,
> >  'The iteration is not making good progress, as measured by the \n
> > improvement from the last ten iterations.')
> >
> >>
> >> 2) How can that be solve if I have I third condition (overfitted) can
> that be used as well or how does the alternative look like?
> >
> > use optimize.leastsq on equ (I never tried this for this case)
> > use fmin on the sum of squared errors
> >
> > if the intervals for the probabilities are non-overlapping (interval
> > data), then there is an optimal weighting matrix, (but my code for
> > that in the statsmodels.sandbox is not verified).
> >
> > Josef
> >
> >
> >>
> >> /johannes
> >>
> >>> >>
> >>> >> print result
> >>> >>
> >>> >> but in the results I get the parameters are very close to each
> other
> >>> >> [-356.5283675   353.82544075]
> >>> >>
> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0,
> then
> >>> >> maybe the cdf of norm can be used directly
> >>> >
> >>> >
> >>> > Thank you for that hint... First yes these are 2 superimposed
> normals
> >>> but for other reasons I want to use the original formula instead of
> the
> >>> stats.functions...
> >>> >
> >>> > anyway there is still a thing...the locator s1 and s2 are like the
> scale
> >>> parameter of stats.norm so the are both + and -. For fsolve above it
> seems
> >>> that I get only one parameter (s1 or s2) but for the positive and
> negative
> >>> side of the distribution. So in actually there are four parameters
> -s1,
> >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve
> to look
> >>> for the two values only in the positive range...
> >>>
> >>> It doesn't really matter, if the scale only shows up in quadratic
> >>> terms, or as in my initial change I added a absolute value, so whether
> >>> it's positive or negative, it's still only one value, and we
> >>> interprete it as postive scale
> >>>
> >>> s1 = sqrt(s1**2)
> >>>
> >>> Josef
> >>>
> >>> >
> >>> > any guesses?
> >>> >
> >>> > /J
> >>> >
> >>> >>
> >>> >> >>> from scipy import stats
> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270,
> scale=350)
> >>> >> 0.55954705470577526
> >>> >> >>>
> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270,
> scale=354)
> >>> >> 0.55436474670960978
> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500,
> scale=354)
> >>> >> 0.84217642881921018
> >>> >>
> >>> >> Josef
> >>> >> >
> >>> >> >
> >>> >> > /Johannes
> >>> >> >>
> >>> >> >> >
> >>> >> >> > /Johannes
> >>> >> >> >
> >>> >> >> > --
> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren!
> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
> >>> >> >> > _______________________________________________
> >>> >> >> > SciPy-User mailing list
> >>> >> >> > SciPy-User at scipy.org
> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >>> >> >> >
> >>> >> >> _______________________________________________
> >>> >> >> SciPy-User mailing list
> >>> >> >> SciPy-User at scipy.org
> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
> >>> >> >
> >>> >> > --
> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren!
> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
> >>> >> > _______________________________________________
> >>> >> > SciPy-User mailing list
> >>> >> > SciPy-User at scipy.org
> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >>> >> >
> >>> >> _______________________________________________
> >>> >> SciPy-User mailing list
> >>> >> SciPy-User at scipy.org
> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
> >>> >
> >>> > --
> >>> > NEU: FreePhone - kostenlos mobil telefonieren!
> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
> >>> > _______________________________________________
> >>> > SciPy-User mailing list
> >>> > SciPy-User at scipy.org
> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >>> >
> >>> _______________________________________________
> >>> SciPy-User mailing list
> >>> SciPy-User at scipy.org
> >>> http://mail.scipy.org/mailman/listinfo/scipy-user
> >>
> >> --
> >> NEU: FreePhone - kostenlos mobil telefonieren!
> >> Jetzt informieren: http://www.gmx.net/de/go/freephone
> >> _______________________________________________
> >> SciPy-User mailing list
> >> SciPy-User at scipy.org
> >> http://mail.scipy.org/mailman/listinfo/scipy-user
> >>
> >
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-- 
NEU: FreePhone - kostenlos mobil telefonieren!			
Jetzt informieren: http://www.gmx.net/de/go/freephone