Fitting a distribution to some data
Hi,

I am trying to fit a distribution to some data points (actually, I want to test which distribution is best to model the data). Moreover, I want to estimate the distribution's parameters. I am sure that the stats module has many functions to help me with this, but I am not sure I understand how to use them.

As a test, I create some RVs using, say, y = stats.distributions.norm.rvs(size=100). I can then find the mean and standard deviation using (mu, sigma) = stats.distributions.norm.fit(y), which works fine. However, some methods do not work. For example, the pdf(self,x..) method returns an array of 0s, and similarly for the cdf() method. However, the _pdf() and _cdf() methods give the desired result. Is this a bug? Am I supposed to use the underscore methods or the public methods?

Also, is there some example of how to use kstest? It might be related, but if I try to test the previous data, I get the following error:

stats.kstest(y,'norm',args=(mu,sigma))
---------------------------------------------------------------------------
exceptions.TypeError                      Traceback (most recent call last)

/home/ggjlgd/<ipython console>

/usr/lib/python2.4/site-packages/scipy/stats/stats.py in kstest(rvs, cdf, args, N)
   1721 #    D = max(D1,D2)
   1722     D = D1
-> 1723     return D, distributions.ksone.sf(D,N)
   1724
   1725 def chisquare(f_obs, f_exp=None):

/usr/lib/python2.4/site-packages/scipy/stats/distributions.py in sf(self, x, *args, **kwds)
    521         output = zeros(shape(cond),'d')
    522         insert(output,(1-cond0)*(cond1==cond1),self.badvalue)
--> 523         insert(output,cond2,1.0)
    524         goodargs = argsreduce(cond, *((x,)+args))
    525         insert(output,cond,self._sf(*goodargs))

/usr/lib/python2.4/site-packages/numpy/lib/function_base.py in insert(arr, obj, values, axis)
   1190
   1191     obj = asarray(obj, dtype=intp)
-> 1192     numnew = len(obj)
   1193     index1 = obj + arange(numnew)
   1194     index2 = setdiff1d(arange(numnew+N),index1)

TypeError: len() of unsized object

Am I doing something wrong here? Is it a bug?
My versions are Scipy: 0.5.0.2178 Numpy: 1.0b5 and I run on Kubuntu Dapper on Linux. I use Andrew Straw's packages.

Thanks
Jose
Jose Luis Gomez Dans wrote:
Hi, I am trying to fit a distribution to some data points (actually, I want to test which distribution is best to model the data). Moreover, I want to estimate the distribution's parameters. I am sure that the stats module has many functions to help me with this, but I am not sure I understand how to use them.
As a test, I create some RVs using, say, y = stats.distributions.norm.rvs(size=100). I can then find the mean and standard deviation using (mu, sigma) = stats.distributions.norm.fit(y), which works fine. However, some methods do not work. For example, the pdf(self,x..) method returns an array of 0s, and similarly for the cdf() method.

Are you using these functions correctly? They seem to work for me. For example, you don't pass in "self" as the first argument.
Here is a usage example:

    from numpy import r_
    from scipy import stats

    x = r_[-10:10:100j]
    y = stats.distributions.norm.pdf(x, loc=0.3, scale=2.0)
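The usage above can be expanded into a small self-contained check that the public pdf() and cdf() methods return sensible values. This is only a sketch: it assumes a current SciPy, where the same distribution object is reachable as stats.norm, and it uses np.linspace in place of r_ for the grid. The sanity checks at the end are illustrative additions, not part of the original reply.

```python
import numpy as np
from scipy import stats

# 100 evenly spaced points on [-10, 10], equivalent to r_[-10:10:100j]
x = np.linspace(-10, 10, 100)

# Public methods take the evaluation points first, then loc and scale
# as keyword arguments. There is no "self" argument to pass.
pdf_vals = stats.norm.pdf(x, loc=0.3, scale=2.0)
cdf_vals = stats.norm.cdf(x, loc=0.3, scale=2.0)

# Sanity checks: the density is not identically zero, and the CDF is
# non-decreasing over the grid.
print(pdf_vals.max() > 0)
print(np.all(np.diff(cdf_vals) >= 0))
```

If pdf() comes back as all zeros on a working installation, the arguments are worth checking first, for example whether the grid lies far outside the region where the density is appreciable for the given loc and scale.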
Also, is there some example of how to use kstest? It might be related, but if I try to test the previous data, I get the following error: stats.kstest(y,'norm',args=(mu,sigma))
Hmm.. This works for me.

    y = stats.distributions.norm.rvs(size=100)
    mu, sigma = stats.distributions.norm.fit(y)
    stats.kstest(y, 'norm', args=(mu, sigma))

I am running scipy 0.5.1, but I don't know if that should matter in this case.

-Travis
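For the original question of which distribution models the data best, the fit-then-kstest pattern above could be repeated over several candidate families. The following is a minimal sketch under stated assumptions: a current SciPy, a fixed random seed, and an arbitrary candidate list (norm, laplace, uniform) that does not come from the thread. One caveat: the KS p-value assumes the distribution is fully specified in advance, so when the parameters are estimated from the same sample the p-values tend to be too optimistic. Here the KS statistic serves only as a rough ranking.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
y = stats.norm.rvs(loc=1.0, scale=2.0, size=500, random_state=rng)

# Candidate families are an illustrative choice, not from the thread.
candidates = ["norm", "laplace", "uniform"]

ks_stats = {}
for name in candidates:
    dist = getattr(stats, name)
    params = dist.fit(y)  # estimated (loc, scale) for these families
    # kstest returns the KS statistic D and a p-value.
    D, p = stats.kstest(y, name, args=params)
    ks_stats[name] = D
    print(name, params, D, p)

# A smaller KS statistic means the fitted CDF tracks the empirical CDF
# more closely.
best = min(ks_stats, key=ks_stats.get)
print("smallest KS statistic:", best)
```

Ranking by the KS statistic is only one possible criterion. Comparing log-likelihoods of the fitted models would be another, and families with different numbers of parameters would need a penalised comparison.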
Hi Travis,

Many thanks for your quick reply! After reading some other messages it dawned on me that the problem was the Numpy version. I had 1.0b5, and this caused the problems. Using 1.0b2 works fine now.

Many thanks again!
José
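Since the cause turned out to be the installed NumPy version, a quick way to report both versions when asking for help might look like the sketch below. The `versions` dictionary is just an illustrative name.

```python
import numpy
import scipy

# Collect the installed versions of both packages, since an
# incompatible pairing can produce errors like the one in this thread.
versions = {"numpy": numpy.__version__, "scipy": scipy.__version__}
for name, version in versions.items():
    print(name, version)
```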
Participants (2): Jose Luis Gomez Dans, Travis Oliphant