
On Jun 24, 2011, at 11:26 AM, John Reid wrote:
Hi,
I can see an instancemethod scipy.stats.beta.fit. I can't work out from the docs how to use it. From trial & error I got the following:
In [12]: scipy.stats.beta.fit([.5])
Out[12]: array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, 4.99760973e-01])
What are the 4 values output by the method?
Thanks, John.

Hi John,

the short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates.

It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point:
http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions

There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters, loc and scale, which shift and stretch the distribution like this:

    (x - loc) / scale

E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters":

    Normal distribution
    The location (loc) keyword specifies the mean.
    The scale (scale) keyword specifies the standard deviation.
    normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

You can draw a random data sample and fit it like this:

    import scipy.stats
    data = scipy.stats.norm.rvs(loc=10, scale=2, size=100)
    scipy.stats.norm.fit(data) # returns loc, scale
    # (9.9734277669649689, 2.2125503785545551)

The beta distribution you are interested in has two shape parameters, a and b, in addition to the loc and scale parameters every rv_continuous has:

    Beta distribution
    beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1)
    for 0 < x < 1, a, b > 0.

In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameters, which you can do like this:

    data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (can't use keyword arguments)
    scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale
    # (2.6928363303187393, 5.9855671734557454, 0, 1)

I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated:
- it is uncommon for beta, chi2 and many other distributions to be given loc and scale parameters
- the auto-generated docstrings are confusing at first

But if you look at the implementation, it does avoid some repetitive code for the developers.

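
In case it helps, here is a small sketch of the two points above: how loc and scale enter the beta pdf, and how fixing them changes what fit returns. It assumes a reasonably recent scipy and numpy (the random_state keyword is not available in old releases); the shift, stretch, seed and sample size are arbitrary values picked only for illustration.

```python
import numpy as np
import scipy.stats

# Shape parameters plus an arbitrary shift/stretch, chosen only for illustration.
a, b = 2.0, 5.0
loc, scale = 1.0, 3.0

# Points strictly inside the shifted support [loc, loc + scale].
x = np.linspace(loc + 0.1, loc + scale - 0.1, 5)

# loc and scale act through y = (x - loc) / scale; the density is also
# divided by scale so that it still integrates to 1.
shifted = scipy.stats.beta.pdf(x, a, b, loc=loc, scale=scale)
standard = scipy.stats.beta.pdf((x - loc) / scale, a, b) / scale
pdf_matches = np.allclose(shifted, standard)

# Fit only a and b, holding loc and scale fixed at 0 and 1.
data = scipy.stats.beta.rvs(a, b, size=1000, random_state=0)
a_hat, b_hat, loc_hat, scale_hat = scipy.stats.beta.fit(data, floc=0, fscale=1)

print(pdf_matches, a_hat, b_hat, loc_hat, scale_hat)
```

With floc and fscale given, the last two returned values are just the fixed values handed back, so only a_hat and b_hat are actual estimates.
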
Btw., I don't see how you can fit multiple parameters to only one measurement [.5] as in your example. You must have executed some code before that line, otherwise you would get a bunch of RuntimeWarnings and a different return value from the one you give (I am on scipy 0.9):

    In [1]: import scipy.stats
    In [2]: scipy.stats.beta.fit([.5])
    Out[2]: (1.0, 1.0, 0.5, 0.0)

Christoph