[SciPy-User] How to fit parameters of beta distribution?
Christoph Deil
deil.christoph at googlemail.com
Fri Jun 24 07:20:55 EDT 2011
On Jun 24, 2011, at 11:26 AM, John Reid wrote:
> Hi,
>
> I can see a instancemethod scipy.stats.beta.fit. I can't work out from
> the docs how to use it. From trial & error I got the following:
>
> In [12]: scipy.stats.beta.fit([.5])
> Out[12]:
> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04,
> 4.99760973e-01])
>
> What are the 4 values output by the method?
>
> Thanks,
> John.
Hi John,
The short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful estimates of a and b.
It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point:
http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions
There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters loc and scale,
which shift and stretch the distribution like this:
(x - loc) / scale
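As a quick check of what that standardization means for the density, here is a minimal sketch. The values of x, loc, scale, a and b are arbitrary choices for illustration. It compares the pdf evaluated with loc and scale against the standard pdf evaluated at (x - loc) / scale and divided by scale (the extra 1/scale keeps the density normalized):

```python
import numpy as np
import scipy.stats

# Arbitrary illustration values (not from the original post).
loc, scale = 0.5, 3.0
a, b = 2.0, 5.0
x = np.linspace(0.6, 3.4, 8)  # points inside the support [loc, loc + scale]

# Density evaluated with loc and scale passed to the distribution ...
direct = scipy.stats.beta.pdf(x, a, b, loc=loc, scale=scale)

# ... versus the standard density at the standardized point, divided by scale.
by_hand = scipy.stats.beta.pdf((x - loc) / scale, a, b) / scale

print(np.allclose(direct, by_hand))
```

The same relation should hold for any rv_continuous distribution, not just beta.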
E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters":
Normal distribution
The location (loc) keyword specifies the mean.
The scale (scale) keyword specifies the standard deviation.
normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
You can draw a random data sample and fit it like this:
import scipy.stats
data = scipy.stats.norm.rvs(loc=10, scale=2, size=100)
scipy.stats.norm.fit(data) # returns loc, scale
# (9.9734277669649689, 2.2125503785545551)
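To make the meaning of the two returned values concrete, here is a small sketch under my own assumptions: the seed is arbitrary, and the `random_state` keyword requires a reasonably recent SciPy. For the normal distribution the maximum-likelihood estimates should agree with the sample mean and the (biased, ddof=0) sample standard deviation:

```python
import numpy as np
import scipy.stats

# Fixed seed, chosen arbitrarily so the run is reproducible.
rng = np.random.RandomState(1)
data = scipy.stats.norm.rvs(loc=10, scale=2, size=100, random_state=rng)

# fit returns (loc, scale) because norm has no shape parameters.
loc_hat, scale_hat = scipy.stats.norm.fit(data)

# Compare with the sample mean and the ddof=0 standard deviation.
print(np.isclose(loc_hat, data.mean(), rtol=1e-4))
print(np.isclose(scale_hat, data.std(), rtol=1e-4))
```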
The beta distribution you are interested in has two shape parameters a and b, in addition to the loc and scale parameters that every rv_continuous has:
Beta distribution
beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1)
for 0 < x < 1, a, b > 0.
In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameters, which you can do like this:
data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5, passed positionally (keyword arguments did not work for me on scipy 0.9)
scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale
# (2.6928363303187393, 5.9855671734557454, 0, 1)
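Note that with only 100 points the estimates above are fairly far from the true a = 2, b = 5. The following sketch repeats the fit for two sample sizes to illustrate that the estimates tighten as the sample grows. The seed, the sample sizes and the `results` dictionary are my own choices for illustration, and `random_state` again assumes a reasonably recent SciPy:

```python
import numpy as np
import scipy.stats

# Fixed seed, chosen arbitrarily so the run is reproducible.
rng = np.random.RandomState(0)

results = {}
for size in (100, 10000):
    data = scipy.stats.beta.rvs(2, 5, size=size, random_state=rng)
    # floc and fscale hold loc and scale fixed; only a and b are estimated.
    a_hat, b_hat, loc, scale = scipy.stats.beta.fit(data, floc=0, fscale=1)
    results[size] = (a_hat, b_hat, loc, scale)
    print(size, a_hat, b_hat, loc, scale)
```

The returned loc and scale are simply the fixed values passed in, so only the first two entries carry information about the data.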
I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated:
- beta, chi2 and many other distributions are not usually defined with loc and scale parameters
- the auto-generated docstrings are confusing at first
But if you look at the implementation it does avoid some repetitive code for the developers.
Btw., I don't know how you can fit multiple parameters to only one measurement [.5] in your example.
You must have executed some code before that line, otherwise you would get a bunch of RuntimeWarnings and a different return value from the one you give (I am using scipy 0.9):
In [1]: import scipy.stats
In [2]: scipy.stats.beta.fit([.5])
Out[2]: (1.0, 1.0, 0.5, 0.0)
Christoph