[SciPy-User] How to fit parameters of beta distribution?

Christoph Deil deil.christoph at googlemail.com
Fri Jun 24 07:20:55 EDT 2011


On Jun 24, 2011, at 11:26 AM, John Reid wrote:

> Hi,
> 
> I can see an instancemethod scipy.stats.beta.fit. I can't work out from 
> the docs how to use it. From trial & error I got the following:
> 
> In [12]: scipy.stats.beta.fit([.5])
> Out[12]:
> array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
>          4.99760973e-01])
> 
> What are the 4 values output by the method?
> 
> Thanks,
> John.

Hi John,

The short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates.

It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point:
http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions

There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters loc and scale,
which shift and stretch the distribution like this:
(x - loc) / scale
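
As a quick check of this convention (a minimal sketch; the grid x and the values loc=10, scale=2 are arbitrary choices for illustration), the density with loc and scale should equal the standard density evaluated at (x - loc) / scale, divided by scale:

```python
import numpy as np
import scipy.stats

x = np.linspace(8.0, 12.0, 5)
loc, scale = 10.0, 2.0

# Density of the shifted and stretched normal distribution ...
shifted = scipy.stats.norm.pdf(x, loc=loc, scale=scale)
# ... compared with the standard density at (x - loc) / scale, divided by scale.
standard = scipy.stats.norm.pdf((x - loc) / scale) / scale

print(np.allclose(shifted, standard))
```

The division by scale keeps the density normalized after stretching.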

E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters":
Normal distribution    
    The location (loc) keyword specifies the mean.
    The scale (scale) keyword specifies the standard deviation.
    normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

You can draw a random data sample and fit it like this:
import scipy.stats
data = scipy.stats.norm.rvs(loc=10, scale=2, size=100)
scipy.stats.norm.fit(data) # returns loc, scale
# (9.9734277669649689, 2.2125503785545551)
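
For the normal distribution the maximum-likelihood estimates have a closed form: the sample mean and the sample standard deviation with ddof=0. A small sketch, assuming NumPy is available and using an arbitrary seed, that prints them next to what fit returns:

```python
import numpy as np
import scipy.stats

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
data = scipy.stats.norm.rvs(loc=10, scale=2, size=100)

loc_hat, scale_hat = scipy.stats.norm.fit(data)

# Closed-form maximum-likelihood estimates for a normal sample.
print(loc_hat, data.mean())
print(scale_hat, data.std())  # ndarray.std defaults to ddof=0
```

The two numbers on each line should agree up to the tolerance of the optimizer, if the installed scipy version fits numerically.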

The beta distribution you are interested in has two shape parameters, a and b, in addition to the loc and scale parameters that every rv_continuous has:
Beta distribution    
    beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1)
    for 0 < x < 1, a, b > 0.
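
The formula from the docstring can be checked by hand. This is only a sketch: a=2, b=5 and the grid x are arbitrary illustration values, and scipy.special.gamma is used for the gamma function:

```python
import numpy as np
import scipy.special
import scipy.stats

a, b = 2.0, 5.0
x = np.linspace(0.1, 0.9, 9)  # points strictly inside (0, 1)

# Evaluate the docstring formula term by term.
g = scipy.special.gamma
by_hand = g(a + b) / (g(a) * g(b)) * x**(a - 1) * (1 - x)**(b - 1)

# Compare with scipy's beta density at the default loc=0, scale=1.
print(np.allclose(by_hand, scipy.stats.beta.pdf(x, a, b)))
```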

In your case you probably want to fix loc=0 and scale=1 and fit only the a and b parameters, which you can do like this:
data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (passed positionally; keyword arguments don't work for them in scipy 0.9)
scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale
# (2.6928363303187393, 5.9855671734557454, 0, 1)
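
Once you have the fit result, you can unpack it and build a "frozen" distribution from it. The following is a sketch under some assumptions: the seed and the larger sample size of 1000 are arbitrary choices, made so that the estimates land reasonably close to the true a=2, b=5:

```python
import numpy as np
import scipy.stats

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
data = scipy.stats.beta.rvs(2, 5, size=1000)

# Fit only a and b; loc and scale stay at the fixed values 0 and 1.
a_hat, b_hat, loc_hat, scale_hat = scipy.stats.beta.fit(data, floc=0, fscale=1)

# Freeze the fitted distribution so it can be evaluated without
# passing the parameters again.
fitted = scipy.stats.beta(a_hat, b_hat, loc=loc_hat, scale=scale_hat)

print(a_hat, b_hat)
print(fitted.mean(), data.mean())
```

The mean of the fitted distribution should come out close to the sample mean, although the maximum-likelihood fit does not force them to be equal.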

I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated:
- beta, chi2 and many other distributions are not usually defined with loc and scale parameters
- the auto-generated docstrings are confusing at first
But if you look at the implementation it does avoid some repetitive code for the developers.

Btw., I don't know how you can fit multiple parameters to a single measurement like the [.5] in your example.
You must have executed some code before that line; otherwise you would get a bunch of RuntimeWarnings and a different return value from the one you give (I am using scipy 0.9):
In [1]: import scipy.stats
In [2]: scipy.stats.beta.fit([.5])
Out[2]: (1.0, 1.0, 0.5, 0.0)

Christoph



