[SciPy-User] How to fit parameters of beta distribution?

Fri Jun 24 08:37:49 EDT 2011

Thanks for the information. Just out of interest, this is what I get on 
scipy 0.7 (no warnings)

In [1]: import scipy.stats

In [2]: scipy.stats.beta.fit([.5])
Out[2]:
array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
          4.99760973e-01])

In [3]: scipy.__version__
Out[3]: '0.7.0'

Also I have (following your advice):

In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.)
Out[7]:
array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
          4.99760973e-01])

which just seems wrong: surely the loc and scale in the output should be 
what I specified in the arguments? In any case, judging from your example, 
it seems to be fixed in 0.9.
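
For anyone wanting to check their own installation, here is a minimal sketch, assuming a SciPy release in which fit() honours floc and fscale. The sample size, the seed and the names a_hat, b_hat, loc_hat and scale_hat are arbitrary choices for illustration. If the fixed values are respected, the last two returned entries should be exactly the values passed in:

```python
from scipy import stats

# Draw a reproducible sample from Beta(a=2, b=5) on the unit interval.
data = stats.beta.rvs(2, 5, size=200, random_state=0)

# Fix loc and scale so that only the shape parameters a and b are estimated.
a_hat, b_hat, loc_hat, scale_hat = stats.beta.fit(data, floc=0, fscale=1)

print("a, b       :", a_hat, b_hat)
print("loc, scale :", loc_hat, scale_hat)  # expected to echo the fixed values
```

On a version with the problem shown above, loc_hat and scale_hat would come back different from 0 and 1.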

I'm assuming fit() does an ML estimate of the parameters, which I think is 
fine to do for a beta distribution and one data point.
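
One way to probe that assumption is to evaluate the log-likelihood of the single point 0.5 along the line a = b, with loc=0 and scale=1 held fixed. This is only a sketch; the grid of shape values is an arbitrary choice. If the log-likelihood keeps growing as a = b increases, a finite ML estimate may not exist for a single observation:

```python
import numpy as np
from scipy import stats

x = 0.5  # the single observation from the example above

# Symmetric shape parameters a = b concentrate the density around 0.5.
shapes = [1.0, 10.0, 100.0, 1000.0]
loglik = [stats.beta.logpdf(x, s, s) for s in shapes]

for s, ll in zip(shapes, loglik):
    print(f"a = b = {s:7.1f}   log-likelihood = {ll:.3f}")

# True if every step to a larger a = b increased the log-likelihood.
print("increasing:", bool(np.all(np.diff(loglik) > 0)))
```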

Thanks,
John.

On 24/06/11 12:20, Christoph Deil wrote:
>
> On Jun 24, 2011, at 11:26 AM, John Reid wrote:
>
>> Hi,
>>
>> I can see an instancemethod scipy.stats.beta.fit, but I can't work out
>> from the docs how to use it. From trial & error I got the following:
>>
>> In [12]: scipy.stats.beta.fit([.5])
>> Out[12]:
>> array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
>>           4.99760973e-01])
>>
>> What are the 4 values output by the method?
>>
>> Thanks,
>> John.
>
> Hi John,
>
> the short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates.
>
> It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point:
> http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions
>
> There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters loc and scale,
> which shift and stretch the distribution like this:
> (x - loc) / scale
>
> E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters":
> Normal distribution
>      The location (loc) keyword specifies the mean.
>      The scale (scale) keyword specifies the standard deviation.
>      normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
>
> You can draw a random data sample and fit it like this:
> data = scipy.stats.norm.rvs(loc=10, scale=2, size=100)
> scipy.stats.norm.fit(data) # returns loc, scale
> # (9.9734277669649689, 2.2125503785545551)
>
> The beta distribution you are interested in has two shape parameters a and b, plus in addition the loc and scale parameters every rv_continuous has:
> Beta distribution
>      beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1)
>      for 0 < x < 1, a, b > 0.
>
> In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameter, which you can do like this:
> data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (can't use keyword arguments)
> scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale
> # (2.6928363303187393, 5.9855671734557454, 0, 1)
>
> I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated:
> - it is uncommon for the beta, chi2 and many other distributions to be given loc and scale parameters
> - the auto-generated docstrings are confusing at first
> But if you look at the implementation it does avoid some repetitive code for the developers.
>
> Btw., I don't know how you can fit multiple parameters to only one measurement [.5] in your example.
> You must have executed some code before that line, otherwise you would get a bunch of RuntimeWarnings and a different return value from the one you give (I am using scipy 0.9):
> In [1]: import scipy.stats
> In [2]: scipy.stats.beta.fit([.5])
> Out[2]: (1.0, 1.0, 0.5, 0.0)
>
> Christoph
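
The loc/scale convention described in the quoted message can be checked numerically. The sketch below assumes the usual location-scale relation pdf(x, a, b, loc, scale) = pdf((x - loc) / scale, a, b) / scale; the particular values of a, b, loc and scale are arbitrary and chosen only for illustration:

```python
import numpy as np
from scipy import stats

a, b = 2.0, 5.0        # shape parameters
loc, scale = 1.0, 3.0  # support becomes (loc, loc + scale) = (1, 4)

x = np.linspace(1.1, 3.9, 8)

# Density evaluated with explicit loc and scale keywords ...
shifted = stats.beta.pdf(x, a, b, loc=loc, scale=scale)

# ... versus the standard density at the standardised points, divided by scale.
standard = stats.beta.pdf((x - loc) / scale, a, b) / scale

print(np.allclose(shifted, standard))
```

The division by scale keeps the density normalised after the stretch, which is why fitting loc and scale freely can trade off against a and b.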