[SciPy-User] How to fit parameters of beta distribution?
John Reid
j.reid at mail.cryst.bbk.ac.uk
Fri Jun 24 08:37:49 EDT 2011
Thanks for the information. Just out of interest, this is what I get on
scipy 0.7 (no warnings):
In [1]: import scipy.stats
In [2]: scipy.stats.beta.fit([.5])
Out[2]:
array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04,
4.99760973e-01])
In [3]: scipy.__version__
Out[3]: '0.7.0'
Also I have (following your advice):
In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.)
Out[7]:
array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04,
4.99760973e-01])
which just seems wrong: surely the loc and scale in the output should be
what I specified in the arguments? In any case, from your example it
seems like this is fixed in 0.9.
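
For comparison, here is a minimal sketch of the same kind of call on a recent SciPy release. That is an assumption: the versions discussed in this thread are 0.7 and 0.9. With floc and fscale fixed, the returned loc and scale should be exactly the values passed in. The sample size, seed and variable names are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

# Draw a reproducible sample from Beta(a=2, b=5); seed and size are arbitrary.
rng = np.random.default_rng(0)
data = stats.beta.rvs(2, 5, size=500, random_state=rng)

# Fix loc and scale so that only the shape parameters a and b are estimated.
a, b, loc, scale = stats.beta.fit(data, floc=0, fscale=1)

print("a =", a, "b =", b)
print("loc =", loc, "scale =", scale)  # expected to echo floc and fscale
```
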
I'm assuming fit() does an ML estimate of the parameters, which I think is
fine to do for a beta distribution and one data point.
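
One caveat on the single-point case may be worth checking: with loc=0 and scale=1 and only the observation 0.5, the beta log-likelihood appears to keep growing as a = b increases, because the density concentrates around 0.5. If so, there is no finite maximum for an optimizer to find. The sketch below uses only the standard library; beta_logpdf is a helper written for this illustration, not a SciPy function:

```python
import math

def beta_logpdf(x, a, b):
    """Log-density of the standard beta distribution (loc=0, scale=1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

# Log-likelihood of the single observation 0.5 along the line a = b.
shape_values = [1, 10, 100, 1000]
loglik = [beta_logpdf(0.5, a, a) for a in shape_values]

for a, ll in zip(shape_values, loglik):
    print("a = b = %d: log-likelihood = %.4f" % (a, ll))
```
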
Thanks,
John.
On 24/06/11 12:20, Christoph Deil wrote:
>
> On Jun 24, 2011, at 11:26 AM, John Reid wrote:
>
>> Hi,
>>
>> I can see an instancemethod scipy.stats.beta.fit. I can't work out from
>> the docs how to use it. From trial and error I got the following:
>>
>> In [12]: scipy.stats.beta.fit([.5])
>> Out[12]:
>> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04,
>> 4.99760973e-01])
>>
>> What are the 4 values output by the method?
>>
>> Thanks,
>> John.
>
> Hi John,
>
> the short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates.
>
> It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point:
> http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions
>
> There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters, loc and scale,
> which shift and stretch the distribution: the density is evaluated at the standardized value
> (x - loc) / scale (and, for the pdf, the result is divided by scale)
>
> E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters":
> Normal distribution
> The location (loc) keyword specifies the mean.
> The scale (scale) keyword specifies the standard deviation.
> norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
>
> You can draw a random data sample and fit it like this:
> data = scipy.stats.norm.rvs(loc=10, scale=2, size=100)
> scipy.stats.norm.fit(data) # returns loc, scale
> # (9.9734277669649689, 2.2125503785545551)
>
> The beta distribution you are interested in has two shape parameters, a and b, in addition to the loc and scale parameters that every rv_continuous has:
> Beta distribution
> beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1)
> for 0 < x < 1, a, b > 0.
>
> In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameters, which you can do like this:
> data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (can't use keyword arguments)
> scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale
> # (2.6928363303187393, 5.9855671734557454, 0, 1)
>
> I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated:
> - it is uncommon for beta, chi2 and many other distributions to be given loc and scale parameters
> - the auto-generated docstrings are confusing at first
> But if you look at the implementation, it does avoid some repetitive code for the developers.
>
> Btw., I don't know how you can fit multiple parameters to only one measurement [.5] in your example.
> You must have executed some code before that line, otherwise you'll get a bunch of RuntimeWarnings and a different return value from the one you give (I am on scipy 0.9):
> In [1]: import scipy.stats
> In [2]: scipy.stats.beta.fit([.5])
> Out[2]: (1.0, 1.0, 0.5, 0.0)
>
> Christoph
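
The loc/scale convention described in the quoted reply can be checked numerically. The sketch below compares the beta density called with loc and scale against the standard density evaluated at (x - loc) / scale and divided by scale. The shape values, loc, scale and evaluation grid are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative values; the support becomes [loc, loc + scale].
a, b, loc, scale = 2.0, 5.0, 0.5, 2.0
x = np.linspace(0.5, 2.5, 5)

# Density with loc and scale passed as keywords.
shifted = stats.beta.pdf(x, a, b, loc=loc, scale=scale)

# Same density computed by hand from the standard (loc=0, scale=1) form.
standard = stats.beta.pdf((x - loc) / scale, a, b) / scale

print(np.allclose(shifted, standard))
```
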