[Python-bugs-list] [Bug #130030] Claim of bad betavariate algorithm

Thu, 25 Jan 2001 22:34:26 -0800

Bug #130030, was updated on 2001-Jan-25 03:03
Here is a current snapshot of the bug.

Project: Python
Category: Python Library
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: tim_one
Assigned to : tim_one
Summary: Claim of bad betavariate algorithm

Details: From c.l.py.  Beats me, but sounds credible.  random.py cites
Discrete Event Simulation in C, pp 87-88, for its algorithm.

"""
[Janne Sinkkonen (mailto:janne@mansikka.nnets.fi)]

At least in Python 2.0 and earlier, the samples returned by the function
betavariate() of random.py are not from a beta distribution although the
function name misleadingly suggests so.

The following would give beta-distributed samples:

def betavariate(alpha, beta):
     y = gammavariate(alpha,1)
     if y==0: return 0.0
     else: return  y/(y+gammavariate(beta,1))

This is from matlab. A comment in the original matlab code refers to
Devroye, L. (1986) Non-Uniform Random Variate Generation, theorem 4.1A (p.
430). Another reference would be Gelman, A. et al. (1995) Bayesian data
analysis, p. 481, which I have checked and found to agree with the code
above.
"""

Follow-Ups:

Date: 2001-Jan-25 22:34
By: tim_one

Comment:
And my reply:

[Ivan Frohne]
> I'm convinced that Janne Sinkkonen is right: The beta distribution 
> generator in module random.py does not return Beta-distributed random 
> numbers.  Janne's suggested fix should work just fine.

Which is (for reference):

def betavariate(alpha, beta):
    y = gammavariate(alpha,1)
    if y==0: return 0.0
    else: return  y/(y+gammavariate(beta,1))

> Here's my guess on how and why this bug bit -- it won't be of interest
> to  most but this subject is so obscure sometimes that there needs to
> be a detailed analysis.
> ...

Which I'll skip, since it's immortalized already in c.l.py and the bug
report:

https://sourceforge.net/bugs/?func=detailbug&bug_id=130030&group_id=5470

> ...
> The corrected equation for a beta random variable, using Watkins' 
> definition of the gamma density, is
>
>     brv(A, B) = grv(A, 1) / ( grv(A, 1) + grv(1/B, 1) ),
>
> which translates to
>
>     brv(A, B) = grv(A, 1) / (grv(A, 1) + grv(B, 1)
>
> using the more common gamma density definition (the one used in random.
> py).

Now that seems plain wrong, although I hope it's just typographical
confusions:

1. You don't really mean to generate two independent instances of grv(A,
1), right?

2. The parens around the summand denominator have magically disappeared,
changing something of the form X/(X+Y) to (X/X)+Y.

> Many standard statistical references give this equation -- two are 
> "Non-Uniform random Variate Generation," by Luc Devroye, Springer-Verlag,

> 1986, p. 432, and "Monte Carlo Concepts, Algorithms and Applications,"
> by George S. Fishman, Springer, 1996, p. 200.

So does Knuth, Vol 3 Ed 3 (in the X/(X+Y) form), although he's not careful
(as was Janne) to avoid division by 0.  So I'll check in Janne's code
as-is.

Thank you for the analysis!  It was an enlightening help.

-------------------------------------------------------

Date: 2001-Jan-25 22:19
By: tim_one

Comment:
c.l.py post by Ivan Frohne:

I'm convinced that Janne Sinkkonen is right: The beta distribution
generator in module random.py does not return Beta-distributed random
numbers.  Janne's suggested fix should work just fine.

Here's my guess on how and why this bug bit -- it won't be of interest to
most but this subject is so obscure sometimes that there needs to be a
detailed analysis.

The probability density function of the gamma distribution with (positive)
parameters A and B is usually written

    g(x; A, B) = (x**(A-1) * exp(x/B)) / (Gamma(A) * B**A), where x, A, and
B > 0.

Here Gamma(A) is the gamma function -- for A a positive integer, Gamma(A)
is the factorial of A - 1, Gamma(A) = (A-1)!.  In fact, this is the
definition used by the authors of random.py in defining gammavariate(alpha,
beta), the gamma distribution random number generator.

Now it happens that a gamma-distributed random variable with parameters A =
1 and B has the (much simpler) exponential distribution with density
function

    g(x; 1, B) = exp(-x/B) / B.

Keep that in mind.

The reference "Discrete Event Simulation in ," by Kevin Watkins
(McGraw-Hill, 1993) was consulted by the random.py authors.  But this
reference defines the gamma probability distribution a little differently,
as

    g1(x; A, B) = (B**A * x**(A-1) * exp(B*x)) / Gamma(A), where x, A, B >
0.

(See p. 85).  On page 87, Watkins states (incorrectly) that if grv(A, B) is
a function which returns a gamma random variable with parameters A and B
(using his definition on p. 85), then the function

    brv(A, B) = grv(1, 1/B) / ( grv(1, 1/B) + grv(1, A) ) [ not true!]

will return a random variable which has the beta distribution with
parameters A and B.

Believing Watkins to be correct, the random.py authors remembered that a
gamma random variable with parameter A = 1 is just an exponential random
variable and further simplified their beta generator to

    brv(A, B) = erv(1/B) / (erv(1/B) + erv(A)), where erv(K) is a random
variable

having the exponential distribution with parameter K.

The corrected equation for a beta random variable, using Watkins'
definition of the gamma density, is

    brv(A, B) = grv(A, 1) / ( grv(A, 1) + grv(1/B, 1) ),

which translates to

    brv(A, B) = grv(A, 1) / (grv(A, 1) + grv(B, 1)

using the more common gamma density definition (the one used in random.py).
Many standard statistical references give this equation -- two are
"Non-Uniform random Variate Generation," by Luc Devroye, Springer-Verlag,
1986, p. 432, and "Monte Carlo Concepts, Algorithms and Applications," by
George S. Fishman, Springer, 1996, p. 200.

--Ivan Frohne

-------------------------------------------------------

Date: 2001-Jan-25 13:36
By: tim_one

Comment:
I have no idea why the Group was set to Irreproducible, but have seen that
on other recent new bug reports too -- maybe a new SF buglet.

Anyway, changed group to None.  Ivan Frohne helpfully investigated this,
and made what looks to be a very strong case for adopting the algorithm
Janne suggests.

-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=130030&group_id=5470