Hypergeometric distribution
Raven
balckraven at gmail.com
Mon Jan 2 06:35:33 EST 2006
Well, what to say? I am very happy for all the solutions you guys have
posted :-)
For Paul:
I would prefer not to use Stirling's approximation
The problem with long integers is that to calculate the hypergeometric
I need to do float division and multiplication because integer division
returns 0. A solution could be to calculate log(Long_Factorial_Integer)
and finally calculate the hypergeometric with the logarithmic values.
I've done a test: iterated 1000 times two different functions for the
hypergeometric, the first one based on scipy.special.gammaln:
from scipy.special import gammaln
def lnchoose(n, m):
nf = gammaln(n + 1)
mf = gammaln(m + 1)
nmmnf = gammaln(n - m + 1)
return nf - (mf + nmmnf)
def hypergeometric_gamma(k, n1, n2, t):
if t > n1 + n2:
t = n1 + n2
if k > n1 or k > t:
return 0
elif t > n2 and ((k + n2) < t):
return 0
else:
c1 = lnchoose(n1,k)
c2 = lnchoose(n2, t - k)
c3 = lnchoose(n1 + n2 ,t)
return exp(c1 + c2 - c3)
and the second one based on the code by Steven and Scott:
import time
from math import log, exp
def bincoeff1(n, r):
if r < n - r:
r = n - r
x = 1
for i in range(n, r, -1):
x *= i
for i in range(n - r, 1, -1):
x /= i
return x
def hypergeometric(k, n1, n2, t):
if t > n1 + n2:
t = n1 + n2
if k > n1 or k > t:
return 0
elif t > n2 and ((k + n2) < t):
return 0
else:
c1 = log(raw_bincoeff1(n1,k))
c2 = log(raw_bincoeff1(n2, t - k))
c3 = log(raw_bincoeff1(n1 + n2 ,t))
return exp(c1 + c2 - c3)
def main():
t = time.time()
for i in range(1000):
r = hypergeometric(6,6,30,6)
print time.time() - t
t = time.time()
for i in range(1000):
r = hypergeometric_gamma(6,6,30,6)
print time.time() - t
if __name__ == "__main__":
main()
and the result is:
0.0386447906494
0.192448139191
The first approach is faster so I think I will adopt it.
More information about the Python-list
mailing list