numpy.random.logseries - incorrect convergence for k=1, k=2
random numbers generated by numpy.random.logseries do not converge to theoretical distribution: for probability paramater pr = 0.8, the random number generator converges to a frequency for k=1 at 39.8 %, while the theoretical probability mass is 49.71 k=2 is oversampled, other k's look ok check frequency of k=1 and k=2 at N = 1000000 0.398406 0.296465 pmf at k = 1 and k=2 with formula [ 0.4971 0.1988] for probability paramater pr = 0.3, the results are not as bad, but still off: frequency for k=1 at 82.6 %, while the theoretical probability mass is 84.11 check frequency of k=1 and k=2 at N = 1000000 0.826006 0.141244 pmf at k = 1 and k=2 with formula [ 0.8411 0.1262] below is a quick script for checking this Josef {{{ import numpy as np from scipy import stats pr = 0.8 np.set_printoptions(precision=2, suppress=True) # calculation for N=1million takes some time for N in [1000, 10000, 10000, 1000000]: rvsn=np.random.logseries(pr,size=N) fr=stats.itemfreq(rvsn) pmfs=stats.logser.pmf(fr[:,0],pr)*100 print 'log series sample frequency and pmf (in %) with N = ', N print np.column_stack((fr[:,0],fr[:,1]*100.0/N,pmfs)) np.set_printoptions(precision=4, suppress=True) print 'check frequency of k=1 and k=2 at N = ', N print np.sum(rvsn==1)/float(N), print np.sum(rvsn==2)/float(N) k = np.array([1,2]) print 'pmf at k = 1 and k=2 with formula' print -pr**k * 1.0 / k / np.log(1-pr) }}}
Filed as http://scipy.org/scipy/numpy/ticket/923
and I think i finally tracked down the source of the incorrect random
numbers, a reversed inequality in
http://scipy.org/scipy/numpy/browser/trunk/numpy/random/mtrand/distributions...
line 871, see my last comment to the trac ticket.
Josef
On Sep 27, 2:12 pm, joep
random numbers generated by numpy.random.logseries do not converge to theoretical distribution:
for probability paramater pr = 0.8, the random number generator converges to a frequency for k=1 at 39.8 %, while the theoretical probability mass is 49.71 k=2 is oversampled, other k's look ok
check frequency of k=1 and k=2 at N = 1000000 0.398406 0.296465 pmf at k = 1 and k=2 with formula [ 0.4971 0.1988]
for probability paramater pr = 0.3, the results are not as bad, but still off: frequency for k=1 at 82.6 %, while the theoretical probability mass is 84.11
check frequency of k=1 and k=2 at N = 1000000 0.826006 0.141244 pmf at k = 1 and k=2 with formula [ 0.8411 0.1262]
below is a quick script for checking this
Josef
{{{ import numpy as np from scipy import stats
pr = 0.8 np.set_printoptions(precision=2, suppress=True)
# calculation for N=1million takes some time for N in [1000, 10000, 10000, 1000000]: rvsn=np.random.logseries(pr,size=N) fr=stats.itemfreq(rvsn) pmfs=stats.logser.pmf(fr[:,0],pr)*100 print 'log series sample frequency and pmf (in %) with N = ', N print np.column_stack((fr[:,0],fr[:,1]*100.0/N,pmfs))
np.set_printoptions(precision=4, suppress=True)
print 'check frequency of k=1 and k=2 at N = ', N print np.sum(rvsn==1)/float(N), print np.sum(rvsn==2)/float(N)
k = np.array([1,2]) print 'pmf at k = 1 and k=2 with formula' print -pr**k * 1.0 / k / np.log(1-pr)}}}
_______________________________________________ Numpy-discussion mailing list Numpy-discuss...@scipy.orghttp://projects.scipy.org/mailman/listinfo/numpy-discussion
participants (1)
-
joep