[SciPy-User] problem with discrete rvs?

Mon Aug 17 18:28:27 EDT 2009

On Mon, Aug 17, 2009 at 6:10 PM, <josef.pktd at gmail.com> wrote:
> On Mon, Aug 17, 2009 at 5:47 PM, Robin<robince at gmail.com> wrote:
>> Hi,
>>
>> If I understand the scipy.stats package I should be able to create my
>> own arbitrary discrete distribution and generate samples from it... I
>> am trying this but am having the following problem. I define my
>> discrete distribution with the following:
>>
>> vals = (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13,
>> 14, 15, 16,
>>       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
>>  array([ 0.31518555,  0.08191701,  0.0817977 ,  0.02212394,  0.08149974,
>>        0.02204335,  0.02201125,  0.00595341,  0.0817977 ,  0.02212394,
>>        0.02209172,  0.00597518,  0.02201125,  0.00595341,  0.00594474,
>>        0.00160788,  0.08161887,  0.02207557,  0.02204342,  0.00596212,
>>        0.02196313,  0.0059404 ,  0.00593175,  0.00160437,  0.02204342,
>>        0.00596212,  0.00595343,  0.00161023,  0.00593175,  0.00160437,
>>        0.00160203,  0.0004333 ]))
>>
>> Then I do
>> rv = rv_discrete(name='test',values=vals)
>>
>> and then try to generate samples. When I plot the results of this for
>> large number of samples, it seems to match pretty well but the last 4
>> or 5 values are never generated and have zero probability, even with
>> enough samples to really capture them.
>>
>> In [742]: (te.rvs(size=1000000)==30).sum()
>> Out[742]: 0
>> In [743]: (te.rvs(size=1000000)==29).sum()
>> Out[743]: 0
>> In [744]: (te.rvs(size=1000000)==27).sum()
>> Out[744]: 0
>> In [745]: (te.rvs(size=1000000)==20).sum()
>> Out[745]: 21884
>>
>> Does anyone have any ideas? Is this a bug in rv_discrete.rvs?
>>
>> In [746]: scipy.__version__
>> Out[746]: '0.8.0.dev5825'
>> In [747]: numpy.__version__
>> Out[747]: '1.4.0.dev7039'
>
> just a quick check
>
>>>> vals[1].cumsum()
> array([ 0.31518555,  0.39710256,  0.47890026,  0.5010242 ,  0.58252394,
>        0.60456729,  0.62657854,  0.63253195,  0.71432965,  0.73645359,
>        0.75854531,  0.76452049,  0.78653174,  0.79248515,  0.79842989,
>        0.80003777,  0.88165664,  0.90373221,  0.92577563,  0.93173775,
>        0.95370088,  0.95964128,  0.96557303,  0.9671774 ,  0.98922082,
>        0.99518294,  1.00113637,  1.0027466 ,  1.00867835,  1.01028272,
>        1.01188475,  1.01231805])
>
>
> your probabilities don't add up to one.
>
> I haven't checked if this is the source of the problem
>
> Josef

I don't know what your te is. but the following looks ok

Josef

import numpy as np
from scipy import stats

vals = [np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13,
14, 15, 16,
      17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
 np.array([ 0.31518555,  0.08191701,  0.0817977 ,  0.02212394,  0.08149974,
       0.02204335,  0.02201125,  0.00595341,  0.0817977 ,  0.02212394,
       0.02209172,  0.00597518,  0.02201125,  0.00595341,  0.00594474,
       0.00160788,  0.08161887,  0.02207557,  0.02204342,  0.00596212,
       0.02196313,  0.0059404 ,  0.00593175,  0.00160437,  0.02204342,
       0.00596212,  0.00595343,  0.00161023,  0.00593175,  0.00160437,
       0.00160203,  0.0004333 ])]

vals[1] /= vals[1].sum()

te = stats.rv_discrete(name='test',values=vals)

nsample = 10000
xrvs =te.rvs(size=nsample)
for i in range(25,31):
    print i, (xrvs==i).sum(), nsample * vals[1][i]

########## result

25 43 58.8957195814
26 56 58.8098769947
27 19 15.9063646055
28 62 58.5957150522
29 17 15.8484776598
30 11 15.8253623948

>
>>
>> Cheers
>>
>> Robin
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>