[SciPy-User] problem with discrete rvs?
josef.pktd at gmail.com
josef.pktd at gmail.com
Mon Aug 17 18:28:27 EDT 2009
On Mon, Aug 17, 2009 at 6:10 PM, <josef.pktd at gmail.com> wrote:
> On Mon, Aug 17, 2009 at 5:47 PM, Robin<robince at gmail.com> wrote:
>> Hi,
>>
>> If I understand the scipy.stats package I should be able to create my
>> own arbitrary discrete distribution and generate samples from it... I
>> am trying this but am having the following problem. I define my
>> discrete distribution with the following:
>>
>> vals = (array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
>> 14, 15, 16,
>> 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
>> array([ 0.31518555, 0.08191701, 0.0817977 , 0.02212394, 0.08149974,
>> 0.02204335, 0.02201125, 0.00595341, 0.0817977 , 0.02212394,
>> 0.02209172, 0.00597518, 0.02201125, 0.00595341, 0.00594474,
>> 0.00160788, 0.08161887, 0.02207557, 0.02204342, 0.00596212,
>> 0.02196313, 0.0059404 , 0.00593175, 0.00160437, 0.02204342,
>> 0.00596212, 0.00595343, 0.00161023, 0.00593175, 0.00160437,
>> 0.00160203, 0.0004333 ]))
>>
>> Then I do
>> rv = rv_discrete(name='test',values=vals)
>>
>> and then try to generate samples. When I plot the results of this for
>> large number of samples, it seems to match pretty well but the last 4
>> or 5 values are never generated and have zero probability, even with
>> enough samples to really capture them.
>>
>> In [742]: (te.rvs(size=1000000)==30).sum()
>> Out[742]: 0
>> In [743]: (te.rvs(size=1000000)==29).sum()
>> Out[743]: 0
>> In [744]: (te.rvs(size=1000000)==27).sum()
>> Out[744]: 0
>> In [745]: (te.rvs(size=1000000)==20).sum()
>> Out[745]: 21884
>>
>> Does anyone have any ideas? Is this a bug in rv_discrete.rvs?
>>
>> In [746]: scipy.__version__
>> Out[746]: '0.8.0.dev5825'
>> In [747]: numpy.__version__
>> Out[747]: '1.4.0.dev7039'
>
> just a quick check
>
>>>> vals[1].cumsum()
> array([ 0.31518555, 0.39710256, 0.47890026, 0.5010242 , 0.58252394,
> 0.60456729, 0.62657854, 0.63253195, 0.71432965, 0.73645359,
> 0.75854531, 0.76452049, 0.78653174, 0.79248515, 0.79842989,
> 0.80003777, 0.88165664, 0.90373221, 0.92577563, 0.93173775,
> 0.95370088, 0.95964128, 0.96557303, 0.9671774 , 0.98922082,
> 0.99518294, 1.00113637, 1.0027466 , 1.00867835, 1.01028272,
> 1.01188475, 1.01231805])
>
>
> your probabilities don't add up to one.
>
> I haven't checked if this is the source of the problem
>
> Josef
I don't know what your te is. but the following looks ok
Josef
import numpy as np
from scipy import stats
vals = [np.array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
np.array([ 0.31518555, 0.08191701, 0.0817977 , 0.02212394, 0.08149974,
0.02204335, 0.02201125, 0.00595341, 0.0817977 , 0.02212394,
0.02209172, 0.00597518, 0.02201125, 0.00595341, 0.00594474,
0.00160788, 0.08161887, 0.02207557, 0.02204342, 0.00596212,
0.02196313, 0.0059404 , 0.00593175, 0.00160437, 0.02204342,
0.00596212, 0.00595343, 0.00161023, 0.00593175, 0.00160437,
0.00160203, 0.0004333 ])]
vals[1] /= vals[1].sum()
te = stats.rv_discrete(name='test',values=vals)
nsample = 10000
xrvs =te.rvs(size=nsample)
for i in range(25,31):
print i, (xrvs==i).sum(), nsample * vals[1][i]
########## result
25 43 58.8957195814
26 56 58.8098769947
27 19 15.9063646055
28 62 58.5957150522
29 17 15.8484776598
30 11 15.8253623948
>
>>
>> Cheers
>>
>> Robin
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
More information about the SciPy-User
mailing list