[SciPy-User] vectorized version of 'multinomial' sampling function

Wed Oct 14 02:19:29 EDT 2009

On Wed, Oct 14, 2009 at 1:41 AM,  <josef.pktd at gmail.com> wrote:
> On Wed, Oct 14, 2009 at 1:07 AM,  <josef.pktd at gmail.com> wrote:
>> On Tue, Oct 13, 2009 at 5:01 PM, per freem <perfreem at gmail.com> wrote:
>>> hi all,
>>>
>>> i have a series of probability vector that i'd like to feed into
>>> multinomial to get an array of vector outcomes back. for example,
>>> given:
>>>
>>> p = array([[ 0.9 ,  0.05,  0.05],
>>>       [ 0.05,  0.05,  0.9 ]])
>>>
>>> i'd like to call multinomial like this:
>>>
>>> multinomial(1, p)
>>
>> If you only want n=1, then it seems to me, you could do this also with
>> drawing an integer out of range(3) with the given probabilities. This
>> can be vectorized easily directly with numpy. (just like an individual
>> observation in multinomial logit)
>>
>> Josef
>>
>>>
>>> to get a vector of multinomial samplers, each using the nth list in
>>> 'p'. something like:
>>>
>>> array([[1, 0, 0], [0, 0 1]]) in this case. is this possible? it seems
>>> like 'multinomial' takes only a one dimensional array. i could write
>>> this as a "for" loop of course but i prefer a vectorized version since
>>> speed is crucial for me here.
>
>
> for n=1 the following should work (unless I still cannot tell
> multinomial distribution and multinomial logit apart)
>
> Josef
>
>>>> p = np.array([[ 0.9 ,  0.05,  0.05],
>       [ 0.05,  0.05,  0.9 ]])
>>>> pcum = p.cumsum(1)
>>>> pcuma = np.repeat(pcum,20, 0)
>>>> rvs = np.random.uniform(size=(40))
>>>> a,b,c = pcuma.T
>>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>>> mnrvs[:20].sum()
> 20
>>>> mnrvs[:20].sum(0)
> array([19,  0,  1])
>>>> mnrvs[20:].sum()
> 20
>>>> mnrvs[20:].sum(0)
> array([ 1,  0, 19])
>>>> mnrvs[15:25]
> array([[0, 0, 1],
>       [1, 0, 0],
>       [1, 0, 0],
>       [1, 0, 0],
>       [1, 0, 0],
>       [0, 0, 1],
>       [0, 0, 1],
>       [0, 0, 1],
>       [1, 0, 0],
>       [0, 0, 1]])
>
>
> check for larger sample:
>
>>>> pcuma = np.repeat(pcum,500, 0)
>>>> rvs = np.random.uniform(size=(1000))
>>>> a,b,c = pcuma.T
>>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>>> mnrvs[:500].sum(0)
> array([456,  17,  27])
>>>> mnrvs[500:].sum(0)
> array([ 24,  35, 441])
>>>> p*500
> array([[ 450.,   25.,   25.],
>       [  25.,   25.,  450.]])
>

similar works for e.g. n=5

Josef

>>> mnn = mnrvs.reshape(-1,5,3).sum(1)
>>> mnn[95:105]
array([[4, 0, 1],
       [4, 0, 1],
       [5, 0, 0],
       [4, 0, 1],
       [4, 1, 0],
       [0, 0, 5],
       [1, 0, 4],
       [1, 0, 4],
       [0, 0, 5],
       [0, 0, 5]])

>>> mnn[:100].sum(0)
array([456,  17,  27])
>>> mnn[100:].sum(0)
array([ 24,  35, 441])

>>> np.bincount(mnn[:100,0])
array([ 0,  0,  1,  8, 25, 66])
>>> np.bincount(mnn[:100,1])
array([85, 13,  2])
>>> np.bincount(mnn[:100,2])
array([78, 18,  3,  1])


compare with np.random

>>> rvsmn5 = np.random.multinomial(5,p[0],size=10000)
>>> rvsmn5.sum(0)/100.
array([ 450.92,   24.23,   24.85])

>>> np.bincount(rvsmn5[:,0])
array([   0,    3,   75,  740, 3191, 5991])
>>> np.bincount(rvsmn5[:,1])/100.
array([ 77.98,  19.89,   2.05,   0.08])
>>> np.bincount(rvsmn5[:,2])/100.
array([ 77.5 ,  20.28,   2.09,   0.13])