Hi Nadav,
I may be wrong, but I think that the result of the current implementation
is actually the expected one.
Using your example, the probabilities for items 1, 2 and 3 are 0.2, 0.4 and 0.4:
P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
Now, P([1]) = 0.2 and P([2]) = 0.4. However:
P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
P([1] | 1st=[2]) = 1/3 (1 and 3 have probabilities 0.2 and 0.4 which, once
normalised, become 1/3 and 2/3 respectively)
Therefore P([1,2]) = 0.5 * 0.2 + (1/3) * 0.4 = 0.7/3 = 0.23333
Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333
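As a sanity check, here is a minimal sketch (my own illustration, not numpy's
actual code) that enumerates every ordered draw exactly. It assumes that
sampling without replacement means drawing one item at a time and
renormalising the remaining weights after each draw, as in the calculation
above; the name subset_probabilities is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def subset_probabilities(p, k):
    """Exact probability of each unordered k-subset, ASSUMING sequential
    draws without replacement with the remaining weights renormalised
    after every draw (an assumption, not a copy of numpy's code)."""
    result = {}
    for order in permutations(p, k):      # every ordered draw of k items
        prob = Fraction(1)
        remaining = sum(p.values())       # total weight still available
        for item in order:
            prob *= p[item] / remaining   # renormalised draw probability
            remaining -= p[item]          # the drawn item is removed
        key = frozenset(order)            # forget the order of the draws
        result[key] = result.get(key, Fraction(0)) + prob
    return result


p = {1: Fraction(2, 10), 2: Fraction(4, 10), 3: Fraction(4, 10)}
pairs = subset_probabilities(p, 2)
for subset, prob in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), prob)
```

With the example weights this gives 7/30 for [1,2] and [1,3], and 8/15 for
[2,3], i.e. the same 0.7/3 and 1.6/3 as above.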
What am I missing?
Alessandro
2017-01-17 13:00 GMT+01:00 <numpy-discussion-request(a)scipy.org>:
> Hi, I'm looking for a way to find a random sample of C different items out
> of N items, with some desired probability Pi for each item i.
>
> I saw that numpy has a function that supposedly does this,
> numpy.random.choice (with replace=False and a probabilities array), but
> looking at the algorithm actually implemented, I am wondering in what sense
> the probabilities Pi are actually obeyed...
>
> To me, the code doesn't seem to be doing the right thing... Let me explain:
>
> Consider a simple numerical example: We have 3 items, and need to pick 2
> different ones randomly. Let's assume the desired probabilities for item 1,
> 2 and 3 are: 0.2, 0.4 and 0.4.
>
> Working out the equations, there is exactly one solution here: The random
> outcome of numpy.random.choice in this case should be [1,2] at probability
> 0.2, [1,3] at probability 0.2, and [2,3] at probability 0.6. That is indeed
> a solution for the desired probabilities because it yields item 1 in
> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] =
> 0.2+0.6 = 0.8 = 2*P2, etc.
>
> However, the algorithm in numpy.random.choice's replace=False generates, if
> I understand correctly, different probabilities for the outcomes: I believe
> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333,
> and [2,3] at probability 0.53333.
>
> My question is how does this result fit the desired probabilities?
>
> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333,
> then the expected number of "1" results we'll get per drawing is 0.23333 +
> 0.2333 = 0.46666, and similarly for "2" the expected number is 0.7666, and
> for "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT
> twice as common as item 1 as we originally desired (we asked for
> probabilities 0.2, 0.4, 0.4 for the individual items!).
>
>
> --
> Nadav Har'El
> nyh(a)scylladb.com
>