<div dir="ltr"><div><div><div><div><div><div>Hi, I'm looking for a way to draw a random sample of C different items out of N items, with some desired probability Pi for each item i.<br><br></div>I saw that numpy has a function that supposedly does this, numpy.random.choice (with replace=False and a probabilities array), but looking at the algorithm actually implemented, I am wondering in what sense the probabilities Pi are actually obeyed...<br><br>To me, the code doesn't seem to be doing the right thing... Let me explain:<br><br>Consider a simple numerical example: we have 3 items and need to pick 2 different ones randomly. Let's assume the desired probabilities for items 1, 2 and 3 are 0.2, 0.4 and 0.4.<br></div><br></div>Working out the equations, there is exactly one solution here: the random outcome of numpy.random.choice in this case should be [1,2] with probability 0.2, [1,3] with probability 0.2, and [2,3] with probability 0.6. That is indeed a solution for the desired probabilities because it yields item 1 in [1,2]+[1,3] = 0.2 + 0.2 = 0.4 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = 0.2 + 0.6 = 0.8 = 2*P2, and likewise item 3 in [1,3]+[2,3] = 0.2 + 0.6 = 0.8 = 2*P3.<br><br></div>However, the algorithm in numpy.random.choice's replace=False mode generates, if I understand correctly, different probabilities for the outcomes: I believe in this case it generates [1,2] with probability 0.23333, [1,3] also with probability 0.23333, and [2,3] with probability 0.53333.<br><br></div>My question is: how does this result fit the desired probabilities?<br><br></div>If we get [1,2] with probability 0.23333 and [1,3] with probability 0.23333, then the expected number of "1" results we'll get per drawing is 0.23333 + 0.23333 = 0.46666, and similarly the expected number for "2" is 0.76666, and for "3" 0.76666. 
As you can see, the proportions are off: item 2 is NOT twice as common as item 1, as we originally desired (we asked for probabilities 0.2, 0.4, 0.4 for the individual items!).<br><br><br><div><div><div><div><div><div><div><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><span><div>--<br>Nadav Har'El<br></div><a href="mailto:nyh@scylladb.com" target="_blank">nyh@scylladb.com</a></span></div></div></div></div></div></div></div>
</div></div></div></div></div></div></div></div>
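<div>For reference, the arithmetic above can be checked with a small sketch. It assumes (as I read the replace=False behaviour, not as a confirmed description of numpy's code) that items are drawn one at a time, each draw proportional to the weights of the items still remaining. The helper names sequential_subset_probs and inclusion_probs are made up for this illustration, and items are 0-indexed, so "item 1" is index 0.</div>

```python
from collections import defaultdict
from itertools import permutations


def sequential_subset_probs(p, c):
    """Exact probability of each unordered c-subset under sequential draws.

    Assumption: each draw picks one remaining item with probability
    proportional to its weight among the items not yet drawn.
    """
    probs = defaultdict(float)
    for order in permutations(range(len(p)), c):
        prob = 1.0
        remaining = sum(p)  # total weight still available
        for i in order:
            prob *= p[i] / remaining
            remaining -= p[i]
        # Different draw orders of the same items count as one outcome.
        probs[frozenset(order)] += prob
    return dict(probs)


def inclusion_probs(subset_probs, n):
    """Probability that each item appears in the drawn subset."""
    incl = [0.0] * n
    for subset, prob in subset_probs.items():
        for i in subset:
            incl[i] += prob
    return incl


p = [0.2, 0.4, 0.4]
subsets = sequential_subset_probs(p, 2)
incl = inclusion_probs(subsets, len(p))

for subset in sorted(subsets, key=sorted):
    print(sorted(subset), round(subsets[subset], 5))
print([round(x, 5) for x in incl])
```

<div>Under that assumption the inclusion probabilities come out at roughly 0.467, 0.767 and 0.767 rather than the 0.4, 0.8 and 0.8 that the desired Pi would require for a sample of 2 items.</div>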