[Numpy-discussion] Question about numpy.random.choice with probabilities

Nadav Har'El nyh at scylladb.com
Wed Jan 18 04:52:45 EST 2017


On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com <alebarde at gmail.com>
wrote:

> Let's look at what the user asked this function, and what it returns:
>
>>
>> User asks: please give me random pairs of the three items, where item 1
>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>
>> Function returns: random pairs, where if you make many random returned
>> results (as in the law of large numbers) and look at the items they
>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>> 0.38333.
>> These are not (quite) the probabilities the user asked for...
>>
>> Can you explain a sense in which the user's requested probabilities (0.2,
>> 0.4, 0.4) are actually adhered to in the results which random.choice returns?
>>
>
> I think that the question the user is asking by specifying p is a slightly
> different one:
>      "please give me random pairs of the three items extracted from a
> population of 3 items where item 1 has a probability of 0.2 of being
> extracted, item 2 has 0.4, and 3 has 0.4. Also please remove items once
> they are extracted."
>
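Under the reading described in the quote (each successive extraction picks among the items not yet drawn, with probabilities proportional to p), the item shares quoted above can be checked exactly by enumerating every ordered pair. A minimal sketch in plain Python; the helper `item_shares` is hypothetical, and it models this successive-sampling scheme rather than quoting NumPy's source.

```python
from itertools import permutations

def item_shares(p, k=2):
    """Exact fraction of all drawn items that equal each item, when k items are
    drawn one at a time without replacement and each draw picks among the
    remaining items with probability proportional to p.
    Assumes at least k entries of p are nonzero."""
    total = sum(p)
    inclusion = [0.0] * len(p)
    # Enumerate every ordered k-sequence of distinct items and its probability.
    for seq in permutations(range(len(p)), k):
        prob, remaining = 1.0, total
        for i in seq:
            prob *= p[i] / remaining   # renormalize over the items still available
            remaining -= p[i]
        for i in seq:
            inclusion[i] += prob       # accumulates P(item i appears in the sample)
    # Each sample holds k items, so shares are inclusion probabilities divided by k.
    return [x / k for x in inclusion]

print([round(s, 4) for s in item_shares([0.2, 0.4, 0.4])])  # [0.2333, 0.3833, 0.3833]
```

For p = (0.2, 0.4, 0.4) this gives 7/30 for item 1 and 23/60 for items 2 and 3, matching the 0.2333 and 0.38333 figures in the thread.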

You are right, if that is what the user wants, numpy.random.choice does the
right thing.
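A quick empirical check of this behaviour: draw many pairs with `replace=False` and count how often each item appears. This sketch uses the legacy `RandomState.choice` method (the top-level `numpy.random.choice` is bound to a global `RandomState`); the seed and sample count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(12345)  # legacy API, same method as numpy.random.choice
p = [0.2, 0.4, 0.4]
n_pairs = 20_000

# Draw many pairs without replacement and count how often each item appears.
counts = np.zeros(3, dtype=np.int64)
for _ in range(n_pairs):
    pair = rng.choice(3, size=2, replace=False, p=p)
    counts += np.bincount(pair, minlength=3)

shares = counts / counts.sum()
print(np.round(shares, 3))  # close to [0.233, 0.383, 0.383], not [0.2, 0.4, 0.4]
```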

I'm just wondering whether this is actually what users want, and whether
they understand this is what they are getting.

As I said, I expected it to generate pairs in which, empirically, the individual
items follow the desired distribution. The documentation of numpy.random.choice
seemed to me (wrongly) to imply that this is what it does, so I was surprised
to realize that it does not.
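The distribution expected here (item i making up a fraction p[i] of all drawn items) amounts to asking for an inclusion probability of 2·p[i] per pair, which is only possible when every 2·p[i] ≤ 1. numpy.random.choice does not offer this mode; one known technique that does is systematic (Madow) sampling. The sketch below is an illustration of that swapped-in technique, not a NumPy API; `systematic_sample` is a hypothetical helper.

```python
import numpy as np

def systematic_sample(p, n, rng):
    """Draw n distinct item indices whose inclusion probabilities are n * p[i]
    (systematic / Madow sampling). Requires n * p[i] <= 1 for every item."""
    p = np.asarray(p, dtype=float)
    pi = n * p / p.sum()                      # target inclusion probabilities
    if np.any(pi > 1.0 + 1e-12):
        raise ValueError("n * p[i] must not exceed 1 for any item")
    cum = np.concatenate(([0.0], np.cumsum(pi)))
    points = rng.random() + np.arange(n)      # n points spaced exactly 1 apart in [0, n)
    # Item i is selected when a point falls in [cum[i], cum[i+1]); each interval
    # has length <= 1, so it can catch at most one point.
    idx = np.searchsorted(cum, points, side="right") - 1
    return np.minimum(idx, len(p) - 1)        # guard against rounding at the top end

rng = np.random.default_rng(0)
p = [0.2, 0.4, 0.4]
pairs = np.array([systematic_sample(p, 2, rng) for _ in range(20_000)])
print(np.round(np.bincount(pairs.ravel(), minlength=3) / pairs.size, 3))  # close to [0.2, 0.4, 0.4]
```

This fixes the per-item marginals only: the indices come back in ascending order, and which pairs occur (and how often) depends on the order in which items are listed.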

Nadav.