[Numpy-discussion] Question about numpy.random.choice with probabilties

Wed Jan 18 08:53:24 EST 2017

On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El <nyh at scylladb.com> wrote:

>
> On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com <alebarde at gmail.com>
> wrote:
>
>> Let's look at what the user asked this function, and what it returns:
>>
>>>
>>> User asks: please give me random pairs of the three items, where item 1
>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>
>>> Function returns: random pairs, where if you make many random returned
>>> results (as in the law of large numbers) and look at the items they
>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>> 0.38333.
>>> These are not (quite) the probabilities the user asked for...
>>>
>>> Can you explain a sense where the user's requested probabilities (0.2,
>>> 0.4, 0.4) are actually adhered in the results which random.choice returns?
>>>
>>
>> I think that the question the user is asking by specifying p is a
>> slightly different one:
>>      "please give me random pairs of the three items extracted from a
>> population of 3 items where item 1 has probability of being extracted of
>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once
>> extracted."
>>
>
> You are right, if that is what the user wants, numpy.random.choice does
> the right thing.
>
> I'm just wondering whether this is actually what users want, and whether
> they understand this is what they are getting.
>
> As I said, I expected it to generate pairs with, empirically, the desired
> distribution of individual items. The documentation of numpy.random.choice
> seemed to me (wrongly) that it implis that that's what it does. So I was
> surprised to realize that it does not.
>

As Alessandro and you showed, the function returns something that makes
sense. If the user wants something different, then they need to look for a
different function, which is however difficult if it doesn't have a
solution in general.

Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or
find it counter intuitive, it is what it is given the sampling scheme.

Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties

Josef

>
> Nadav.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20170118/bb5967e1/attachment.html>