[Numpy-discussion] Question about numpy.random.choice with probabilities

Wed Jan 18 09:30:48 EST 2017

On Wed, Jan 18, 2017 at 8:53 AM, <josef.pktd at gmail.com> wrote:

>
>
> On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El <nyh at scylladb.com> wrote:
>
>>
>> On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com <alebarde at gmail.com>
>> wrote:
>>
>>> Let's look at what the user asked this function, and what it returns:
>>>
>>>>
>>>> User asks: please give me random pairs of the three items, where item 1
>>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>>
>>>> Function returns: random pairs, where if you collect many returned
>>>> results (as in the law of large numbers) and look at the items they
>>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>>> 0.38333.
>>>> These are not (quite) the probabilities the user asked for...
>>>>
>>>> Can you explain in what sense the user's requested probabilities (0.2,
>>>> 0.4, 0.4) are actually adhered to in the results which random.choice returns?
>>>>
>>>
>>> I think that the question the user is asking by specifying p is a
>>> slightly different one:
>>>      "please give me random pairs of the three items extracted from a
>>> population of 3 items where item 1 has probability of being extracted of
>>> 0.2, item 2 has 0.4, and 3 has 0.4. Also, please remove items once they
>>> are extracted."
>>>
>>
>> You are right, if that is what the user wants, numpy.random.choice does
>> the right thing.
>>
>> I'm just wondering whether this is actually what users want, and whether
>> they understand this is what they are getting.
>>
>> As I said, I expected it to generate pairs with, empirically, the desired
>> distribution of individual items. The documentation of numpy.random.choice
>> seemed to me (wrongly) to imply that that's what it does. So I was
>> surprised to realize that it does not.
>>
>
> As Alessandro and you showed, the function returns something that makes
> sense. If the user wants something different, then they need to look for a
> different function, which is however difficult if it doesn't have a
> solution in general.
>
> Sounds to me a bit like a Monty Hall problem. Whether we like it or not,
> or find it counterintuitive, it is what it is given the sampling scheme.
>
> Having more sampling schemes would be useful, but it's not possible to
> implement sampling schemes with impossible properties.
>
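
For reference, the numbers quoted above (0.2333 and 0.38333) can be
reproduced exactly. The following is only a sketch: it assumes the
scheme Alessandro describes (draw one item with the given
probabilities, remove it, renormalize the rest, draw again), and the
helper name inclusion_probabilities is made up for this illustration;
it is not a numpy function.

```python
from fractions import Fraction
from itertools import permutations


def inclusion_probabilities(p, size):
    """Exact probability that each item ends up in a sample of `size`.

    Assumes sequential draws without replacement, where the remaining
    probabilities are renormalized after every draw, and sum(p) == 1.
    """
    p = [Fraction(x) for x in p]
    incl = [Fraction(0)] * len(p)
    # Enumerate every ordered outcome and add up its probability.
    for order in permutations(range(len(p)), size):
        prob = Fraction(1)
        remaining = Fraction(1)  # probability mass still in the population
        for i in order:
            prob *= p[i] / remaining
            remaining -= p[i]
        for i in order:
            incl[i] += prob
    return incl


p = [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]  # 0.2, 0.4, 0.4
incl = inclusion_probabilities(p, 2)
shares = [x / 2 for x in incl]  # fraction of all returned items

print(incl)    # inclusion probabilities: 7/15, 23/30, 23/30
print(shares)  # item shares: 7/30, 23/60, 23/60
```

7/30 is about 0.2333 and 23/60 is about 0.38333, which matches the
empirical frequencies Nadav reported.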

BTW: sampling 3 out of 3 without replacement is even worse.

No matter what sampling scheme and what selection probabilities we use, we
always have every element with probability 1 in the sample.

(Which in survey statistics implies that the sampling error or standard
deviation of any estimate of a population mean or total is zero. Which I
found weird. How can you do statistics and get an estimate that doesn't
have any uncertainty associated with it?)
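
A quick way to see the 3-out-of-3 case is a small simulation. This is a
sketch using only the standard library; draw_without_replacement is a
hypothetical helper that mimics sequential sampling with
renormalization, not the numpy implementation.

```python
import random


def draw_without_replacement(weights, size, rng):
    """Sequentially draw `size` distinct indices with the given weights."""
    remaining = list(range(len(weights)))
    sample = []
    for _ in range(size):
        # random.choices normalizes the weights of the remaining items.
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        sample.append(pick)
        remaining.remove(pick)
    return sample


rng = random.Random(0)
results = {}
for weights in ([0.2, 0.4, 0.4], [0.98, 0.01, 0.01]):
    samples = [draw_without_replacement(weights, 3, rng) for _ in range(1000)]
    # Only the order of the items varies, never the set of items.
    results[tuple(weights)] = all(sorted(s) == [0, 1, 2] for s in samples)

print(results)  # True for both weight vectors
```

Whatever weights are passed in, every sample contains all three items,
so the weights only influence the order in which they come out.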

Josef

>
> Josef
>
>
>
>>
>> Nadav.
>>
>