[Numpy-discussion] Question about numpy.random.choice with probabilties

Wed Jan 18 03:35:30 EST 2017

On Wed, Jan 18, 2017 at 1:58 AM, alebarde at gmail.com <alebarde at gmail.com>
wrote:

>
>
> 2017-01-17 22:13 GMT+01:00 Nadav Har'El <nyh at scylladb.com>:
>
>>
>> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com <alebarde at gmail.com>
>> wrote:
>>
>>> Hi Nadav,
>>>
>>> I may be wrong, but I think that the result of the current
>>> implementation is actually the expected one.
>>> Using your example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and
>>> 0.4
>>>
>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>>>
>>
>> Yes, this formula does fit well with the actual algorithm in the code.
>> But, my question is *why* we want this formula to be correct:
>>
>> Just a note: this formula is correct and it is one of the fundamental laws
> of statistics: https://en.wikipedia.org/wiki/Law_of_total_probability +
> https://en.wikipedia.org/wiki/Bayes%27_theorem
>

Hi,

Yes, of course the formula is correct, but that doesn't mean we're
applying it in the right context.

I'll be honest here: I came to numpy.random.choice after I actually coded a
similar algorithm (with the same results) myself, because like you I
thought this was the "obvious" and correct algorithm. Only then did I realize
that it doesn't actually produce the desired probabilities
specified by the user - even in the cases where that is possible. And I
started wondering if existing libraries - like numpy - do this differently.
And it turns out, numpy does it (basically) in the same way as my algorithm.

>
> Thus, the result we get from random.choice IMHO definitely makes sense.
>

Let's look at what the user asks of this function, and what it returns:

User asks: please give me random pairs of the three items, where item 1 has
probability 0.2, item 2 has 0.4, and 3 has 0.4.

Function returns: random pairs where, if you draw many of them (as in the
law of large numbers) and look at the items they contain, item 1 makes up
0.2333 of the items, item 2 makes up 0.38333, and item 3 makes up 0.38333.
These are not (quite) the probabilities the user asked for...
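
The fractions above can be checked exactly with a short sketch. It assumes
the model discussed in this thread: items are drawn one at a time without
replacement, with the remaining weights renormalized after each draw. The
function name item_shares is my own and is not part of numpy.

```python
from itertools import permutations

def item_shares(p, k=2):
    """Expected share of each item among all drawn items, assuming k items
    are drawn sequentially without replacement and the remaining weights
    are renormalized after every draw."""
    shares = [0.0] * len(p)
    for order in permutations(range(len(p)), k):
        prob = 1.0        # probability of drawing exactly this ordered tuple
        remaining = 1.0   # total weight still available before each draw
        for i in order:
            prob *= p[i] / remaining
            remaining -= p[i]
        for i in order:
            shares[i] += prob / k  # each tuple contributes k items
    return shares

shares = item_shares([0.2, 0.4, 0.4])
print([round(s, 4) for s in shares])  # [0.2333, 0.3833, 0.3833]
```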

Can you explain in what sense the user's requested probabilities (0.2, 0.4,
0.4) are actually adhered to in the results which random.choice returns?
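
For comparison, here is a small sketch showing that the requested shares are
achievable in this example. The distribution over pairs below is one I picked
by hand for illustration; it is not what numpy does, and I am not claiming it
is the only distribution that works.

```python
# Hypothetical distribution over unordered pairs, chosen by hand.
pair_probs = {(1, 2): 0.2, (1, 3): 0.2, (2, 3): 0.6}

shares = {item: 0.0 for item in (1, 2, 3)}
for pair, prob in pair_probs.items():
    for item in pair:
        shares[item] += prob / 2  # each pair contributes two items

print({item: round(s, 4) for item, s in shares.items()})
# {1: 0.2, 2: 0.4, 3: 0.4}
```

With these pair probabilities, the items in the returned pairs appear in
exactly the proportions 0.2, 0.4 and 0.4.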

Thanks,
Nadav Har'El.