[Numpy-discussion] Question about numpy.random.choice with probabilties

Mon Jan 23 09:52:56 EST 2017

2017-01-23 15:33 GMT+01:00 Robert Kern <robert.kern at gmail.com>:

> On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald <peridot.faceted at gmail.com>
> wrote:
> >
> > On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El <nyh at scylladb.com> wrote:
> >>
> >> On Wed, Jan 18, 2017 at 4:30 PM, <josef.pktd at gmail.com> wrote:
> >>>
> >>>> Having more sampling schemes would be useful, but it's not possible
> to implement sampling schemes with impossible properties.
> >>>
> >>> BTW: sampling 3 out of 3 without replacement is even worse
> >>>
> >>> No matter what sampling scheme and what selection probabilities we
> use, we always have every element with probability 1 in the sample.
> >>
> >> I agree. The random-sample function of the type I envisioned will be
> able to reproduce the desired probabilities in some cases (like the example
> I gave) but not in others. Because doing this correctly involves a set of n
> linear equations in comb(n,k) variables, it can have no solution, or many
> solutions, depending on the n and k, and the desired probabilities. A
> function of this sort could return an error if it can't achieve the desired
> probabilities.
> >
> > It seems to me that the basic problem here is that the
> numpy.random.choice docstring fails to explain what the function actually
> does when called with weights and without replacement. Clearly there are
> different expectations; I think numpy.random.choice chose one that is easy
> to explain and implement but not necessarily what everyone expects. So the
> docstring should be clarified. Perhaps a Notes section:
> >
> > When numpy.random.choice is called with replace=False and non-uniform
> probabilities, the resulting distribution of samples is not obvious.
> numpy.random.choice effectively follows the procedure: when choosing the
> kth element in a set, the probability of element i occurring is p[i]
> divided by the total probability of all not-yet-chosen (and therefore
> eligible) elements. This approach is always possible as long as the sample
> size is no larger than the population, but it means that the probability
> that element i occurs in the sample is not exactly p[i].
>
> I don't object to some Notes, but I would probably phrase it more like we
> are providing the standard definition of the jargon term "sampling without
> replacement" in the case of non-uniform probabilities. To my mind (or more
> accurately, with my background), "replace=False" obviously picks out the
> implemented procedure, and I would have been incredibly surprised if it did
> anything else. If the option were named "unique=True", then I would have
> needed some more documentation to let me know exactly how it was
> implemented.
>
> FWIW, I totally agree with Robert


>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
--------------------------------------------------------------------------
NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain
confidential information and are intended for the sole use of the
recipient(s) named above. If you are not the intended recipient of this
message you are hereby notified that any dissemination or copying of this
message is strictly prohibited. If you have received this e-mail in error,
please notify the sender either by telephone or by e-mail and delete the
material from any computer. Thank you.
--------------------------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20170123/c2175bfb/attachment.html>