[Numpy-discussion] Question about numpy.random.choice with probabilities

Nadav Har'El nyh at scylladb.com
Mon Jan 23 11:08:18 EST 2017


On Mon, Jan 23, 2017 at 5:47 PM, Robert Kern <robert.kern at gmail.com> wrote:

>
> > As for the standardness of the definition: I don't know, have you a
> reference where it is defined? More natural to me would be to have a list
> of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time).
> I'm hesitant to claim ours is a standard definition unless it's in a
> textbook somewhere. But I don't insist on my phrasing.
>
> Textbook, I'm not so sure, but it is the *only* definition I've ever
> encountered in the literature:
>
> http://epubs.siam.org/doi/abs/10.1137/0209009
>
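The definition being debated is not spelled out in this message. Assuming it is the sequential one that `numpy.random.choice(..., replace=False, p=...)` follows (each draw picks among the items not yet chosen, with probability proportional to their weights, and the chosen item is then removed entirely), a minimal pure-Python sketch could look like this. The helper name `weighted_sample_without_replacement` is hypothetical, and the weights reuse the "cat" 3 / "dog" 1 example from the quoted text.

```python
import random

def weighted_sample_without_replacement(items, weights, k, rng=random):
    """Hypothetical helper: draw k distinct items one at a time.

    At every step an item is chosen among those not yet drawn, with
    probability proportional to its weight; the chosen item is then
    removed from the pool entirely (not just one "copy" of it).
    """
    if k > len(items):
        raise ValueError("k cannot exceed the number of items")
    if any(w <= 0 for w in weights):
        raise ValueError("this sketch requires strictly positive weights")
    pool_items = list(items)
    pool_weights = list(weights)
    chosen = []
    for _ in range(k):
        # random.choices renormalizes the remaining weights for us.
        idx = rng.choices(range(len(pool_items)), weights=pool_weights)[0]
        chosen.append(pool_items.pop(idx))
        pool_weights.pop(idx)
    return chosen

# "cat" has weight 3 and "dog" weight 1, as in the quoted example.
rng = random.Random(0)
print(weighted_sample_without_replacement(["cat", "dog"], [3, 1], 2, rng))
```

A size-2 draw always returns each item once; only the order varies, with "cat" first about 3/4 of the time. If the weights were instead read literally as ticket counts in an urn (three "cat" tickets, one "dog" ticket) with one ticket removed per draw, a size-2 sample could be `["cat", "cat"]`; the sequential definition rules that out.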

Very interesting. This paper (a PDF is available if you search for its title
on Google) explicitly mentions that one of the uses of this algorithm is
"multistage sampling", which appears to be exactly the scenario of the
hypothetical Gulliver example I gave in my earlier mail.

And yet, I showed in my mail that this algorithm does NOT reproduce the
desired frequency of the different sampling units...
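The mismatch can be checked without the Gulliver example itself (which is not reproduced here). A minimal sketch, assuming hypothetical weights `p = [0.5, 0.25, 0.25]` and samples of size 2: enumerating every ordered outcome of the sequential draws gives the exact probability that each item appears in a sample. If the weights were reproduced, item 0 would be included twice as often as item 1; instead the ratio comes out as 10:7. The helper `exact_inclusion_probs` is hypothetical, and the Monte Carlo cross-check assumes NumPy 1.17+ for `np.random.default_rng`.

```python
import itertools
import numpy as np

def exact_inclusion_probs(p, k):
    """Hypothetical helper: exact probability that each item appears in a
    size-k sample drawn by successive weighted draws without replacement."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    incl = np.zeros(len(p))
    # Enumerate every ordered sequence of k distinct items.
    for seq in itertools.permutations(range(len(p)), k):
        prob = 1.0
        remaining = 1.0  # total weight of the items not drawn yet
        for i in seq:
            prob *= p[i] / remaining
            remaining -= p[i]
        incl[list(seq)] += prob
    return incl

p = [0.5, 0.25, 0.25]  # hypothetical weights
k = 2                  # sample size

exact = exact_inclusion_probs(p, k)
target = k * np.asarray(p)  # inclusion probabilities proportional to p
print("exact inclusion:     ", exact)   # approx. 0.833, 0.583, 0.583
print("proportional target: ", target)  # 1.0, 0.5, 0.5

# Monte Carlo cross-check against NumPy's own sampler.
rng = np.random.default_rng(0)
trials = 20000
counts = np.zeros(len(p))
for _ in range(trials):
    counts[rng.choice(len(p), size=k, replace=False, p=p)] += 1
print("simulated inclusion: ", counts / trials)
```

The proportional target is feasible here (every entry is at most 1 and they sum to k), so the gap comes from the sequential definition itself, not from an impossible request.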

Moreover, this paper doesn't explain why you need "without replacement"
for this use case: with replacement everything seems easier, and the
desired probabilities are reproduced.
In my story I gave a funny excuse for why "without replacement" might be
warranted, but if you're interested I can tell you a bit about my actual
use case, which has a more serious reason for wanting sampling without
replacement.
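To illustrate the with-replacement claim above: the k draws are then independent, so item i is drawn k·p_i times per sample on average, and the weights are reproduced exactly in expectation. The price is that one sample can contain the same item more than once. A minimal sketch with the same hypothetical weights, assuming NumPy 1.17+ for `np.random.default_rng`:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])  # hypothetical weights
k = 2                            # draws per sample
trials = 100_000

rng = np.random.default_rng(0)
# Each row is one sample of k independent draws (replace=True).
samples = rng.choice(len(p), size=(trials, k), replace=True, p=p)
# Average number of times each item is drawn per sample.
mean_counts = np.bincount(samples.ravel(), minlength=len(p)) / trials
print("expected: ", k * p)  # values 1.0, 0.5, 0.5
print("simulated:", mean_counts)
```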


> http://www.sciencedirect.com/science/article/pii/S002001900500298X
>
> --
> Robert Kern