[Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)

Wed Aug 31 15:07:01 EDT 2011

You can use:
1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))

For your "real" application you'll probably want to pass your sample size as
the first parameter instead of calling it multiple times. The result is then
the number of times each integer was drawn, so the argmax step no longer
applies.
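
A minimal sketch of that single-call approach, for sampling with replacement.
The names values, probs and n, and the seed, are placeholders picked for
illustration only:

```python
import numpy as np

rng = np.random.RandomState(0)      # seed chosen only for reproducibility
values = np.array([1, 2, 3])        # the integers to draw from
probs = [0.1, 0.2, 0.7]             # their probabilities
n = 1000                            # hypothetical sample size

counts = rng.multinomial(n, probs)  # how many times each value was drawn
sample = np.repeat(values, counts)  # expand the counts into individual draws
rng.shuffle(sample)                 # repeat() groups equal values together, so mix them
```

If only the tallies are needed, counts can be used directly and the last two
lines dropped.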

-=- Olivier

2011/8/31 Christopher Jordan-Squire <cjordan1 at uw.edu>

> In numpy, is there a way of generating a random integer in a specified
> range where the integers in that range have given probabilities? So,
> for example, generating a random integer between 1 and 3 with
> probabilities [0.1, 0.2, 0.7] for the three integers?
>
> I'd like to know how to do this without replacement, as well. If the
> probabilities are uniform, there are a number of ways, including just
> shuffling the data and taking the first however-many elements of the
> shuffle. But this doesn't apply with non-uniform probabilities.
> Similarly, one could try arbitrary-sampling-method X (such as
> inverse-cdf sampling) and then rejecting repeats. But that is clearly
> sub-optimal if the number of samples desired is near the same order of
> magnitude as the total population, or if the probabilities are very
> skewed. (E.g. a weighted sample of size 2 without replacement from
> [0,1,2] with probabilities [0.999,.00005, 0.00005] will take a long
> time if you just sample repeatedly until you have two distinct
> samples.)
>
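
One numpy-only way to avoid the rejection loop, sketched under assumptions:
give every item an exponential random key with rate equal to its probability
and keep the k smallest keys. This should be equivalent to drawing one item
at a time and renormalizing the remaining probabilities after each draw. The
function name is made up for illustration, and all probabilities are assumed
to be strictly positive:

```python
import numpy as np

def weighted_sample_without_replacement(probs, k, rng=np.random):
    """Return k distinct indices, drawn with the given (positive) weights.

    Hypothetical helper, not part of numpy. By default it draws from
    numpy's global random state, so numpy.random.seed() controls it.
    """
    probs = np.asarray(probs, dtype=float)
    # Exponential(1) draws divided by p_i are exponential with rate p_i;
    # the smallest key belongs to item i with probability p_i / sum(p).
    keys = rng.exponential(size=probs.shape[0]) / probs
    # The k smallest keys give k distinct indices in order of selection.
    return np.argsort(keys)[:k]

# The skewed example from the message: no repeated draws are needed.
idx = weighted_sample_without_replacement([0.999, 0.00005, 0.00005], 2)
```

Only the relative sizes of the weights matter here, so they do not have to
sum to exactly 1.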
> I know parts of what I want can be done in scipy.stats using
> rv_discrete or with the Python standard library's random module. I
> would much prefer to do it only using numpy because the eventual
> application shouldn't have a scipy dependency and should use the same
> random seed as numpy.random.
>
> (For more background, what I want is to create a function like sample
> in R, where I can give it an array-like of doo-hickeys and another
> array-like of probabilities associated with each doo-hickey, and then
> generate a random sample of doo-hickeys with those probabilities. One
> step for that is generating ints, to use as indices, with the same
> probabilities. I'd like a version of this to be in numpy/scipy, but it
> doesn't really belong in scipy since it doesn't
>
> -Chris JS