[Numpy-discussion] Generating random samples without repeats

Rick White rlw at stsci.edu
Fri Sep 19 09:00:36 EDT 2008


Paul Moore <pf_moore <at> yahoo.co.uk> writes:

> Robert Kern <robert.kern <at> gmail.com> writes:
>> On Thu, Sep 18, 2008 at 16:55, Paul Moore <pf_moore <at>  
>> yahoo.co.uk> wrote:
>>> I want to generate a series of random samples, to do simulations  
>>> based
>>> on them. Essentially, I want to be able to produce a SAMPLESIZE * N
>>> matrix, where each row of N values consists of either
>>>
>>> 1. Integers between 1 and M (simulating M rolls of an N-sided  
>>> die), or
>>> 2. A sample of N numbers between 1 and M without repeats (simulating
>>>    deals of N cards from an M-card deck).
>>>
>>> Example (1) is easy, numpy.random.random_integers(1, M,  
>>> (SAMPLESIZE, N))
>>>
>>> But I can't find an obvious equivalent for (2). Am I missing  
>>> something
>>> glaringly obvious? I'm using numpy - is there maybe something in  
>>> scipy I
>>> should be looking at?
>>
>> numpy.array([(numpy.random.permutation(M) + 1)[:N]
>>     for i in range(SAMPLESIZE)])
>>
>
> Thanks.
>
> And yet, this takes over 70s and peaks at around 400M memory use,  
> whereas the
> equivalent for (1)
>
> numpy.random.random_integers(1,M,(SAMPLESIZE,N))
>
> takes less than half a second, and negligible working memory (both  
> end up
> allocating an array of the same size, but your suggestion consumes  
> temporary
> working memory - I suspect, but can't prove, that the time taken  
> comes from
> memory allocations rather than computation.

It seems like numpy.random.permutation is pretty suboptimal in its  
speed.  Here's a Python 1-liner that does the same thing (I think)  
but is a lot faster:

a = 1+numpy.random.rand(M).argsort()[0:N-1]

This still has the the problem that it generates a size N array to  
start with.  But at least it is fast compared with permutation:

 >>> t1 = timeit.Timer("a = 1+numpy.random.permutation(M)[0:N-1]",  
"import numpy; M=1000; N=100")
 >>> t1.repeat(3,1000)
[1.3792998790740967, 1.3387739658355713, 1.3446521759033203]
 >>> t2 = timeit.Timer("a = 1+numpy.random.rand(M).argsort()[0:N-1]",  
"import numpy; M=1000; N=100")
 >>> t2.repeat(3,1000)
[0.17597699165344238, 0.16013598442077637, 0.16066503524780273]





More information about the NumPy-Discussion mailing list