I don't think this can be done efficiently with numpy alone. I wrote a no-replacement sampler in Cython some time ago (below), which I hereby release into the public domain.


'''
Created on Oct 24, 2009
@author: johnsalvatier
'''


from __future__ import division

import numpy 

def random_no_replace(sampleSize, populationSize, numSamples):
    # Each row of samples receives sampleSize distinct indices drawn from
    # range(populationSize), in increasing order.
    samples = numpy.zeros((numSamples, sampleSize), dtype=int)

    # Use Knuth's variable names
    cdef int n = sampleSize
    cdef int N = populationSize
    cdef int i = 0
    cdef int t = 0  # total input records dealt with
    cdef int m = 0  # number of items selected so far
    cdef double u

    while i < numSamples:
        t = 0
        m = 0
        while m < n:
            u = numpy.random.uniform()  # call a uniform(0,1) random number generator
            if (N - t)*u >= n - m:
                # skip record t
                t += 1
            else:
                # select record t
                samples[i, m] = t
                t += 1
                m += 1
        i += 1

    return samples
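
For anyone who wants to try the idea without a Cython toolchain, here is a minimal pure-Python sketch of the same selection loop for a single sample. As far as I can tell it corresponds to the selection sampling technique (Algorithm S) in Knuth's TAOCP, vol. 2. The name selection_sample and the rng argument are my own choices for illustration and are not part of the module above; it will be much slower than the Cython version.

```python
import numpy


def selection_sample(n, N, rng):
    """Return n distinct indices from range(N), in increasing order.

    Hypothetical helper for illustration; it mirrors the inner loop of
    random_no_replace for one sample.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    chosen = numpy.empty(n, dtype=int)
    t = 0  # total input records dealt with
    m = 0  # number of items selected so far
    while m < n:
        u = rng.random()  # uniform on [0, 1)
        if (N - t) * u >= n - m:
            t += 1          # skip record t
        else:
            chosen[m] = t   # select record t
            t += 1
            m += 1
    return chosen


rng = numpy.random.default_rng(0)
print(selection_sample(5, 20, rng))
```

Each record t is selected with probability (n - m)/(N - t), so once the number of records left equals the number still needed, every remaining record is taken and the loop always terminates with t <= N.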


On Mon, Dec 20, 2010 at 8:28 AM, Alan G Isaac <alan.isaac@gmail.com> wrote:
I want to sample *without* replacement from a vector
(as with Python's random.sample).  I don't see a direct
replacement for this, and I don't want to carry two
PRNG's around.  Is the best way something like  this?


Alan Isaac
NumPy-Discussion mailing list