[Numpy-discussion] extracting a random subset of a vector

Tue Aug 31 12:49:06 EDT 2004

On Tue, 31 Aug 2004, Curzio Basso wrote:

> Hi all, I have an optimization problem.
>
> I currently use the following code to select a random subset of a rank-1
> array:

Here's a slightly faster version.  It's about 3x faster than Chris Barker's
version (4x faster than your original version) for N=1000000, M=100:

import numarray as NA
import numarray.random_array as RA
from math import sqrt

N = 1000000
M = 100
full = NA.arange(N)

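# One uniform random number per element.
r = RA.random(N)
# Threshold chosen so that about M + 3*sqrt(M) elements pass,
# which is almost always at least M.
thresh = (M+3*sqrt(M))/N
subset = NA.compress(r<thresh, full)
while len(subset) < M:
    # rarely executed: too few passed, so raise the threshold and retry
    thresh = thresh+3*sqrt(M)/N
    subset = NA.compress(r<thresh, full)
# Shuffle the small candidate set and keep the first M.
subset = subset[RA.permutation(len(subset))[:M]]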
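
For readers on NumPy rather than numarray, the same thresholding idea might be sketched roughly as below. This is only an illustration under assumptions: the function name random_subset, its rng parameter, and the use of numpy.random.default_rng are not from the original post, and no timing claims are made for it.

```python
import numpy as np
from math import sqrt

def random_subset(full, M, rng=None):
    """Return M distinct elements of the rank-1 array `full`, in random order.

    Hypothetical helper: a NumPy sketch of the threshold trick above.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(full)
    if M > N:
        raise ValueError("M must not exceed len(full)")
    # One uniform random number in [0, 1) per element.
    r = rng.random(N)
    # Aim for about M + 3*sqrt(M) candidates on the first pass.
    step = 3 * sqrt(M) / N
    thresh = M / N + step
    candidates = full[r < thresh]
    while len(candidates) < M:
        # Rarely executed: too few candidates, so raise the threshold.
        thresh += step
        candidates = full[r < thresh]
    # Permute only the small candidate set and keep the first M.
    return candidates[rng.permutation(len(candidates))[:M]]

subset = random_subset(np.arange(1_000_000), 100)
```

Passing a seeded generator, for example rng=np.random.default_rng(0), should make the result reproducible.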

By the way, I also find that most of the time gets spent in the
permutation computation.  That's why this is faster -- it gets to do a
much smaller permutation.
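
To put a rough number on "smaller", here is a back-of-the-envelope calculation for the values used above. It assumes, purely for comparison, an alternative that permutes all N elements; that baseline is an assumption and not a description of any specific posted version.

```python
from math import sqrt

N = 1000000
M = 100

# Expected number of elements passing the first threshold test,
# i.e. N * thresh with thresh = (M + 3*sqrt(M)) / N.
expected_candidates = M + 3 * sqrt(M)

# How many times shorter the permuted array is than a full-length one.
ratio = N / expected_candidates

print(expected_candidates)   # 130.0
print(round(ratio))          # 7692
```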
					Rick