Sampling a population

Robert Kern robert.kern at
Fri Jun 2 22:03:16 CEST 2006

Brian Quinlan wrote:
> The fastest algorithm that I have been able to devise for doing so is: 
> O(n * log(len(lst))). Can anyone think of a solution with a better time 
> complexity? If not, is there an obviously better way to do this 
> (assuming n is big and the list size is small)?

numpy.searchsorted() can do all of the n lookups in C.

# Untested.

import numpy as np

def draw(n, lst):
    ps = np.cumsum([x[1] for x in lst])
    # watch for floating point errors here. It's likely that the last
    # cumulative sum won't be quite 1.0

    r = np.random.random(n)  # n pseudorandom numbers uniform on [0, 1)
    indices = np.searchsorted(ps, r, side='right')

    # clip in case a draw lands at or above the (possibly < 1.0) final
    # cumulative sum
    indices = np.clip(indices, 0, len(lst) - 1)

    # now do what you like with indices, which is an array of indices into
    # lst.
    return indices
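For concreteness, here is a self-contained sketch of the same cumsum-plus-searchsorted idea; the sample list and names are purely illustrative:

```python
import numpy as np

lst = [('a', 0.2), ('b', 0.3), ('c', 0.5)]      # (item, probability) pairs

ps = np.cumsum([p for _, p in lst])             # cumulative sums: 0.2, 0.5, 1.0
r = np.random.random(10000)                     # uniform draws on [0, 1)
indices = np.searchsorted(ps, r, side='right')  # all 10000 lookups happen in C
indices = np.clip(indices, 0, len(lst) - 1)     # guard against float error
samples = [lst[i][0] for i in indices]          # map indices back to items
```

Each draw r falls into exactly one interval of the cumulative sums, so item i is selected with probability lst[i][1].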

Ed Schofield has an implementation of an algorithm by Marsaglia[1] which turns
sampling into a fast table lookup. If your probabilities have limited precision
(2**-30 or so rather than the full double precision 2**-52 or so), then this
might be an attractive option.
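To give the flavor of the table-lookup idea (this is a bare-bones sketch of the general technique, not Ed Schofield's implementation; the function names are made up):

```python
import numpy as np

def build_table(probs, bits=16):
    # With probabilities quantized to `bits` of precision, each outcome
    # gets an integer number of slots out of 2**bits total.
    size = 1 << bits
    counts = np.round(np.asarray(probs) * size).astype(int)
    counts[-1] += size - counts.sum()  # absorb rounding error in the last entry
    # The table repeats each outcome's index in proportion to its probability.
    return np.repeat(np.arange(len(probs)), counts)

def sample(table, n):
    # Sampling is now one random integer plus one array lookup per draw.
    return table[np.random.randint(0, len(table), size=n)]
```

The table has 2**bits entries, which is why limited precision matters: 2**16 or even 2**30 slots is feasible, but full double precision would not be.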


The implementation is currently in the scipy sandbox, but I don't think it
builds painlessly at the moment. Ask on one of the scipy mailing lists if you
need help building it.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
