# Sampling a population

Robert Kern robert.kern at gmail.com
Fri Jun 2 22:03:16 CEST 2006

```Brian Quinlan wrote:
> The fastest algorithm that I have been able to devise for doing so is:
> O(n * log(len(lst))). Can anyone think or a solution with a better time
> complexity? If not, is there an obviously better way to do this
> (assuming n is big and the list size is small).

numpy.searchsorted() can do all of the n lookups in C.

# Untested.

import numpy as np

def draw(n, lst):
ps = np.cumsum([0.0] + [x[1] for x in lst])
# watch for floating point errors here. It's likely that the last
# item won't be quite 1.0

r = np.random.random(n)  # n psuedorandom numbers uniform on [0, 1)
indices = np.searchsorted(ps, r)

# now do what you like with indices, which is an array of indices into
# lst.

Ed Schofield has an implementation of an algorithm by Marsaglia[1] which turns
sampling into a fast table lookup. If your probabilities have limited precision
(2**-30 or so rather than the full double precision 2**-52 or so), then this
might be an attractive option.

[1]  http://www.jstatsoft.org/v11/i03/v11i03.pdf

The implementation is in the scipy sandbox currently, but I don't think it
builds painlessly at the moment.

http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/montecarlo/

Ask on one of the scipy mailing lists if you need help building it.

http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma