Sampling a population
robert.kern at gmail.com
Fri Jun 2 22:03:16 CEST 2006
Brian Quinlan wrote:
> The fastest algorithm that I have been able to devise for doing so is:
> O(n * log(len(lst))). Can anyone think of a solution with a better time
> complexity? If not, is there an obviously better way to do this
> (assuming n is big and the list size is small)?
numpy.searchsorted() can do all of the n lookups in C.
import numpy as np

def draw(n, lst):
    ps = np.cumsum([0.0] + list(lst))
    # Watch for floating point error here. It's likely that the last
    # item won't be exactly 1.0.
    r = np.random.random(n)  # n pseudorandom numbers uniform on [0, 1)
    # side='right' plus the -1 map each r into the half-open interval
    # [ps[i], ps[i+1]) containing it, giving a 0-based index into lst.
    indices = np.searchsorted(ps, r, side='right') - 1
    # Now do what you like with indices, which is an array of indices
    # into lst.
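As a quick sanity check (my own example values, with draw() restated so the snippet is self-contained), the empirical frequencies should approach the input probabilities:

    import numpy as np

    def draw(n, lst):
        # Cumulative probabilities, then one vectorized bisection for all n draws.
        ps = np.cumsum([0.0] + list(lst))
        r = np.random.random(n)
        return np.searchsorted(ps, r, side='right') - 1

    # Three outcomes with probabilities 0.2, 0.3, 0.5.
    indices = draw(100_000, [0.2, 0.3, 0.5])
    freqs = np.bincount(indices, minlength=3) / len(indices)
    # freqs should be roughly [0.2, 0.3, 0.5]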
Ed Schofield has an implementation of an algorithm by Marsaglia which turns
sampling into a fast table lookup. If your probabilities have limited precision
(2**-30 or so rather than the full double precision 2**-52 or so), then this
might be an attractive option.
The implementation is in the scipy sandbox currently, but I don't think it
builds painlessly at the moment.
Ask on one of the scipy mailing lists if you need help building it.
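For concreteness, here is a minimal sketch of the table-lookup idea (my own illustration, not Ed Schofield's scipy code): if every probability is a multiple of 2**-k for some small k, you can expand the distribution into a table of 2**k entries and sample with a single uniform integer and one array index. Marsaglia's actual method condenses the table by processing the probabilities a few bits at a time; this uncondensed version just shows why each draw becomes O(1).

    import numpy as np

    def build_table(probs, k=10):
        # Assumes each probability is (approximately) a multiple of 2**-k.
        counts = np.round(np.asarray(probs) * 2**k).astype(int)
        if counts.sum() != 2**k:
            raise ValueError("probabilities don't fit in %d bits" % k)
        # Outcome i appears counts[i] times in the table.
        return np.repeat(np.arange(len(probs)), counts)

    def draw_table(n, table):
        # One uniform integer and one lookup per sample: O(1) per draw.
        return table[np.random.randint(0, len(table), size=n)]

    table = build_table([0.25, 0.25, 0.5], k=10)
    samples = draw_table(100_000, table)

The trade-off is memory: the table has 2**k entries, which is why limited precision matters (2**30 entries would already be painful) and why Marsaglia's condensed tables are the practical form of this idea.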
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco