Sampling a population

Brian Quinlan brian at sweetapp.com
Fri Jun 2 17:54:59 CEST 2006


This is less a Python question and more a optimization/probability 
question. Imaging that you have a list of objects and there frequency in 
a population e.g.

lst = [(a, 0.01), (b, 0.05), (c, 0.50), (d, 0.30), (e, 0.04), (f, 0.10)]

and you want to drawn n items from that list (duplicates allowed), with 
that probability distribution.

The fastest algorithm that I have been able to devise for doing so is: 
O(n * log(len(lst))). Can anyone think or a solution with a better time 
complexity? If not, is there an obviously better way to do this 
(assuming n is big and the list size is small).

Here is the code:

from random import random
from bisect import bisect

def draw(n, lst):
     ps = []
     last = 0
     for p in lst:
         ps.append(last + p)
         last += p

     # ps = [0.01, 0.06, 0.56, 0.86, 0.90, 1.00]

     chosen = [0] * len(lst) # track frequency
     for i in range(n):
         r = random()

         chosen[bisect(ps, r)] += 1 # binary search and increment

     result = [] # rescale probability based on frequency
     for c in chosen:
         result.append(float(c) / n)
     return result

lst = [0.01, 0.05, 0.50, 0.30, 0.04, 0.10]
print draw(10000, lst)




More information about the Python-list mailing list