Sampling a population
Brian Quinlan
brian at sweetapp.com
Fri Jun 2 17:54:59 CEST 2006
This is less a Python question and more a optimization/probability
question. Imaging that you have a list of objects and there frequency in
a population e.g.
lst = [(a, 0.01), (b, 0.05), (c, 0.50), (d, 0.30), (e, 0.04), (f, 0.10)]
and you want to drawn n items from that list (duplicates allowed), with
that probability distribution.
The fastest algorithm that I have been able to devise for doing so is:
O(n * log(len(lst))). Can anyone think or a solution with a better time
complexity? If not, is there an obviously better way to do this
(assuming n is big and the list size is small).
Here is the code:
from random import random
from bisect import bisect
def draw(n, lst):
ps = []
last = 0
for p in lst:
ps.append(last + p)
last += p
# ps = [0.01, 0.06, 0.56, 0.86, 0.90, 1.00]
chosen = [0] * len(lst) # track frequency
for i in range(n):
r = random()
chosen[bisect(ps, r)] += 1 # binary search and increment
result = [] # rescale probability based on frequency
for c in chosen:
result.append(float(c) / n)
return result
lst = [0.01, 0.05, 0.50, 0.30, 0.04, 0.10]
print draw(10000, lst)
More information about the Python-list
mailing list