implement random selection in Python
Neil Cerutti
horpner at yahoo.com
Mon Nov 19 08:31:39 EST 2007
On 2007-11-17, Bruza <benruza at gmail.com> wrote:
> OOPS. I pressed the Send too fast.
>
> The problem w/ Boris's solution is that after repeated calling
> of randomPick(3,items), 'Jane' is not the most "frequent
> appearing" member in all the out list of 3 member lists...
How does this solution fair against your spec?
def sample_bp(seq, probabilities, k):
""" Return k distinct random items from seq, with probabilities specified
as weighted integers in a list.
>>> random.seed(0)
>>> sample_bp(['a', 'b'], [1, 5], 2)
['b', 'a']
>>> sample_bp(['a', 'b', 'c'], [1, 5, 2], 3)
['b', 'a', 'c']
"""
if k > len(seq):
raise ValueError('sample size greater than population')
probs = build_probs(probabilities)
rv = []
while k > 0:
j = random_prob(probs)
rv.append(probs[j][2])
remove_prob(probs, j)
k -= 1
return [seq[i] for i in rv]
def build_probs(probabilities):
""" Receives a list of integers, and returns list of ranges and original
indices.
>>> build_probs([8, 10, 7])
[(0, 8, 0), (8, 18, 1), (18, 25, 2)]
>>> build_probs([1, 5, 8, 2, 3, 7])
[(0, 1, 0), (1, 6, 1), (6, 14, 2), (14, 16, 3), (16, 19, 4), (19, 26, 5)]
"""
k = 0
probs = []
for i, p in enumerate(probabilities):
if p < 0:
raise ValueError('negative probability')
probs.append((k, k+p, i))
k = k+p
return probs
def random_prob(probs):
""" Return the index of a weighted random element of prob.
>>> random.seed(0)
>>> for x in xrange(20):
... print random_prob(build_probs([1, 5, 8, 2, 3, 7])),
5 5 2 2 2 2 5 2 2 3 5 2 2 5 4 2 5 5 5 5
>>> random_prob([(0, 15, 0)])
0
"""
i = random.randrange(probs[-1][1])
# Binary search for the element whose range contains i
hi = len(probs)
lo = 0
while lo < hi:
mid = (lo+hi)//2
begin, end, _ = probs[mid]
if i >= begin and i < end: return mid
elif i >= end: lo = mid+1
else: hi = mid
def remove_prob(probs, i):
""" Remove element j from the probability list, adjusting ranges as needed.
>>> prob = [(0, 12, 0), (12, 15, 1), (15, 25, 2)]
>>> remove_prob(prob, 1)
>>> prob
[(0, 12, 0), (12, 22, 2)]
"""
begin, end, _ = probs[i]
diff = end - begin
j = i+1
while j < len(probs):
begin, end, index = probs[j]
probs[j] = (begin-diff, end-diff, index)
j += 1
del probs[i]
This was the most simple-minded approach I could think of, so it
might serve as a reference against a more efficient approach.
Although thorough testing of such a bizarre function boggles my
mind.
I toyed with sorting the probability list from greatest to
lowest, which would make a linear search fast, but unfortunately
it would slow down removing probabilities.
--
Neil Cerutti
I make love to pressure. --Stephen Jackson
More information about the Python-list
mailing list