[Tutor] Weighted Random Choice - Anyone have an efficient algorithm?
Dave Angel
davea at ieee.org
Thu Dec 23 12:00:08 CET 2010
On 01/-10/-28163 02:59 PM, Modulok wrote:
> Does anyone know of an efficient way of doing a weighted random
> choice? (I don't even know what algorithms like this would be called.)
> Preferably, something that doesn't grow exponentially with the number
> of elements in the list, or the size of their respective values.
>
> For example: Assume I have a list of integer elements. I want to pick
> an index from that list, using the value of said elements to control
> the probability of selecting a given index:
>
> w = [5, 20, 75]
> wrand(w)
>
> Where wrand() will return '2', (the index of 75), about 75% of the time.
> It would return '1', (the index of 20) about 20% of the time, and 0,
> about 5% of the time. It could return the actual value, but returning
> the index seems more flexible.
>
> The implementation doesn't have to work exactly like this, even if the
> weight values don't add up to 100, or are arbitrary, that's fine.
> Hopefully you get the idea. Here's what I tried (it works, but is
> slow):
>
> ### Begin Code Example ###
> import random
> random.seed(2) #<-- So we can reproduce the sequence for testing.
>
> def wrandom(weights):
> '''
> Return a randomly selected index of the 'weights' list, using the
> values of that list to affect probability of said index being returned.
> '''
> debug = False
> flattened = []
> for i, w in enumerate(weights):
> if debug: print "i, w: ", i, w
> for k in range(w):
> flattened.append(i)
> if debug: print "flattened: ", flattened
> rnd = random.randint(0, (len(flattened) - 1))
> return flattened[rnd]
>
> # Not test it:
> print wrandom([5, 20, 75])
> print wrandom([5, 20, 75])
> print wrandom([5, 20, 75])
>
> ### End Code Example ###
>
> It works and is easy enough to understand, but when the number of list
> items gets large, or the weights get heavy, things get ugly.
> -Modulok-
>
You're building a temporary list, then discarding it when the function
returns. Your speed efficiency is likely to improve if you keep that
"flattened list" and only do the conversion once.
But the separate problem of that flattened list possibly being much
larger than necessary is more interesting. What you probably want to
build instead is a list of accumulated probabilities. It would be the
same size as the input list, and in this case, it would contain 5, 25,
and 100, respectively. The list is always monotonic, since none of your
probabilities can legally be negative.
Anyway, once you have that new list, you pick a random number in the
range from 0 to newlist[-1], and do a binary search of the newlist for
the first value > your random value.
And if the list is small, you can just scan through it linearly, which
is much less code.
DaveA
More information about the Tutor
mailing list