[Tutor] Weighted Random Choice - Anyone have an efficient algorithm?

Thu Dec 23 12:00:08 CET 2010

On 01/-10/-28163 02:59 PM, Modulok wrote:
> Does anyone know of an efficient way of doing a weighted random
> choice? (I don't even know what algorithms like this would be called.)
> Preferably, something that doesn't grow exponentially with the number
> of elements in the list, or the size of their respective values.
>
> For example: Assume I have a list of integer elements. I want to pick
> an index from that list, using the value of said elements to control
> the probability of selecting a given index:
>
> w = [5, 20, 75]
> wrand(w)
>
> Where wrand() will return '2', (the index of 75),  about 75% of the time.
> It would return '1', (the index of 20) about 20% of the time, and 0,
> about 5% of the time. It could return the actual value, but returning
> the index seems more flexible.
>
> The implementation doesn't have to work exactly like this, even if the
> weight values don't add up to 100, or are arbitrary, that's fine.
> Hopefully you get the idea. Here's what I tried (it works, but is
> slow):
>
> ### Begin Code Example ###
> import random
> random.seed(2) #<-- So we can reproduce the sequence for testing.
>
> def wrandom(weights):
>      '''
>      Return a randomly selected index of the 'weights' list, using the
>      values of that list to affect probability of said index being returned.
>      '''
>      debug = False
>      flattened = []
>      for i, w in enumerate(weights):
>          if debug: print "i, w: ", i, w
>          for k in range(w):
>              flattened.append(i)
>      if debug: print "flattened: ", flattened
>      rnd = random.randint(0, (len(flattened) - 1))
>      return flattened[rnd]
>
> # Not test it:
> print wrandom([5, 20, 75])
> print wrandom([5, 20, 75])
> print wrandom([5, 20, 75])
>
> ### End Code Example ###
>
> It works and is easy enough to understand, but when the number of list
> items gets large, or the weights get heavy, things get ugly.
> -Modulok-
>
You're building a temporary list, then discarding it when the function 
returns.  Your speed efficiency is likely to improve if you keep that 
"flattened list" and only do the conversion once.

But the separate problem of that flattened list possibly being much 
larger than necessary is more interesting.  What you probably want to 
build instead is a list of accumulated probabilities.  It would be the 
same size as the input list, and in this case, it would contain 5, 25, 
and 100, respectively.  The list is always monotonic, since none of your 
probabilities can legally be negative.

Anyway, once you have that new list, you pick a random number in the 
range from 0 to newlist[-1], and do a binary search of the newlist for 
the first value > your random value.

And if the list is small, you can just scan through it linearly, which 
is much less code.

DaveA