# Most pythonic way of weighted random selection

duncan smith buzzard at urubu.freeserve.co.uk
Sun Aug 31 02:05:41 CEST 2008

Manuel Ebert wrote:
> Dear list,
>
> who's got aesthetic advice for the following problem? I've got some
> joint probabilities of two distinct events Pr(X=x, Y=y), stored in a
> list of lists of floats, where every row represents a possible outcome
> of X and every float in a row a possible outcome of Y (which I will now
> call my matrix, altough I can't use numpy or numeric for this), so e.g.
> m = [[0.2, 0.4, 0.05], [0.1, 0.05, 0.2]]. All lists in the list are
> equally long and the values of the flattened list add up to 1.0, i.e.
> sum([sum(row) for row in m]) == 1. In practice, this 'matrix' is about
> 20x40, i.e. a list with 20 lists á 40 floats each.
>
> Now the task is to select one outcome for X and Y based on the joint
> probabilites, and afterwards check that the outcomes fullfill certain
> criteria. If it doesn't fulfill some criteria a new pair of outcomes has
> to be selected, for other criteria it will still be selected with a
> certain probability. My approach was to choose a random number, and then
> walk through the list adding the values together until the accumulated
> sum is greater than my random threshold:
>

[snip]

For a list of cumulative probabilities you could use the bisect module
to find the insertion point for a 0,1 uniform variate (which you could
then map back to a cell index).

Alternatively you could use an alias table,

You could produce a probability table that takes into account your
criteria, so that you don't need to check them.  i.e. Each cell that
does not satisfy criterion a has probability 0 of being selected.  Each
other cell has a probability of being selected proportional to its
original value if it satisfies criterion b, or its original value * p
otherwise.  Normalise the resulting values to sum to 1 and you're done
with the need to check against criteria.

So, preprocess your table, create a corresponding list of cumulative
probabilities and use the bisect model (is one possibility).

Duncan