# implement random selection in Python

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Nov 17 04:13:43 CET 2007

```On Fri, 16 Nov 2007 16:47:16 -0800, Bruza wrote:

> I think I need to explain on the probability part: the "prob" is a
> relative likelihood that the object will be included in the output list.
> So, in my example input of
>
>   items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> So, for any size of N, 'Tom' (with prob of 45) will be more likely to be
> included in the output list of N distinct member than 'Mary' (prob of
> 30) and much more likely than that of 'John' (with prob of 10).

Since there are only four items, you can't select more than four distinct
items, because the fifth has to be a duplicate of one of the others. And
the probability/likelihood doesn't work either: if you insist on getting
distinct items, then your results are are much limited:

N = 0, result: []

N = 1, four possible results:
['Mary'] ['John'] ['Tom'] or ['Jane']

N = 2, six possible results (assuming order doesn't matter):
['Mary', 'John'] ['Mary', 'Tom'] ['Mary', 'Jane'] ['John', 'Tom']
['John', 'Jane'] or ['Tom', 'Jane']

N = 3, four possible results:
['Mary', 'John', 'Tom'] ['Mary', 'John', 'Jane']
['Mary', 'Tom', 'Jane'] or ['John', 'Tom', 'Jane']

N = 4, one possible result:
['Mary', 'John', 'Tom', 'Jane']

> I know "prob" is not exactly the "probability" in the context of
> returning a multiple member list. But what I want is a way to "favor"
> some member in a selection process.

I don't think this is really well defined, but I'll take a stab in the
dark at it.

Let's take the example above for N = 3:

items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
# use the lowest common factor for simplicity
items = [('Mary',6), ('John', 2), ('Tom', 9), ('Jane', 3)]

# N = 3 has four possible results:
R = ['Mary', 'John', 'Tom']
S = ['Mary', 'John', 'Jane']
T = ['Mary', 'Tom', 'Jane']
U = ['John', 'Tom', 'Jane']

You want to bias the four possible results above so that having Mary is
three times more likely than having John. It sounds simple, but it isn't.
To do so, you need to solve a whole series of simultaneous equations:

Pr(Mary) = Pr(R) + Pr(S) + Pr(T)
Pr(John) = Pr(R) + Pr(S) + Pr(U)
Pr(Mary) = 3*Pr(John)

And so on for all the other items.

This is a HARD problem to solve, and for most sets of likelihoods you
might give, there won't be a solution. For example, you can't have a set
of results where Mary is three times more likely than John, John is twice
as likely as Tom, and Tom is four times more likely than Mary. It simply
can't happen.

So we can't interpret the numbers as probabilities. We can't even
interpret them more loosely as "likelihoods". Proof of that is to
consider the case of N=4. There is only one possible result with four
distinct items. So all of Mary, John, Tom and Jane are equally likely, in
fact all of them are certain. The weightings you gave (30, 10, 45, 15)
are meaningless.

So, given that what you are asking for is impossible, can you explain
what you are actually trying to accomplish? Maybe there's a more feasible
alternative.

--
Steven.

```