How do I sample randomly based on some probability(wightage)?
Dave Angel
davea at ieee.org
Tue May 26 16:41:35 EDT 2009
Sumitava Mukherjee wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?
>
>
Emile gave you a pretty elegant solution, assuming two things: The
choices can be represented by a single character each, and a roundoff
error of one part in a thousand is acceptable.
Of course, you can represent 256 choices in a 2.x character, and you
could switch to Unicode if that's not enough. And if 1000 isn't enough,
you could make it bigger, at the cost of needing a bigger table.
Here's my more-straightforward algorithm, not reduced to code.
Start with a list of probabilities. Replace each item with the sum of
all the items up to and including itself. (If you do it in-place, you'd
need to do it in reverse order, but if you use a copy, you could simply
replace each item with a sum() of the appropriate slice of the copy.
Anyway, now you've got a list of cumulative probabilites, with the last
item equalling 1.0 (pretty close). You might need to fudge that to
exactly 1.0, just in case.
I think I'd zip that list with the original list of items, so that you
have a single list of tuples.
Now to generate a random sample, call random.random() to get a value
between 0 and 1. Search the list for the first item greater than or
equal to your value, and return the other half of the tuple.
More information about the Python-list
mailing list