How do I sample randomly based on some probability(wightage)?
Sumitava Mukherjee
smukh at cognobytes.com
Sun May 31 09:08:44 CEST 2009
On May 27, 8:08 pm, Scott David Daniels <Scott.Dani... at Acm.Org> wrote:
> Sumitava Mukherjee wrote:
> > I need to randomly sample from a list where all choices have weights
> > attached to them. The probability of them being choosen is dependent
> > on the weights.
>
> I am not sure why everybody is normalizing by dividing the weights.
> This isa possibility (I had fun writing it).
>
> def sample_without_replacement(choices, weights):
> '''Yield elements sampled w/o replacement by weighting'''
> if len(weights) != len(choices):
> raise ValueError('%d choices, but %d weights?' % (
> len(choices), len(weights)))
> if min(weights) < 0:
> raise ValueError('Negative weights?: %s' % (
> [(i, w) for i, w in enumerate(weights) if w < 0]))
>
> # look at only non-zero probabilities
> combined = [(w, v) for w, v in zip(weights, choices) if w > 0]
>
> # Go from highest probability down to reduce expected traversal
> combined.sort(key=operator.itemgetter(0), reverse=True)
>
> total = sum(w for w, v in combined) # sum(weights) also works
> while combined:
> spot = sample = random.random() * total
> for n, (weight, choice) in enumerate(combined):
> spot -= weight
> if spot <= 0:
> break
> else:
> # n, (weight, choice) = 0, combined[0] # Highest probability
> raise ValueError('%f left after choosing %f/%f?: %s' % (
> spot, sample, total, combined))
> yield choice
> total -= weight
> if weight > total * 256: # arbitrary choice for recalculation
> # precision affected, rebuild
> total = sum(w for w, v in combined)
> del combined[n]
> raise ValueError('Samplng more than %d without replacement?' % (
> sum(1 for w in weights if w > 0)))
>
> for n in range(10):
> gen = sample_without_replacement('abcdef', [32,16,8,4,2,1])
> print gen.next(), gen.next(), gen.next()
>
> --Scott David Daniels
> Scott.Dani... at Acm.Org
Among all the help (all of which I really appreciate), I found your
solution closest to what I was expecting. Thanks a lot Scott.
More information about the Python-list
mailing list