How do I sample randomly based on some probability(wightage)?

Scott David Daniels Scott.Daniels at Acm.Org
Sun May 31 16:21:48 CEST 2009


Sumitava Mukherjee wrote:
> On May 27, 8:08 pm, Scott David Daniels <Scott.Dani... at Acm.Org> wrote:
>> Sumitava Mukherjee wrote:
>>> I need to randomly sample from a list where all choices have weights
>>> attached to them. The probability of them being choosen is dependent
>>> on the weights.
>> I am not sure why everybody is normalizing by dividing the weights.
>> This isa possibility (I had fun writing it).
>>
>> def sample_without_replacement(choices, weights):
>> ...  combined = [(w, v) for w, v in zip(weights, choices) if w > 0]
>> ...  total = sum(w for w, v in combined) # sum(weights) also works
>>      while combined:
>> ...
>>          yield choice
>>          total -= weight
>>          if weight > total * 256: # arbitrary choice for recalculation
>>              # precision affected, rebuild
>>              total = sum(w for w, v in combined)
>>          del combined[n]
>>      raise ValueError('Samplng more than %d without replacement?' % (
>>                        sum(1 for w in weights if w > 0)))
>> ...
Should be:
            total -= weight
            yield choice
            del combined[n]
            if weight > total * 256: # arbitrary choice for recalculation
                 # precision affected, rebuild
                 total = sum(w for w, v in combined)
        raise ValueError('Samplng more than %d without replacement?' % (
                                       sum(1 for w in weights)))
There were two mistakes, both related to the definition of combined:
  (1) Combined only holds positive values, so testing in sum is silly.
  (2) Total is meant to track the sum of weights in combined, and the
      update was misplaced with respect to the delete.

> Among all the help (all of which I really appreciate), I found your
> solution closest to what I was expecting. Thanks a lot Scott.

You are welcome (presuming I didn't simply do your homework).  I only
noticed the problems above while scanning your reply.  The failures I
noticed should have been caught with good tests or a pair programming
partner.  Of course, in this environment, I didn't have either high
quality tests or a partner (I did simple hand testing and eyeball
analysis).

--Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-list mailing list