Random from Dictionary

Sun Oct 21 18:17:28 EDT 2001

Martin von Loewis <loewis at informatik.hu-berlin.de> writes:

> "Tim Peters" <tim.one at home.com> writes:
> 
> > Season to taste.  In 2.2 you could use a more memory-efficient scheme via
> > iterating over d directly:
> > 
> > def random_from_dict(d, x):
> >     "Return and remove x random (k, v) pairs from d."
> >     from random import random
> >     n = float(len(d))  # float() so x/n later doesn't truncate to 0
> >     if not 1 <= x <= n:
> >         raise ValueError("go away")
> >     result = []
> >     for k in d:
> >         # Choose this item with probability x/n.
> >         if random() <= x/n:
> >             result.append((k, d[k]))
> >             x -= 1
> >             if x == 0:
> >                 break
> >         n -= 1
> >     for k, v in result:
> >         del d[k]
> >     return result
> 
> Does that give each key in the dictionary the same probability of
> being picked (assuming random() distributes uniformly)? If yes, is
> there some easy way to "see" that this is the case? If no, could one
> design an algorithm that iterates over a dictionary only once, still
> offers the same probability to each key?

See "Programming Pearls", by Jon Bentley, for discussion of this
algorithm.  Basically though look at the case x=1 and convince
yourself that you'll pick one element from the dictionary with
uniform probability.  Then generalize to x>1.