Random from Dictionary

Sun Oct 21 08:21:04 EDT 2001

"Tim Peters" <tim.one at home.com> writes:

> Season to taste.  In 2.2 you could use a more memory-efficient scheme via
> iterating over d directly:
> 
> def random_from_dict(d, x):
>     "Return and remove x random (k, v) pairs from d."
>     from random import random
>     n = float(len(d))  # float() so x/n later doesn't truncate to 0
>     if not 1 <= x <= n:
>         raise ValueError("go away")
>     result = []
>     for k in d:
>         # Choose this item with probability x/n.
>         if random() <= x/n:
>             result.append((k, d[k]))
>             x -= 1
>             if x == 0:
>                 break
>         n -= 1
>     for k, v in result:
>         del d[k]
>     return result

Does that give each key in the dictionary the same probability of
being picked (assuming random() distributes uniformly)? If yes, is
there some easy way to "see" that this is the case? If no, could one
design an algorithm that iterates over a dictionary only once, still
offers the same probability to each key?

Regards,
Martin