I posted this in the thread about indexing dict items but it seems to have been buried there, so I'm starting a new thread. Maybe there could be a function in the random module for making a random choice from an arbitrary (finite) iterable. This implementation chooses a random element from an iterable by fully iterating over it, so it is O(N) in CPU operations but O(1) in memory usage:

    # rnd.py

    from random import randrange

    def choice_iter(iterable):
        iterator = iter(iterable)
        try:
            value = next(iterator)
        except StopIteration:
            raise ValueError("Empty iterable")
        # Reservoir sampling with k = 1: the nth item replaces the
        # current value with probability 1/n.
        for n, candidate in enumerate(iterator, 2):
            if not randrange(n):
                value = candidate
        return value

You could use that to get a choice from a dict, set etc:
    >>> choice_iter({'q', 'w', 'e'})
    'e'
    >>> choice_iter({'q', 'w', 'e'})
    'q'
    >>> choice_iter({'q', 'w', 'e'})
    'e'
    >>> choice_iter({'q', 'w', 'e'})
    'w'
You can make a random choice from the keys, values or items of a dict. For a large dict or set it will be slow compared with converting to a list first, because it calls randrange once per item. It could probably be made a lot faster by, say, calling randbits once and extracting the bits for several iterations of the loop. Although choice_iter is slower than choice, it works for arbitrary iterables and has constant memory requirements when used with a lazy iterator. There are situations in which you would not want to build a list and use random.choice, and where the computational overhead is insignificant. For example choice_iter can be used to randomly choose a line from an arbitrarily large text file without loading it entirely into memory:
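Another known way to cut down the number of RNG calls (different from the randbits batching suggested above) is to draw geometric skip lengths so the generator is only invoked when the reservoir actually changes, as in Li's "Algorithm L". Here is a rough sketch of that idea for the k = 1 case, not a tuned implementation; the name choice_iter_fast is made up for illustration:

```python
from math import floor, log
from random import random

def choice_iter_fast(iterable):
    # Sketch of Li's "Algorithm L" with a size-1 reservoir: rather than
    # calling the RNG once per item, draw how many items to skip before
    # the next replacement.  Ignores the measure-zero chance that
    # random() returns exactly 0.0.
    iterator = iter(iterable)
    try:
        value = next(iterator)
    except StopIteration:
        raise ValueError("Empty iterable")
    w = random()  # running acceptance weight, in (0, 1) almost surely
    while True:
        # Number of items to discard before the next one is accepted.
        skip = floor(log(random()) / log(1.0 - w))
        try:
            for _ in range(skip):
                next(iterator)
            value = next(iterator)
        except StopIteration:
            return value
        w *= random()
```

The expected number of RNG calls is O(log N) rather than O(N), which matters for very long streams, although each item is still consumed from the iterator.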
    >>> from rnd import choice_iter
    >>> with open('rnd.py') as fin:
    ...     line = choice_iter(fin)
    ...
    >>> line
    '    except StopIteration:\n'
When reading from a large text file this is roughly optimal in performance terms because it does precisely the minimum amount of IO (one pass) while having constant memory overhead. Of course if you wanted to select multiple lines from the file then calling choice_iter repeatedly is immediately suboptimal because it would read the file more than once. It's not hard though to generalise the function above to something that e.g. makes a selection of k values from an iterable in a single pass. Here is a version that samples without replacement:

    def sample_iter(population, k):
        iterator = iter(population)
        values = []
        for _ in range(k):
            try:
                value = next(iterator)
            except StopIteration:
                raise ValueError("Too few items")
            values.append(value)
        # Algorithm R: the nth item enters the sample with
        # probability k/n, replacing a uniformly chosen member.
        for n, candidate in enumerate(iterator, k + 1):
            random_index = randrange(n)
            if random_index < k:
                values[random_index] = candidate
        return values  # maybe shuffle first?

The memory cost of this is O(k) regardless of the size of the input iterable.

--
Oscar
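As a quick sanity check of the single-pass behaviour (reproducing sample_iter here so the snippet runs on its own), sampling k = 5 items from a lazily generated stream of 10,000 squares returns 5 distinct members of the stream without ever materialising it as a list:

```python
from random import randrange

def sample_iter(population, k):
    # Reservoir sampling (Algorithm R), as defined above.
    iterator = iter(population)
    values = []
    for _ in range(k):
        try:
            value = next(iterator)
        except StopIteration:
            raise ValueError("Too few items")
        values.append(value)
    for n, candidate in enumerate(iterator, k + 1):
        random_index = randrange(n)
        if random_index < k:
            values[random_index] = candidate
    return values

# Sample 5 squares from a generator; the generator is consumed once
# and only 5 values are ever stored.
sample = sample_iter((i * i for i in range(10_000)), 5)
```

Because each input item is seen exactly once and can only occupy one slot, the result is automatically without replacement: all k values are distinct.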