[Python-ideas] random.choice on non-sequence

Tue Apr 12 17:40:32 EDT 2016

It surprised me a bit the first time I realised that random.choice did 
not work on a set.  (One expects "everything" to "just work" in Python! 
:-) )
There is a simple workaround (convert the set to a tuple or list before 
passing it) but ...

On 2011-06-23, Sven Marnach posted 'A few suggestions for the random 
module' on Python-Ideas, one of which was to allow random.choice to work 
on an arbitrary iterable.
Raymond Hettinger commented
     "It could be generalized to arbitrary iterables (Bentley provides 
an example of how to do this) but it is fragile (i.e. falls apart badly 
with weak random number generators) and doesn't correspond well with 
real use cases."
Sven replied
     "Again, there definitely are real world use cases ..."

I would like to revive this suggestion.  I don't understand Raymond's 
comment.  It seems to me that a reasonable implementation is better and 
more friendly than none, particularly for newbies.
This is the source code for random.choice in Python 3.2 (as far as I 
know it hasn't changed since):

     def choice(self, seq):
         """Choose a random element from a non-empty sequence."""
         try:
             i = self._randbelow(len(seq))
         except ValueError:
             raise IndexError('Cannot choose from an empty sequence')
         return seq[i]

And here is my proposed (tested) version:

     def choice(self, it):
         """Choose a random element from a non-empty iterable."""
         try:
             it[0]
         except IndexError:
             raise IndexError('Cannot choose from an empty sequence')
         except TypeError:
             it = tuple(it)
         try:
             i = self._randbelow(len(it))
         except ValueError:
             raise IndexError('Cannot choose from an empty iterable')
         return it[i]

This works on (e.g.) a list/tuple/string, a set, a dictionary view or a 
generator.
Obviously the generator has to be 'fresh' (i.e. any previously consumed 
values will be excluded from the choice) and 'throw-away' (random.choice 
will exhaust it).
But it means you can write code like
     random.choice(x for x in xrange(10) if x%3)    # this feels quite 
"Pythonic" to me!

(I experimented with versions that, when passed an object that supported 
len() but not indexing, viz. a set, iterated through the object as far 
as necessary instead of converting it to a list.  But I found that this 
was slower in practice as well as more complicated.  There might be a 
memory issue with huge iterables, but we are no worse off using existing 
workarounds for those.)

Downsides of proposed version:
     (1) Backward compatibility.  Could mask an error if a "wrong" 
object, but one that happens to be an iterable, is passed to random.choice.
           There is also slightly different behaviour if a dictionary 
(D) is passed (an unusual thing to do):
                 The current version will get a random integer in the 
range [0, len(D)), try to use that as a key of D,
                    and raise KeyError if that fails.
                 The proposed version behaves similarly except that it 
will always raise "KeyError: 0" if 0 is not a key of D.
                 One could argue that this is an improvement, if it 
matters at all (error reproducibility).
                 And code that deliberately exploits the existing 
behaviour, provided that it uses a dictionary whose keys are consecutive 
integers from 0 upwards, e.g.
                     D = { 0 : 'zero', 1 : 'one', 2 : 'two', 3 : 
'three', 4 : 'four' }
                     d = random.choice(D)    # equivalent, in this 
example, to "d = random.choice(list(D.values()))"
                     if d == 'zero':
                         ...
                 will continue to work (although one might argue that it 
deserves not to).
     (2) Speed - the proposed version is almost certainly slightly 
slower when called with a sequence.
          For what it's worth, I tried to measure the difference on my 
several-years-old Windows PC, but is was hard to measure accurately, 
even over millions of iterations.
          All I can say is that it appeared to be a small fraction of a 
millisecond, possibly in the region of 50 nanoseconds, per call.

Best wishes
Rob Cliffe