[Python-ideas] random.choice on non-sequence
Rob Cliffe
rob.cliffe at btinternet.com
Tue Apr 12 17:40:32 EDT 2016
It surprised me a bit the first time I realised that random.choice did
not work on a set. (One expects "everything" to "just work" in Python!
:-) )
There is a simple workaround (convert the set to a tuple or list before
passing it) but ...
On 2011-06-23, Sven Marnach posted 'A few suggestions for the random
module' on Python-Ideas, one of which was to allow random.choice to work
on an arbitrary iterable.
Raymond Hettinger commented
"It could be generalized to arbitrary iterables (Bentley provides
an example of how to do this) but it is fragile (i.e. falls apart badly
with weak random number generators) and doesn't correspond well with
real use cases."
Sven replied
"Again, there definitely are real world use cases ..."
I would like to revive this suggestion. I don't understand Raymond's
comment. It seems to me that a reasonable implementation is better and
more friendly than none, particularly for newbies.
This is the source code for random.choice in Python 3.2 (as far as I
know it hasn't changed since):
def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
try:
i = self._randbelow(len(seq))
except ValueError:
raise IndexError('Cannot choose from an empty sequence')
return seq[i]
And here is my proposed (tested) version:
def choice(self, it):
"""Choose a random element from a non-empty iterable."""
try:
it[0]
except IndexError:
raise IndexError('Cannot choose from an empty sequence')
except TypeError:
it = tuple(it)
try:
i = self._randbelow(len(it))
except ValueError:
raise IndexError('Cannot choose from an empty iterable')
return it[i]
This works on (e.g.) a list/tuple/string, a set, a dictionary view or a
generator.
Obviously the generator has to be 'fresh' (i.e. any previously consumed
values will be excluded from the choice) and 'throw-away' (random.choice
will exhaust it).
But it means you can write code like
random.choice(x for x in xrange(10) if x%3) # this feels quite
"Pythonic" to me!
(I experimented with versions that, when passed an object that supported
len() but not indexing, viz. a set, iterated through the object as far
as necessary instead of converting it to a list. But I found that this
was slower in practice as well as more complicated. There might be a
memory issue with huge iterables, but we are no worse off using existing
workarounds for those.)
Downsides of proposed version:
(1) Backward compatibility. Could mask an error if a "wrong"
object, but one that happens to be an iterable, is passed to random.choice.
There is also slightly different behaviour if a dictionary
(D) is passed (an unusual thing to do):
The current version will get a random integer in the
range [0, len(D)), try to use that as a key of D,
and raise KeyError if that fails.
The proposed version behaves similarly except that it
will always raise "KeyError: 0" if 0 is not a key of D.
One could argue that this is an improvement, if it
matters at all (error reproducibility).
And code that deliberately exploits the existing
behaviour, provided that it uses a dictionary whose keys are consecutive
integers from 0 upwards, e.g.
D = { 0 : 'zero', 1 : 'one', 2 : 'two', 3 :
'three', 4 : 'four' }
d = random.choice(D) # equivalent, in this
example, to "d = random.choice(list(D.values()))"
if d == 'zero':
...
will continue to work (although one might argue that it
deserves not to).
(2) Speed - the proposed version is almost certainly slightly
slower when called with a sequence.
For what it's worth, I tried to measure the difference on my
several-years-old Windows PC, but is was hard to measure accurately,
even over millions of iterations.
All I can say is that it appeared to be a small fraction of a
millisecond, possibly in the region of 50 nanoseconds, per call.
Best wishes
Rob Cliffe
More information about the Python-ideas
mailing list