[Python-ideas] random.choice on non-sequence
Guido van Rossum
guido at python.org
Tue Apr 12 17:53:17 EDT 2016
The problem with your version is that copying the input is slow if it is large.
Raymond was referencing the top answer here:
http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values.
It's also slow though (draws N random values if the input is of length
N).
On Tue, Apr 12, 2016 at 2:40 PM, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
> It surprised me a bit the first time I realised that random.choice did not
> work on a set. (One expects "everything" to "just work" in Python! :-) )
> There is a simple workaround (convert the set to a tuple or list before
> passing it) but ...
>
> On 2011-06-23, Sven Marnach posted 'A few suggestions for the random module'
> on Python-Ideas, one of which was to allow random.choice to work on an
> arbitrary iterable.
> Raymond Hettinger commented
> "It could be generalized to arbitrary iterables (Bentley provides an
> example of how to do this) but it is fragile (i.e. falls apart badly with
> weak random number generators) and doesn't correspond well with real use
> cases."
> Sven replied
> "Again, there definitely are real world use cases ..."
>
> I would like to revive this suggestion. I don't understand Raymond's
> comment. It seems to me that a reasonable implementation is better and more
> friendly than none, particularly for newbies.
> This is the source code for random.choice in Python 3.2 (as far as I know it
> hasn't changed since):
>
>
> def choice(self, seq):
> """Choose a random element from a non-empty sequence."""
> try:
> i = self._randbelow(len(seq))
> except ValueError:
> raise IndexError('Cannot choose from an empty sequence')
> return seq[i]
>
>
> And here is my proposed (tested) version:
>
>
> def choice(self, it):
> """Choose a random element from a non-empty iterable."""
> try:
> it[0]
> except IndexError:
> raise IndexError('Cannot choose from an empty sequence')
> except TypeError:
> it = tuple(it)
> try:
> i = self._randbelow(len(it))
> except ValueError:
> raise IndexError('Cannot choose from an empty iterable')
> return it[i]
>
>
> This works on (e.g.) a list/tuple/string, a set, a dictionary view or a
> generator.
> Obviously the generator has to be 'fresh' (i.e. any previously consumed
> values will be excluded from the choice) and 'throw-away' (random.choice
> will exhaust it).
> But it means you can write code like
> random.choice(x for x in xrange(10) if x%3) # this feels quite
> "Pythonic" to me!
>
> (I experimented with versions that, when passed an object that supported
> len() but not indexing, viz. a set, iterated through the object as far as
> necessary instead of converting it to a list. But I found that this was
> slower in practice as well as more complicated. There might be a memory
> issue with huge iterables, but we are no worse off using existing
> workarounds for those.)
>
> Downsides of proposed version:
> (1) Backward compatibility. Could mask an error if a "wrong" object,
> but one that happens to be an iterable, is passed to random.choice.
> There is also slightly different behaviour if a dictionary (D) is
> passed (an unusual thing to do):
> The current version will get a random integer in the range
> [0, len(D)), try to use that as a key of D,
> and raise KeyError if that fails.
> The proposed version behaves similarly except that it will
> always raise "KeyError: 0" if 0 is not a key of D.
> One could argue that this is an improvement, if it matters
> at all (error reproducibility).
> And code that deliberately exploits the existing behaviour,
> provided that it uses a dictionary whose keys are consecutive integers from
> 0 upwards, e.g.
> D = { 0 : 'zero', 1 : 'one', 2 : 'two', 3 : 'three', 4 :
> 'four' }
> d = random.choice(D) # equivalent, in this example,
> to "d = random.choice(list(D.values()))"
> if d == 'zero':
> ...
> will continue to work (although one might argue that it
> deserves not to).
> (2) Speed - the proposed version is almost certainly slightly slower
> when called with a sequence.
> For what it's worth, I tried to measure the difference on my
> several-years-old Windows PC, but is was hard to measure accurately, even
> over millions of iterations.
> All I can say is that it appeared to be a small fraction of a
> millisecond, possibly in the region of 50 nanoseconds, per call.
>
> Best wishes
> Rob Cliffe
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido van Rossum (python.org/~guido)
More information about the Python-ideas
mailing list