[Python-ideas] random.choice on non-sequence

Tue Apr 12 17:53:17 EDT 2016

The problem with your version is that copying the input is slow if it is large.

Raymond was referencing the top answer here:
http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values.
It's also slow though (draws N random values if the input is of length
N).

On Tue, Apr 12, 2016 at 2:40 PM, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
> It surprised me a bit the first time I realised that random.choice did not
> work on a set.  (One expects "everything" to "just work" in Python! :-) )
> There is a simple workaround (convert the set to a tuple or list before
> passing it) but ...
>
> On 2011-06-23, Sven Marnach posted 'A few suggestions for the random module'
> on Python-Ideas, one of which was to allow random.choice to work on an
> arbitrary iterable.
> Raymond Hettinger commented
>     "It could be generalized to arbitrary iterables (Bentley provides an
> example of how to do this) but it is fragile (i.e. falls apart badly with
> weak random number generators) and doesn't correspond well with real use
> cases."
> Sven replied
>     "Again, there definitely are real world use cases ..."
>
> I would like to revive this suggestion.  I don't understand Raymond's
> comment.  It seems to me that a reasonable implementation is better and more
> friendly than none, particularly for newbies.
> This is the source code for random.choice in Python 3.2 (as far as I know it
> hasn't changed since):
>
>
>     def choice(self, seq):
>         """Choose a random element from a non-empty sequence."""
>         try:
>             i = self._randbelow(len(seq))
>         except ValueError:
>             raise IndexError('Cannot choose from an empty sequence')
>         return seq[i]
>
>
> And here is my proposed (tested) version:
>
>
>     def choice(self, it):
>         """Choose a random element from a non-empty iterable."""
>         try:
>             it[0]
>         except IndexError:
>             raise IndexError('Cannot choose from an empty sequence')
>         except TypeError:
>             it = tuple(it)
>         try:
>             i = self._randbelow(len(it))
>         except ValueError:
>             raise IndexError('Cannot choose from an empty iterable')
>         return it[i]
>
>
> This works on (e.g.) a list/tuple/string, a set, a dictionary view or a
> generator.
> Obviously the generator has to be 'fresh' (i.e. any previously consumed
> values will be excluded from the choice) and 'throw-away' (random.choice
> will exhaust it).
> But it means you can write code like
>     random.choice(x for x in xrange(10) if x%3)    # this feels quite
> "Pythonic" to me!
>
> (I experimented with versions that, when passed an object that supported
> len() but not indexing, viz. a set, iterated through the object as far as
> necessary instead of converting it to a list.  But I found that this was
> slower in practice as well as more complicated.  There might be a memory
> issue with huge iterables, but we are no worse off using existing
> workarounds for those.)
>
> Downsides of proposed version:
>     (1) Backward compatibility.  Could mask an error if a "wrong" object,
> but one that happens to be an iterable, is passed to random.choice.
>           There is also slightly different behaviour if a dictionary (D) is
> passed (an unusual thing to do):
>                 The current version will get a random integer in the range
> [0, len(D)), try to use that as a key of D,
>                    and raise KeyError if that fails.
>                 The proposed version behaves similarly except that it will
> always raise "KeyError: 0" if 0 is not a key of D.
>                 One could argue that this is an improvement, if it matters
> at all (error reproducibility).
>                 And code that deliberately exploits the existing behaviour,
> provided that it uses a dictionary whose keys are consecutive integers from
> 0 upwards, e.g.
>                     D = { 0 : 'zero', 1 : 'one', 2 : 'two', 3 : 'three', 4 :
> 'four' }
>                     d = random.choice(D)    # equivalent, in this example,
> to "d = random.choice(list(D.values()))"
>                     if d == 'zero':
>                         ...
>                 will continue to work (although one might argue that it
> deserves not to).
>     (2) Speed - the proposed version is almost certainly slightly slower
> when called with a sequence.
>          For what it's worth, I tried to measure the difference on my
> several-years-old Windows PC, but is was hard to measure accurately, even
> over millions of iterations.
>          All I can say is that it appeared to be a small fraction of a
> millisecond, possibly in the region of 50 nanoseconds, per call.
>
> Best wishes
> Rob Cliffe
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
--Guido van Rossum (python.org/~guido)