[Python-ideas] random.choice on non-sequence

Tue Apr 12 18:09:40 EDT 2016

Maybe I am misunderstanding but for a generator, couldn't you avoid
storing it in memory and just using the following online algorithm to
save space? http://stackoverflow.com/a/23352100

(I hope this message goes through correctly, first time participating
in a discussion on this mailing list).

-Marco

On Tue, Apr 12, 2016 at 5:53 PM, Guido van Rossum <guido at python.org> wrote:
> The problem with your version is that copying the input is slow if it is large.
>
> Raymond was referencing the top answer here:
> http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values.
> It's also slow though (draws N random values if the input is of length
> N).
>
> On Tue, Apr 12, 2016 at 2:40 PM, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
>> It surprised me a bit the first time I realised that random.choice did not
>> work on a set.  (One expects "everything" to "just work" in Python! :-) )
>> There is a simple workaround (convert the set to a tuple or list before
>> passing it) but ...
>>
>> On 2011-06-23, Sven Marnach posted 'A few suggestions for the random module'
>> on Python-Ideas, one of which was to allow random.choice to work on an
>> arbitrary iterable.
>> Raymond Hettinger commented
>>     "It could be generalized to arbitrary iterables (Bentley provides an
>> example of how to do this) but it is fragile (i.e. falls apart badly with
>> weak random number generators) and doesn't correspond well with real use
>> cases."
>> Sven replied
>>     "Again, there definitely are real world use cases ..."
>>
>> I would like to revive this suggestion.  I don't understand Raymond's
>> comment.  It seems to me that a reasonable implementation is better and more
>> friendly than none, particularly for newbies.
>> This is the source code for random.choice in Python 3.2 (as far as I know it
>> hasn't changed since):
>>
>>
>>     def choice(self, seq):
>>         """Choose a random element from a non-empty sequence."""
>>         try:
>>             i = self._randbelow(len(seq))
>>         except ValueError:
>>             raise IndexError('Cannot choose from an empty sequence')
>>         return seq[i]
>>
>>
>> And here is my proposed (tested) version:
>>
>>
>>     def choice(self, it):
>>         """Choose a random element from a non-empty iterable."""
>>         try:
>>             it[0]
>>         except IndexError:
>>             raise IndexError('Cannot choose from an empty sequence')
>>         except TypeError:
>>             it = tuple(it)
>>         try:
>>             i = self._randbelow(len(it))
>>         except ValueError:
>>             raise IndexError('Cannot choose from an empty iterable')
>>         return it[i]
>>
>>
>> This works on (e.g.) a list/tuple/string, a set, a dictionary view or a
>> generator.
>> Obviously the generator has to be 'fresh' (i.e. any previously consumed
>> values will be excluded from the choice) and 'throw-away' (random.choice
>> will exhaust it).
>> But it means you can write code like
>>     random.choice(x for x in xrange(10) if x%3)    # this feels quite
>> "Pythonic" to me!
>>
>> (I experimented with versions that, when passed an object that supported
>> len() but not indexing, viz. a set, iterated through the object as far as
>> necessary instead of converting it to a list.  But I found that this was
>> slower in practice as well as more complicated.  There might be a memory
>> issue with huge iterables, but we are no worse off using existing
>> workarounds for those.)
>>
>> Downsides of proposed version:
>>     (1) Backward compatibility.  Could mask an error if a "wrong" object,
>> but one that happens to be an iterable, is passed to random.choice.
>>           There is also slightly different behaviour if a dictionary (D) is
>> passed (an unusual thing to do):
>>                 The current version will get a random integer in the range
>> [0, len(D)), try to use that as a key of D,
>>                    and raise KeyError if that fails.
>>                 The proposed version behaves similarly except that it will
>> always raise "KeyError: 0" if 0 is not a key of D.
>>                 One could argue that this is an improvement, if it matters
>> at all (error reproducibility).
>>                 And code that deliberately exploits the existing behaviour,
>> provided that it uses a dictionary whose keys are consecutive integers from
>> 0 upwards, e.g.
>>                     D = { 0 : 'zero', 1 : 'one', 2 : 'two', 3 : 'three', 4 :
>> 'four' }
>>                     d = random.choice(D)    # equivalent, in this example,
>> to "d = random.choice(list(D.values()))"
>>                     if d == 'zero':
>>                         ...
>>                 will continue to work (although one might argue that it
>> deserves not to).
>>     (2) Speed - the proposed version is almost certainly slightly slower
>> when called with a sequence.
>>          For what it's worth, I tried to measure the difference on my
>> several-years-old Windows PC, but is was hard to measure accurately, even
>> over millions of iterations.
>>          All I can say is that it appeared to be a small fraction of a
>> millisecond, possibly in the region of 50 nanoseconds, per call.
>>
>> Best wishes
>> Rob Cliffe
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/