[Python-ideas] random.choice on non-sequence
Marco Cognetta
cognetta.marco at gmail.com
Tue Apr 12 18:09:40 EDT 2016
Maybe I am misunderstanding but for a generator, couldn't you avoid
storing it in memory and just using the following online algorithm to
save space? http://stackoverflow.com/a/23352100
(I hope this message goes through correctly, first time participating
in a discussion on this mailing list).
-Marco
On Tue, Apr 12, 2016 at 5:53 PM, Guido van Rossum <guido at python.org> wrote:
> The problem with your version is that copying the input is slow if it is large.
>
> Raymond was referencing the top answer here:
> http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values.
> It's also slow though (draws N random values if the input is of length
> N).
>
> On Tue, Apr 12, 2016 at 2:40 PM, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
>> It surprised me a bit the first time I realised that random.choice did not
>> work on a set. (One expects "everything" to "just work" in Python! :-) )
>> There is a simple workaround (convert the set to a tuple or list before
>> passing it) but ...
>>
>> On 2011-06-23, Sven Marnach posted 'A few suggestions for the random module'
>> on Python-Ideas, one of which was to allow random.choice to work on an
>> arbitrary iterable.
>> Raymond Hettinger commented
>> "It could be generalized to arbitrary iterables (Bentley provides an
>> example of how to do this) but it is fragile (i.e. falls apart badly with
>> weak random number generators) and doesn't correspond well with real use
>> cases."
>> Sven replied
>> "Again, there definitely are real world use cases ..."
>>
>> I would like to revive this suggestion. I don't understand Raymond's
>> comment. It seems to me that a reasonable implementation is better and more
>> friendly than none, particularly for newbies.
>> This is the source code for random.choice in Python 3.2 (as far as I know it
>> hasn't changed since):
>>
>>
>> def choice(self, seq):
>> """Choose a random element from a non-empty sequence."""
>> try:
>> i = self._randbelow(len(seq))
>> except ValueError:
>> raise IndexError('Cannot choose from an empty sequence')
>> return seq[i]
>>
>>
>> And here is my proposed (tested) version:
>>
>>
>> def choice(self, it):
>> """Choose a random element from a non-empty iterable."""
>> try:
>> it[0]
>> except IndexError:
>> raise IndexError('Cannot choose from an empty sequence')
>> except TypeError:
>> it = tuple(it)
>> try:
>> i = self._randbelow(len(it))
>> except ValueError:
>> raise IndexError('Cannot choose from an empty iterable')
>> return it[i]
>>
>>
>> This works on (e.g.) a list/tuple/string, a set, a dictionary view or a
>> generator.
>> Obviously the generator has to be 'fresh' (i.e. any previously consumed
>> values will be excluded from the choice) and 'throw-away' (random.choice
>> will exhaust it).
>> But it means you can write code like
>> random.choice(x for x in xrange(10) if x%3) # this feels quite
>> "Pythonic" to me!
>>
>> (I experimented with versions that, when passed an object that supported
>> len() but not indexing, viz. a set, iterated through the object as far as
>> necessary instead of converting it to a list. But I found that this was
>> slower in practice as well as more complicated. There might be a memory
>> issue with huge iterables, but we are no worse off using existing
>> workarounds for those.)
>>
>> Downsides of proposed version:
>> (1) Backward compatibility. Could mask an error if a "wrong" object,
>> but one that happens to be an iterable, is passed to random.choice.
>> There is also slightly different behaviour if a dictionary (D) is
>> passed (an unusual thing to do):
>> The current version will get a random integer in the range
>> [0, len(D)), try to use that as a key of D,
>> and raise KeyError if that fails.
>> The proposed version behaves similarly except that it will
>> always raise "KeyError: 0" if 0 is not a key of D.
>> One could argue that this is an improvement, if it matters
>> at all (error reproducibility).
>> And code that deliberately exploits the existing behaviour,
>> provided that it uses a dictionary whose keys are consecutive integers from
>> 0 upwards, e.g.
>> D = { 0 : 'zero', 1 : 'one', 2 : 'two', 3 : 'three', 4 :
>> 'four' }
>> d = random.choice(D) # equivalent, in this example,
>> to "d = random.choice(list(D.values()))"
>> if d == 'zero':
>> ...
>> will continue to work (although one might argue that it
>> deserves not to).
>> (2) Speed - the proposed version is almost certainly slightly slower
>> when called with a sequence.
>> For what it's worth, I tried to measure the difference on my
>> several-years-old Windows PC, but is was hard to measure accurately, even
>> over millions of iterations.
>> All I can say is that it appeared to be a small fraction of a
>> millisecond, possibly in the region of 50 nanoseconds, per call.
>>
>> Best wishes
>> Rob Cliffe
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
More information about the Python-ideas
mailing list