[Python-ideas] random.choice on non-sequence

Ethan Furman ethan at stoneleaf.us
Wed Apr 13 18:55:04 EDT 2016


On 04/12/2016 09:09 PM, Rob Cliffe wrote:

> I did a little experimenting (admittedly non-rigorous, on one platform
> only, and using Python 2.7.10, not Python 3, and using code which could
> very possibly be improved) on selecting a random element from a
> generator, and found that
>
>      for small or moderate generators, reservoir sampling was almost
> always slower than generating a tuple
>
>      as the generator length increased up to roughly 10,000, the ratio
>              (time taken by reservoir)  / (time taken by tuple)
>          increased, reaching a maximum of over 4
>      as the generator length increased further, the ratio started to
> decrease, although for a length of 80 million (about as large as I could
> test) it was still over 3.
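The two strategies being compared can be sketched roughly as follows in Python 3. The function names `choice_via_tuple` and `choice_via_reservoir` are hypothetical and do not come from the thread. The `timeit` loop is only one assumed way to rerun this kind of rough measurement, and absolute numbers and ratios will differ by machine and Python version.

```python
import random
import timeit


def choice_via_tuple(iterable):
    """Materialise every item, then pick one (O(n) extra memory)."""
    # random.choice already raises IndexError on an empty sequence.
    return random.choice(tuple(iterable))


def choice_via_reservoir(iterable):
    """Reservoir sampling with a reservoir of size 1 (O(1) extra memory).

    The k-th item (counting from 1) replaces the current pick with
    probability 1/k, which leaves every item selected with probability 1/n.
    """
    chosen = None
    count = 0
    for count, item in enumerate(iterable, start=1):
        if random.randrange(count) == 0:
            chosen = item
    if count == 0:
        raise IndexError("cannot choose from an empty iterable")
    return chosen


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        # Build a fresh generator for every call: a generator can only be consumed once.
        t_tuple = timeit.timeit(lambda: choice_via_tuple(i for i in range(n)), number=10)
        t_res = timeit.timeit(lambda: choice_via_reservoir(i for i in range(n)), number=10)
        print(f"n={n:>7}: reservoir/tuple time ratio = {t_res / t_tuple:.2f}")
```

The reservoir version runs a Python-level loop and calls `randrange` once per item, while `tuple()` consumes the generator in C. That is one plausible reason the reservoir version trails on speed.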

I suspect this proves the point: reservoir sampling is worth using not
because it is fast, but because it needs only constant memory instead of
holding on to every item you will not return.
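The memory side of the argument can be checked with the standard library's `tracemalloc` module. This is a minimal sketch under assumptions: the helper `peak_bytes` and both choice functions are hypothetical names, and peak figures depend on platform and Python version, so only their relative size is meaningful.

```python
import random
import tracemalloc


def choice_via_tuple(iterable):
    items = tuple(iterable)  # every item stays alive until the choice is made
    return random.choice(items)


def choice_via_reservoir(iterable):
    chosen, count = None, 0
    for count, item in enumerate(iterable, start=1):
        if random.randrange(count) == 0:  # keep the k-th item with probability 1/k
            chosen = item
    if count == 0:
        raise IndexError("cannot choose from an empty iterable")
    return chosen


def peak_bytes(func, n):
    """Peak traced allocation while func picks from a generator of n ints."""
    tracemalloc.start()
    try:
        func(i for i in range(n))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    n = 100_000
    print("tuple peak bytes:    ", peak_bytes(choice_via_tuple, n))
    print("reservoir peak bytes:", peak_bytes(choice_via_reservoir, n))
```

The tuple version's peak grows with n, because every item is held at once. The reservoir version's peak should stay roughly flat, since each item can be freed as soon as the loop moves past it.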

--
~Ethan~

