[Python-ideas] random.choice on non-sequence
Ethan Furman
ethan at stoneleaf.us
Wed Apr 13 18:55:04 EDT 2016
On 04/12/2016 09:09 PM, Rob Cliffe wrote:
> I did a little experimenting (admittedly non-rigorous, on one platform
> only, and using Python 2.7.10, not Python 3, and using code which could
> very possibly be improved) on selecting a random element from a
> generator, and found that
>
> for small or moderate generators, reservoir sampling was almost
> always slower than generating a tuple
>
> as the generator length increased up to roughly 10,000, the ratio
> (time taken by reservoir) / (time taken by tuple)
> increased, reaching a maximum of over 4
> as the generator length increased further, the ratio started to
> decrease, although for a length of 80 million (about as large as I could
> test) it was still over 3.
I suspect this proves the point -- reservoir sampling is good not
because it is fast, but because it won't drain your memory keeping items
you will not return.
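
For concreteness, here is a minimal sketch of the two approaches being
compared. It is only an illustration, not the code Rob timed; the names
reservoir_choice and tuple_choice are hypothetical, and neither is part of
the random module.

```python
import random


def reservoir_choice(iterable, rng=random):
    """Pick one item uniformly at random using O(1) extra memory.

    Hypothetical helper: makes one RNG call per item, which is a likely
    reason it loses on speed to the tuple approach.
    """
    chosen = None
    count = 0
    for count, item in enumerate(iterable, start=1):
        # Keep the new item with probability 1/count; after n items each
        # one has had an overall 1/n chance of being the survivor.
        if rng.randrange(count) == 0:
            chosen = item
    if count == 0:
        raise IndexError("cannot choose from an empty iterable")
    return chosen


def tuple_choice(iterable, rng=random):
    """Materialize every item, then index once: one RNG call, O(n) memory."""
    return rng.choice(tuple(iterable))
```

Both accept any iterable, e.g. reservoir_choice(x * x for x in range(10**6)).
The tuple version holds all million items in memory at once, while the
reservoir version holds only the current candidate.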
--
~Ethan~