On 04/12/2016 09:09 PM, Rob Cliffe wrote:
I did a little experimenting (admittedly non-rigorous, on one platform only, and using Python 2.7.10, not Python 3, and using code which could very possibly be improved) on selecting a random element from a generator, and found that
for small or moderate generators, reservoir sampling was almost always slower than generating a tuple
as the generator length increased up to roughly 10,000, the ratio (time taken by reservoir) / (time taken by tuple) increased, reaching a maximum of over 4
as the generator length increased further, the ratio started to decrease, although for a length of 80 million (about as large as I could test) it was still over 3.
I suspect this proves the point -- reservoir sampling is good not because it is fast, but because it won't drain your memory keeping items you will not return.

--
~Ethan~
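
For anyone following along, here is a minimal sketch of the two approaches being compared. It is not Rob's actual test code (that was not posted); the function names and the optional rng parameter are made up for illustration. It should run on Python 3, and as far as I can tell on 2.7 as well.

```python
import random


def choice_via_tuple(iterable, rng=random):
    """Pick one item by materialising the whole iterable first: O(n) memory."""
    items = tuple(iterable)
    if not items:
        raise IndexError("cannot choose from an empty iterable")
    return rng.choice(items)


def choice_via_reservoir(iterable, rng=random):
    """Pick one item with a size-1 reservoir: O(1) memory, one pass."""
    chosen = None
    count = 0
    for count, item in enumerate(iterable, 1):
        # Replace the current pick with the count-th item with
        # probability 1/count; every item ends up equally likely.
        if rng.randrange(count) == 0:
            chosen = item
    if count == 0:
        raise IndexError("cannot choose from an empty iterable")
    return chosen


if __name__ == "__main__":
    print(choice_via_tuple(x * x for x in range(1000)))
    print(choice_via_reservoir(x * x for x in range(1000)))
```

The reservoir version makes a random-number call for every item, which is presumably a large part of why it loses on speed, but it never holds more than one item at a time.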