[issue37682] random.sample should support iterators
Serhiy Storchaka
report at bugs.python.org
Fri Jul 26 03:22:58 EDT 2019
Serhiy Storchaka <storchaka+cpython at gmail.com> added the comment:
> ISTM that if a generator produces so much data that it is infeasible to fit in memory, then it will also take a long time to loop over it and generate a random value for each entry.
Good point!
$ ./python -m timeit -s 'from random import sample as s' 's(range(10**6), 50)'
10000 loops, best of 5: 25.6 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**6)), 50)'
10 loops, best of 5: 31.5 msec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**6), 50)'
1 loop, best of 5: 328 msec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(range(10**8), 50)'
10000 loops, best of 5: 26.9 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**8)), 50)'
1 loop, best of 5: 3.41 sec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**8), 50)'
1 loop, best of 5: 36.5 sec per loop
It is possible that a generator produces relatively few items, but each item takes so much memory that the total does not fit in memory. But I suppose that the generation time of larger items would be proportionally larger, so reservoir_sample() would be just as slow.
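For reference, the reservoir_sample() timed above is not in the stdlib; a minimal sketch of what such a function could look like, using the classic single-pass Algorithm R (the name and signature here are assumptions, not the actual proposed patch):

```python
import random

def reservoir_sample(iterable, k):
    """Pick k items uniformly at random from an iterable of unknown
    length in one pass, using O(k) memory (Algorithm R)."""
    it = iter(iterable)
    # Fill the reservoir with the first k items.
    reservoir = []
    for _ in range(k):
        try:
            reservoir.append(next(it))
        except StopIteration:
            raise ValueError("Sample larger than population")
    # The i-th item (1-indexed) replaces a random reservoir slot
    # with probability k/i, which keeps the sample uniform.
    for i, item in enumerate(it, start=k + 1):
        j = random.randrange(i)
        if j < k:
            reservoir[j] = item
    return reservoir
```

Note that it must call randrange() once per input item, which is exactly why it cannot compete with sample() on a sized sequence: sample() makes only O(k) random draws regardless of population size.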
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37682>
_______________________________________