[Python-ideas] random.sample should work better with iterators

Antoine Pitrou solipsis at pitrou.net
Wed Jun 27 03:11:26 EDT 2018


On Tue, 26 Jun 2018 23:52:55 -0500
Tim Peters <tim.peters at gmail.com> wrote:
> 
> In Python today, the easiest way to spell Abe's intent is, e.g.,
> 
> >>> from heapq import nlargest # or nsmallest - doesn't matter
> >>> from random import random
> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())  
> [75260, 45880, 99486, 13478]
> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())  
> [31732, 72288, 26584, 72672]
> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())  
> [14180, 86084, 22639, 2004]
> 
> That also arranges to preserve `sample()`'s promise that all sub-slices of
> the result are valid random samples too (because `nlargest` sorts by the
> randomly generated keys before returning the list).
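
For reference, the quoted recipe can be wrapped up as a small helper. This is only a sketch of the approach described above, not anything in the stdlib: the name `sample_iter` is made up for illustration, and it assumes the input can be any one-shot iterable.

```python
from heapq import nlargest
from random import random


def sample_iter(iterable, k):
    """Return up to k items chosen at random from any iterable (hypothetical helper)."""
    # Every item gets an independent random key, computed once per item.
    # nlargest keeps the k items with the largest keys in a single pass and
    # returns them ordered by key, i.e. in random order.
    return nlargest(k, iterable, key=lambda _: random())


# Works on a generator, which has no len() and cannot be indexed.
picked = sample_iter((i for i in range(100000)), 4)
print(picked)
```

If the iterable yields fewer than k items, this returns all of them in random order instead of raising ValueError the way `random.sample` does.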

How could slicing return an invalid random sample?

Regards

Antoine.



