[docs] surprisingly poor performance of random.sample(set)

Alex Frase frase at cs.wisc.edu
Tue Aug 12 22:46:17 CEST 2014


I noticed recently that running random.sample() on a set is measurably slower than doing so on a list or tuple with the same contents. I suspect that this is due to the "implementation detail" of converting the population argument to a tuple internally; that may not matter much for a single call, but when doing i.e. permutation testing, it is not unreasonable to call random.sample() on the same population 1000s of times in a tight loop. In that scenario, letting it cast the set to a tuple inside the loop makes it ~50x slower than casting once outside the loop and passing a tuple to random.sample() instead of a set.

Assuming there is no easy way for random.sample() to work efficiently with a set, would it at least be possible to mention this behavior in the documentation? Right now there is no indication that the data structure used for the population makes any difference, which may lead people to just assume that random.sample() is slow, not realizing they could get a 50x boost with a single cast outside their loop.

Alex Frase

More information about the docs mailing list