[docs] surprisingly poor performance of random.sample(set)

Georg Brandl georg at python.org
Wed Aug 13 07:50:42 CEST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/12/2014 10:46 PM, Alex Frase wrote:
> Hi,
> 
> I noticed recently that running random.sample() on a set is measurably
> slower than doing so on a list or tuple with the same contents. I suspect
> that this is due to the "implementation detail" of converting the
> population argument to a tuple internally; that may not matter much for a
> single call, but when doing i.e. permutation testing, it is not
> unreasonable to call random.sample() on the same population 1000s of times
> in a tight loop. In that scenario, letting it cast the set to a tuple
> inside the loop makes it ~50x slower than casting once outside the loop and
> passing a tuple to random.sample() instead of a set.
> 
> Assuming there is no easy way for random.sample() to work efficiently with
> a set, would it at least be possible to mention this behavior in the 
> documentation? Right now there is no indication that the data structure
> used for the population makes any difference, which may lead people to just
> assume that random.sample() is slow, not realizing they could get a 50x
> boost with a single cast outside their loop.

Hi Alex,

please open an issue on bugs.python.org to discuss this issue with the
maintainers.

cheers,
Georg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlPq/LIACgkQN9GcIYhpnLBoZgCgpyUeE2j5dTR4r4yvSwiZQsNQ
bDUAnR0zMqC1Op/YlP4TotvGfQMCzCqY
=3CY/
-----END PGP SIGNATURE-----


More information about the docs mailing list