[issue44080] Bias in random.choices(long_sequence)

Sun May 9 14:58:59 EDT 2021

Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

FWIW, the principal use case that choices() was designed for is resampling/bootstrapping.  In that use case, speed matters and small imbalances in large sequences don't matter at all.  Also, the API was designed to make it easy to select from an itemized population of individuals rather than from a large range followed by a modulo calculation (we already have randrange() to meet that need).
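
For illustration, the bootstrapping use case might look roughly like the sketch below. The function name bootstrap_means and its parameters are hypothetical and not part of the random module; only random.Random, choices() and statistics.fmean() are real library calls.

```python
import random
from statistics import fmean

def bootstrap_means(data, n_resamples=1000, seed=None):
    """Return the mean of each bootstrap resample of `data`.

    Hypothetical helper for illustration only. Each resample draws
    len(data) items with replacement using choices().
    """
    rng = random.Random(seed)  # private generator so a seed gives repeatable runs
    size = len(data)
    return [fmean(rng.choices(data, k=size)) for _ in range(n_resamples)]

if __name__ == "__main__":
    observations = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
    means = sorted(bootstrap_means(observations, n_resamples=2000, seed=12345))
    # Rough 90% interval for the mean, read off the sorted resample means.
    print(means[100], means[1899])
```

Here choices() is called once per resample, so its speed dominates the run time, while a tiny imbalance between items is negligible next to the sampling noise.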

I could add a sentence to the last paragraph recommending "[choice(pop) for i in range(k)]" for the case where 1) the population is large, 2) speed doesn't matter, 3) the weights are equal, and 4) integer math is desired to avoid any trace of bias.  That said, I don't think users would benefit from it; this is largely a theoretical concern that doesn't warrant more than the existing paragraph.
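
As a rough sketch of the two alternatives discussed above, the snippet below puts them side by side. The wrapper names sample_fast and sample_exact are hypothetical and exist only for this example; choices() and choice() are the real functions.

```python
import random

def sample_fast(population, k, rng=random):
    """Equal-weight selection with replacement via choices().

    Fast, but for very large populations the selection may carry a
    tiny bias, which is the subject of this issue.
    """
    return rng.choices(population, k=k)

def sample_exact(population, k, rng=random):
    """Equal-weight selection with replacement via repeated choice().

    Slower, but choice() selects the index with integer math.
    """
    return [rng.choice(population) for _ in range(k)]

if __name__ == "__main__":
    pop = range(10**9)  # a large population with equal weights
    print(sample_fast(pop, 5))
    print(sample_exact(pop, 5))
```

Both functions return a list of k items drawn with replacement, so one can be swapped for the other when all four conditions listed above apply.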

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44080>
_______________________________________