[docs] [issue33114] random.sample() behavior is unexpected/unclear from docs
report at bugs.python.org
Fri Mar 23 17:22:28 EDT 2018
Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:
> Something along the lines of: "For a fixed seed, random.sample(population, k)
> is not guaranteed to return the same samples for different values of k."
In a way, the proposed wording succinctly and directly addresses the problem you had, so it seems like a reasonable suggestion. On the other hand, others who haven't had this problem could easily have a hard time figuring out what it means (when should they be worried, what should be avoided, why is it a concern at all, what to do about it).
In general, the docs are worded in an affirmative manner (here's what something does, here's what it is for, and here is how to use it correctly). In this case, the docs already indicate the intended way to address this use case: "the resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices)."
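As a minimal sketch of the sub-slice approach quoted above: draw once at the largest size needed, then slice. The helper name nested_samples, the seed value, and the entrants list are made up for illustration and are not part of the random module.

```python
import random


def nested_samples(population, sizes, seed=None):
    """Return {k: sample of size k}, each one a prefix of a single draw.

    Hypothetical helper: it calls random.sample() once, at the largest
    requested size, and slices the result for the smaller sizes.
    """
    rng = random.Random(seed)
    largest = rng.sample(population, max(sizes))
    # Per the docs, the list is in selection order, so each leading
    # slice is itself a valid random sample.
    return {k: largest[:k] for k in sizes}


entrants = list(range(1000))  # stand-in population
by_size = nested_samples(entrants, sizes=[1, 5, 10], seed=2018)

winners = by_size[10]
grand_prize, second_place = winners[:1], winners[1:]
```

Because every size comes from the same draw, the smaller samples are consistent with the larger ones by construction, whatever sample() does internally.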
Perhaps there could be an algorithmic note, "internally, sample() shifts selection algorithms depending on the proportion of the population being sampled". However, this would be unusual -- we don't usually document implementation details. NumPy and R make no mention of the internals. Julia does discuss the algorithms but primarily from an efficiency point of view rather than as a usage note.
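To make the idea of shifting algorithms concrete, here is a simplified sketch. It is not the code in the random module: the function name sample_sketch and the pool_threshold parameter are invented for illustration, and the real cutoff is computed differently. It only shows why two strategies that consume random numbers differently can give unrelated results for different k under the same seed.

```python
import random


def sample_sketch(population, k, rng, pool_threshold=0.5):
    """Illustrative only: choose k unique items, switching strategy by k/n.

    pool_threshold is an assumed cutoff, not the one CPython uses.
    """
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population or is negative")
    result = []
    if n > 0 and k / n >= pool_threshold:
        # Large fraction: copy the population and pull items out of the copy.
        pool = list(population)
        for i in range(k):
            j = rng.randrange(n - i)
            result.append(pool[j])
            pool[j] = pool[n - i - 1]  # move an unpicked item into the gap
    else:
        # Small fraction: remember chosen indices and redraw on a collision.
        chosen = set()
        while len(result) < k:
            j = rng.randrange(n)
            if j not in chosen:
                chosen.add(j)
                result.append(population[j])
    return result


population = list(range(100))
small = sample_sketch(population, 5, random.Random(42))   # index-tracking branch
large = sample_sketch(population, 80, random.Random(42))  # pool branch
```

Whether small happens to be a prefix of large here is incidental; nothing in the two branches promises it, which is the situation the proposed note was trying to describe.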
It may be best to leave this alone rather than adding a note that may itself create confusion and worry. AFAICT, this hasn't come up before in the 15-year history of random.sample(), not even as a StackOverflow question.
Python tracker <report at bugs.python.org>