
On Thu, 26 Mar 2009 08:41:27 am Marcin 'Qrczak' Kowalczyk wrote:
On Wed, Mar 25, 2009 at 13:28, Steven D'Aprano <steve@pearwood.info> wrote:
Don't get me wrong, random.shuffle() is perfectly adequate for any use-case I can think of. But beyond 2080 items in the list, it becomes greatly biased, and I think that's important to note in the docs. Those who need to know about it will be told, and those who don't care can continue to not care.
Why anyone would care? Orderings possible to obtain from a given good random number generator are quite uniformly distributed among all orderings.
Yes, that holds true for n <= 2080, since Fisher-Yates is an unbiased shuffler. But I don't think it remains true for n > 2080 since the vast majority of possible permutations have probability zero. I'm not saying that this absolutely *will* introduce statistical bias into the shuffled lists, but it could, and those who care about that risk shouldn't have to read the source code to learn this.
I bet you can't even predict any particular ordering which is impossible to obtain.
A moral dilemma... should I take advantage of your innumeracy by taking you up on that bet, or should I explain why that bet is a sure thing for me? *wink* Since the chances of me collecting on the bet is essentially near zero, I'll explain. For a list with 2082 items, shuffle() chooses from a subset of approximately 0.001% of all possible permutations. This means that if I give you a list of 2082 items and tell you to shuffle it, and then guess that such-and-such a permutation of it will never be reached, I can only lose if by chance I guessed on the 1 in 100,000 permutations that shuffle() can reach. I have 99,999 chances to win versus 1 to lose: that's essentially a sure thing. In practical terms, beyond (say) 2085 or so, it would be a bona fide miracle if I didn't win such a bet. -- Steven D'Aprano