
On Wed, Mar 25, 2009 at 5:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, 26 Mar 2009 08:41:27 am Marcin 'Qrczak' Kowalczyk wrote:
Why anyone would care? Orderings possible to obtain from a given good random number generator are quite uniformly distributed among all orderings.
Yes, that holds true for n <= 2080, since Fisher-Yates is an unbiased shuffler. But I don't think it remains true for n > 2080 since the vast majority of possible permutations have probability zero. I'm not saying that this absolutely *will* introduce statistical bias into the shuffled lists, but it could, and those who care about that risk shouldn't have to read the source code to learn this.
If random.shuffle() is broken for lists more than 2080 then it should raise an error. Claiming it "might" be broken in the docs for moderately sized lists, without researching such a claim, is pointless fear mongering.
I bet you can't even predict any particular ordering which is impossible to obtain.
A moral dilemma... should I take advantage of your innumeracy by taking you up on that bet, or should I explain why that bet is a sure thing for me? *wink*
Since the chances of me collecting on the bet is essentially near zero, I'll explain.
For a list with 2082 items, shuffle() chooses from a subset of approximately 0.001% of all possible permutations. This means that if I give you a list of 2082 items and tell you to shuffle it, and then guess that such-and-such a permutation of it will never be reached, I can only lose if by chance I guessed on the 1 in 100,000 permutations that shuffle() can reach. I have 99,999 chances to win versus 1 to lose: that's essentially a sure thing.
In practical terms, beyond (say) 2085 or so, it would be a bona fide miracle if I didn't win such a bet.
Go ahead, pick a combination, then iterate through all 2**19937-1 permutations to prove you're correct. Don't worry, we can wait. Of course a stronger analysis technique can prove it much quicker than brute force, but it's not a cryptographically secure PRNG, there's LOTS of information that can be found through such techniques. So far the 2080 limit is random trivia, nothing more. It has no real significance, imposes no new threats, and does not change how correct code is written. -- Adam Olsen, aka Rhamphoryncus