Re: [Python-ideas] About adding a new iteratormethodcalled "shuffled"

March 26, 2009

      On Thu, 26 Mar 2009 08:41:27 am Marcin 'Qrczak' Kowalczyk wrote:
...
On Wed, Mar 25, 2009 at 13:28, Steven D'Aprano <steve@pearwood.info> 
wrote:
...
Don't get me wrong, random.shuffle() is perfectly adequate for any
use-case I can think of. But beyond 2080 items in the list, it
becomes greatly biased, and I think that's important to note in the
docs. Those who need to know about it will be told, and those who
don't care can continue to not care.
Why anyone would care? Orderings possible to obtain from a given good
random number generator are quite uniformly distributed among all
orderings.
Yes, that holds true for n <= 2080, since Fisher-Yates is an unbiased 
shuffler. But I don't think it remains true for n > 2080 since the vast 
majority of possible permutations have probability zero. I'm not saying 
that this absolutely *will* introduce statistical bias into the 
shuffled lists, but it could, and those who care about that risk 
shouldn't have to read the source code to learn this.
...
I bet you can't even predict any particular ordering which 
is impossible to obtain.
A moral dilemma... should I take advantage of your innumeracy by taking 
you up on that bet, or should I explain why that bet is a sure thing 
for me? *wink*

Since the chances of me collecting on the bet is essentially near zero, 
I'll explain.

For a list with 2082 items, shuffle() chooses from a subset of 
approximately 0.001% of all possible permutations. This means that if I 
give you a list of 2082 items and tell you to shuffle it, and then 
guess that such-and-such a permutation of it will never be reached, I 
can only lose if by chance I guessed on the 1 in 100,000 permutations 
that shuffle() can reach. I have 99,999 chances to win versus 1 to 
lose: that's essentially a sure thing.

In practical terms, beyond (say) 2085 or so, it would be a bona fide 
miracle if I didn't win such a bet.

-- 
Steven D'Aprano