Re: [Python-ideas] About adding a new iteratormethodcalled "shuffled"

March 26, 2009

      On Wed, Mar 25, 2009 at 5:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
...
On Thu, 26 Mar 2009 08:41:27 am Marcin 'Qrczak' Kowalczyk wrote:
...
Why anyone would care? Orderings possible to obtain from a given good
random number generator are quite uniformly distributed among all
orderings.
Yes, that holds true for n <= 2080, since Fisher-Yates is an unbiased
shuffler. But I don't think it remains true for n > 2080 since the vast
majority of possible permutations have probability zero. I'm not saying
that this absolutely *will* introduce statistical bias into the
shuffled lists, but it could, and those who care about that risk
shouldn't have to read the source code to learn this.
If random.shuffle() is broken for lists more than 2080 then it should
raise an error.  Claiming it "might" be broken in the docs for
moderately sized lists, without researching such a claim, is pointless
fear mongering.
...
...
I bet you can't even predict any particular ordering which
is impossible to obtain.
A moral dilemma... should I take advantage of your innumeracy by taking
you up on that bet, or should I explain why that bet is a sure thing
for me? *wink*
Since the chances of me collecting on the bet is essentially near zero,
I'll explain.
For a list with 2082 items, shuffle() chooses from a subset of
approximately 0.001% of all possible permutations. This means that if I
give you a list of 2082 items and tell you to shuffle it, and then
guess that such-and-such a permutation of it will never be reached, I
can only lose if by chance I guessed on the 1 in 100,000 permutations
that shuffle() can reach. I have 99,999 chances to win versus 1 to
lose: that's essentially a sure thing.
In practical terms, beyond (say) 2085 or so, it would be a bona fide
miracle if I didn't win such a bet.
Go ahead, pick a combination, then iterate through all 2**19937-1
permutations to prove you're correct.  Don't worry, we can wait.

Of course a stronger analysis technique can prove it much quicker than
brute force, but it's not a cryptographically secure PRNG, there's
LOTS of information that can be found through such techniques.

So far the 2080 limit is random trivia, nothing more.  It has no real
significance, imposes no new threats, and does not change how correct
code is written.

-- 
Adam Olsen, aka Rhamphoryncus