[Python-ideas] About adding a new iteratormethodcalled "shuffled"

Thu Mar 26 07:42:09 CET 2009

On Wed, Mar 25, 2009 at 5:58 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, 26 Mar 2009 08:41:27 am Marcin 'Qrczak' Kowalczyk wrote:
>> Why anyone would care? Orderings possible to obtain from a given good
>> random number generator are quite uniformly distributed among all
>> orderings.
>
> Yes, that holds true for n <= 2080, since Fisher-Yates is an unbiased
> shuffler. But I don't think it remains true for n > 2080 since the vast
> majority of possible permutations have probability zero. I'm not saying
> that this absolutely *will* introduce statistical bias into the
> shuffled lists, but it could, and those who care about that risk
> shouldn't have to read the source code to learn this.

If random.shuffle() is broken for lists more than 2080 then it should
raise an error.  Claiming it "might" be broken in the docs for
moderately sized lists, without researching such a claim, is pointless
fear mongering.

>> I bet you can't even predict any particular ordering which
>> is impossible to obtain.
>
> A moral dilemma... should I take advantage of your innumeracy by taking
> you up on that bet, or should I explain why that bet is a sure thing
> for me? *wink*
>
> Since the chances of me collecting on the bet is essentially near zero,
> I'll explain.
>
> For a list with 2082 items, shuffle() chooses from a subset of
> approximately 0.001% of all possible permutations. This means that if I
> give you a list of 2082 items and tell you to shuffle it, and then
> guess that such-and-such a permutation of it will never be reached, I
> can only lose if by chance I guessed on the 1 in 100,000 permutations
> that shuffle() can reach. I have 99,999 chances to win versus 1 to
> lose: that's essentially a sure thing.
>
> In practical terms, beyond (say) 2085 or so, it would be a bona fide
> miracle if I didn't win such a bet.

Go ahead, pick a combination, then iterate through all 2**19937-1
permutations to prove you're correct.  Don't worry, we can wait.

Of course a stronger analysis technique can prove it much quicker than
brute force, but it's not a cryptographically secure PRNG, there's
LOTS of information that can be found through such techniques.

So far the 2080 limit is random trivia, nothing more.  It has no real
significance, imposes no new threats, and does not change how correct
code is written.

-- 
Adam Olsen, aka Rhamphoryncus