[Python-ideas] About adding a new iteratormethodcalled "shuffled"
Steven D'Aprano
steve at pearwood.info
Thu Mar 26 00:58:58 CET 2009
On Thu, 26 Mar 2009 08:41:27 am Marcin 'Qrczak' Kowalczyk wrote:
> On Wed, Mar 25, 2009 at 13:28, Steven D'Aprano <steve at pearwood.info>
wrote:
> > Don't get me wrong, random.shuffle() is perfectly adequate for any
> > use-case I can think of. But beyond 2080 items in the list, it
> > becomes greatly biased, and I think that's important to note in the
> > docs. Those who need to know about it will be told, and those who
> > don't care can continue to not care.
>
> Why anyone would care? Orderings possible to obtain from a given good
> random number generator are quite uniformly distributed among all
> orderings.
Yes, that holds true for n <= 2080, since Fisher-Yates is an unbiased
shuffler. But I don't think it remains true for n > 2080 since the vast
majority of possible permutations have probability zero. I'm not saying
that this absolutely *will* introduce statistical bias into the
shuffled lists, but it could, and those who care about that risk
shouldn't have to read the source code to learn this.
> I bet you can't even predict any particular ordering which
> is impossible to obtain.
A moral dilemma... should I take advantage of your innumeracy by taking
you up on that bet, or should I explain why that bet is a sure thing
for me? *wink*
Since the chances of me collecting on the bet is essentially near zero,
I'll explain.
For a list with 2082 items, shuffle() chooses from a subset of
approximately 0.001% of all possible permutations. This means that if I
give you a list of 2082 items and tell you to shuffle it, and then
guess that such-and-such a permutation of it will never be reached, I
can only lose if by chance I guessed on the 1 in 100,000 permutations
that shuffle() can reach. I have 99,999 chances to win versus 1 to
lose: that's essentially a sure thing.
In practical terms, beyond (say) 2085 or so, it would be a bona fide
miracle if I didn't win such a bet.
--
Steven D'Aprano
More information about the Python-ideas
mailing list