Implementaion of random.shuffle

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Jul 16 22:52:33 EDT 2007


On Mon, 16 Jul 2007 14:45:48 +0000, shabda raaj wrote:

> On Jul 16, 5:53 pm, Steve Holden <st... at holdenweb.com> wrote:
>> shabda raaj wrote:
>> > I was going through the docs for module-random
>> > <http://docs.python.org/lib/module-random.html>  And I came through
>> > this,
>>
>> > * shuffle*(        x[, random])
>>
>> >     Shuffle the sequence x in place. The optional argument random is
>> >     a 0-argument function returning a random float in [0.0, 1.0); by
>> >     default, this is the function random().
>>
>> >     Note that for even rather small |len(x)|, the total number of
>> >     permutations of x is larger than the period of most random number
>> >     generators; this implies that most permutations of a long
>> >     sequence can never be generated.

[snip]

> Oh, I wasn't aware that I could see the source of all python modules. I
> guess, then the implementation is correct(its just the knuth shuffle,
> moving downward), its just the doc which is in error. Might be we should
> log a doc bug for this?


No, it is not a doc error, you are confused.

Suppose you have a random number generator that has the tiny period of 
120. (Obviously that's a really lousy RNG, but it will do for an 
example.) That means it can only produce 120 different results (the 
current number, plus the state the generator it is in) before repeating.

Note that the period is not the same as the spread of values returned, 
the two are independent, although naturally the period must be at least 
equal to the number of unique values returned.

If you have a list of just six items, there are 6! = 720 possible 
permutations of the list, so AT BEST the random number generator can 
produce 120/720 or 17% of the permutations.

That's an example of the pigeonhole principle, one of the most simple yet 
powerful mathematical principles around. If you've got more items than 
pigeonholes to put them in, either some will double up or some will miss 
out.
 
In the case of CPython, the current implementation uses the Mersenne 
Twister, which has a huge period of 2**19937. However, 2081! is larger 
than that number, which means that at best a list of 2081 items or longer 
can't be perfectly shuffled (not every permutation can be selected by the 
algorithm).


-- 
Steven.



More information about the Python-list mailing list