Implementaion of random.shuffle
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Tue Jul 17 04:52:33 CEST 2007
On Mon, 16 Jul 2007 14:45:48 +0000, shabda raaj wrote:
> On Jul 16, 5:53 pm, Steve Holden <st... at holdenweb.com> wrote:
>> shabda raaj wrote:
>> > I was going through the docs for module-random
>> > <http://docs.python.org/lib/module-random.html> And I came through
>> > this,
>>
>> > * shuffle*( x[, random])
>>
>> > Shuffle the sequence x in place. The optional argument random is
>> > a 0-argument function returning a random float in [0.0, 1.0); by
>> > default, this is the function random().
>>
>> > Note that for even rather small |len(x)|, the total number of
>> > permutations of x is larger than the period of most random number
>> > generators; this implies that most permutations of a long
>> > sequence can never be generated.
[snip]
> Oh, I wasn't aware that I could see the source of all python modules. I
> guess, then the implementation is correct(its just the knuth shuffle,
> moving downward), its just the doc which is in error. Might be we should
> log a doc bug for this?
No, it is not a doc error, you are confused.
Suppose you have a random number generator that has the tiny period of
120. (Obviously that's a really lousy RNG, but it will do for an
example.) That means it can only produce 120 different results (the
current number, plus the state the generator it is in) before repeating.
Note that the period is not the same as the spread of values returned,
the two are independent, although naturally the period must be at least
equal to the number of unique values returned.
If you have a list of just six items, there are 6! = 720 possible
permutations of the list, so AT BEST the random number generator can
produce 120/720 or 17% of the permutations.
That's an example of the pigeonhole principle, one of the most simple yet
powerful mathematical principles around. If you've got more items than
pigeonholes to put them in, either some will double up or some will miss
out.
In the case of CPython, the current implementation uses the Mersenne
Twister, which has a huge period of 2**19937. However, 2081! is larger
than that number, which means that at best a list of 2081 items or longer
can't be perfectly shuffled (not every permutation can be selected by the
algorithm).
--
Steven.
More information about the Python-list
mailing list