[Python-Dev] a note in random.shuffle.__doc__ ...

Tim Peters tim.peters at gmail.com
Sat Jun 10 23:31:39 CEST 2006


[Alex Martelli]
> ...claims:
>
> Note that for even rather small len(x), the total number of
> permutations of x is larger than the period of most random number
> generators; this implies that "most" permutations of a long
> sequence can never be generated.
>
> Now -- why would the behavior of "most" random number generators be
> relevant here?  The module's docs claim, for its specific Mersenne
> Twister generator, a period of 2**19997-1, which is (e.g.)

Oops!  That's wrong.  The period is 2**19937-1.

> a comfortable
> 130128673800676351960752618754658780303412233749552410245124492452914636
> 028095467780746435724876612802011164778042889281426609505759158196749438
> 742986040468247017174321241233929215223326801091468184945617565998894057
> 859403269022650639413550466514556014961826309062543 times larger than
> the number of permutations of 2000 items, which doesn't really feel
> to me like a "rather small len(x)" in this context (what I'm most
> often shuffling is just a pack of cards -- len(x)==52 -- for example).
>
> I suspect that the note is just a fossil from a time when the default
> random number generator was Whichman-Hill, with a much shorter
> period.  Should this note just be removed, or instead somehow
> reworded to point out that this is not in fact a problem for the
> module's current default random number generator?  Opinions welcome!

It should be removed now.  I'll do that :-)

WH's period was indeed so short that it couldn't generate even a tiny
fraction of the permutations of a deck of cards, and that's why the
note was added.

While a long period is necessary to get a shot at all permutations,
it's not sufficient, and I don't know what the true story is wrt the
Twister.  For example, a miserable PRNG that returns

   0.1,
   0.1, 0.2,
   0.1, 0.2, 0.2,
   0.1, 0.2, 0.2, 0.2,
   0.1, 0.2, 0.2, 0.2, 0.2,
   ...

has infinite period, but has few (O(N)) distinct subsequences of
length N.  That's a failure of so-called equidistribution in N
dimensions (for sufficiently large N, some N-vectors appear more often
than others across the whole period).  "A long" period is necessary
but not sufficient for high-dimensional equidistribution.

Off the top of my head, then, since the Twister is provably
equidistributed in 623 dimensions to 32-bit accuracy, I expect it
should be able to "fairly" generate all permutations of a sequence of
<= 623 elements (equidistribution in N dimensions implies
equidistribution in all dimensions <= N).  So I'm happy to leave a
warning out until the casinos switch to 12-deck blackjack ;-)


More information about the Python-Dev mailing list