[Edu-sig] More permutation madness [Cryptonomicon]

Tim Peters tim.one@home.com
Thu, 30 Nov 2000 16:10:00 -0500

[Daniel Yoo]
> There was a question about ways of generating permutations: is there an
> easy way to show that these two functions:
> ###
> def permuteRestricted(L):
>     newlist = L[:]   # shallow copy
>     for i in range(len(L)):
>         rand_i = randint(i, len(L)-1)
>         newlist[i], newlist[rand_i] = newlist[rand_i], newlist[i]
>     return newlist
> def permuteUnrestricted(L):
>     newlist = L[:]   # shallow copy
>     for i in range(len(L)):
>         rand_i = randint(0, len(L)-1)
>         rand_j = randint(0, len(L)-1)
>         newlist[rand_j], newlist[rand_i] = newlist[rand_i],
> newlist[rand_j]
>     return newlist
> ###
> act differently?
> ...

By counting.  Let N=len(L).  Then in the first version, there are N! (N
factorial) possible outcomes (on the first iteration thru the loop, rand_i
can have any of N values; on the second iteration, any of N-1; and so on).
In the second version, N**(2*N) (on the first iteration thru the loop, the
pair (rand_i, rand_j) can take any of N**2 values; and likewise for all the
other iterations).

Since there are N! possible permutations, that the first version has N!
possible outcomes means it's at least *possible* for each permutation to be
equally likely (the counting argument alone can't prove it's fair, though --
it simply doesn't rule fairness out).  But the second version can't possibly
be fair unless N**(2*N) is a multiple of N!.  That is, I think, what makes
it so seductive:  the second version *is* fair if N happens to be 1 or 2.
People think about the small cases, and leap to the conclusion that it "must
be" fair for larger N too.  But for N==3, 6 (3!) does not divide 729 (3**6),
so the second version necessarily favors some permutations.

Note that "the usual" wrong algorithm for generating a random permutation is
a bit simpler than permuteUnrestricted:

def permuteUnrestricted_usual(L):
    newlist = L[:]   # shallow copy
    for i in range(len(L)):
        rand_j = randint(0, len(L)-1)
        newlist[i], newlist[rand_j] = newlist[rand_j], newlist[i]
    return newlist

The same kind of counting argument shows that it can't be fair for len(L)>2.
An exact analysis is quite difficult!

probability-is-full-of-surprises-ly y'rs  - tim