Three dumb questions (ordered by dumbness descending)

Ian Bicking ianb at colorstudy.com
Mon Sep 23 23:58:55 EDT 2002


On Mon, 2002-09-23 at 16:03, Thorsten Kampe wrote:
> 2. Peter Norvig mentions in "Python for Lisp Programmers" some 
> "don'ts": "[x] + y" and "x[1:]". Are there more things to avoid 
> (especially in a loop)?

He's probably thinking about x[1:] because it is slower than (cdr x) --
in Python you are creating a new list.  You can use "del x[0]" instead.

The same is true of "[x] + y", as compared to (cons x y) -- you aren't
reusing the y list.  y.insert(x, 0) is considerably faster (as long as
you don't mind mutating y).

I haven't read what he said, but I think rather you shouldn't use:

x = x[1:]
# or
y = [x] + y

both these cases are suboptimal, even if they seem similar to some Lisp
cognate.

> 3. random.shuffle: the documentation says: "Note that for even rather 
> small len(x), the total number of permutations of x is larger than the 
> period of most random number generators; this implies that most 
> permutations of a long sequence can never be generated." I translate 
> this to "don't shuffle lists with more than 15 elements if you want a 
> randomly ordered list".
> 
> Alex Martelli says the same: "For example, Python users who are 
> calling random.shuffle on a sequence of substantial length have a need 
> for an underlying random generator with a *HUGE* period, as a 
> necessary although not sufficient condition towards ensuring that all 
> permutations of the shuffled sequence get generated with equal 
> likelihood.
> 
> Wichman-Hill has a period of less than 7,000 billions (7e12),
> insufficient to account even for the permutations of a sequence
> whose length is just 16 (16! > 2e13)."
> 
> Otherwise in the "Python Cookbook" (Selecting Random Elements from a 
> List Without Repetition) the shuffle version is called "the winner".
> 
> So: can I use random.shuffle or do I have to write my own - slower - 
> version to get a random list?

Well, it depends what you require of shuffle.  If you want to do
modeling on large sets with many permutations, maybe so.  The only way I
see to get around this is to use a true random number source -- like
/dev/random on Linux.  All pseudo-random sources will have a period
based on their potential seed size and amount of internal state.  For
long lists, it seems unreasonable to create a better pseudo-random
source -- though I haven't thought about it that hard.

random.shuffle is not optimized -- it's written in Python, and is very
short, and I don't think well implemented:

def shuffle(x):
    for i in xrange(len(x)-1, 0, -1):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

I would implement it like this:

def shuffle(x):
    newList = [(random(), i) for i in x]
    newList.sort()
    return [i[1] for i in x]

Anyway, in either case you could create your own random number source
and copy and paste that algorithm using that new source.

  Ian






More information about the Python-list mailing list