Three dumb questions (ordered by dumbness descending)
Ian Bicking
ianb at colorstudy.com
Mon Sep 23 23:58:55 EDT 2002
On Mon, 2002-09-23 at 16:03, Thorsten Kampe wrote:
> 2. Peter Norvig mentions in "Python for Lisp Programmers" some
> "don'ts": "[x] + y" and "x[1:]". Are there more things to avoid
> (especially in a loop)?
He's probably thinking about x[1:] because it is slower than (cdr x) --
in Python you are creating a new list. You can use "del x[0]" instead.
The same is true of "[x] + y", as compared to (cons x y) -- you aren't
reusing the y list. y.insert(x, 0) is considerably faster (as long as
you don't mind mutating y).
I haven't read what he said, but I think rather you shouldn't use:
x = x[1:]
# or
y = [x] + y
both these cases are suboptimal, even if they seem similar to some Lisp
cognate.
> 3. random.shuffle: the documentation says: "Note that for even rather
> small len(x), the total number of permutations of x is larger than the
> period of most random number generators; this implies that most
> permutations of a long sequence can never be generated." I translate
> this to "don't shuffle lists with more than 15 elements if you want a
> randomly ordered list".
>
> Alex Martelli says the same: "For example, Python users who are
> calling random.shuffle on a sequence of substantial length have a need
> for an underlying random generator with a *HUGE* period, as a
> necessary although not sufficient condition towards ensuring that all
> permutations of the shuffled sequence get generated with equal
> likelihood.
>
> Wichman-Hill has a period of less than 7,000 billions (7e12),
> insufficient to account even for the permutations of a sequence
> whose length is just 16 (16! > 2e13)."
>
> Otherwise in the "Python Cookbook" (Selecting Random Elements from a
> List Without Repetition) the shuffle version is called "the winner".
>
> So: can I use random.shuffle or do I have to write my own - slower -
> version to get a random list?
Well, it depends what you require of shuffle. If you want to do
modeling on large sets with many permutations, maybe so. The only way I
see to get around this is to use a true random number source -- like
/dev/random on Linux. All pseudo-random sources will have a period
based on their potential seed size and amount of internal state. For
long lists, it seems unreasonable to create a better pseudo-random
source -- though I haven't thought about it that hard.
random.shuffle is not optimized -- it's written in Python, and is very
short, and I don't think well implemented:
def shuffle(x):
for i in xrange(len(x)-1, 0, -1):
# pick an element in x[:i+1] with which to exchange x[i]
j = int(random() * (i+1))
x[i], x[j] = x[j], x[i]
I would implement it like this:
def shuffle(x):
newList = [(random(), i) for i in x]
newList.sort()
return [i[1] for i in x]
Anyway, in either case you could create your own random number source
and copy and paste that algorithm using that new source.
Ian
More information about the Python-list
mailing list