[Tutor] Any suggestions for optimizing this code?

grouchy grouch at gmail.com
Tue Sep 20 07:01:43 CEST 2005

Replying to myself, I got some speedups by replacing:

def makeArray1(matrix):
result = matrix
result[0][w/2] = 1
for row in range(h-1):
last = result[row]
next = result[row+1]
for i in range(w-1): 
next[i] = rule[4*last[i-1]+2*last[i]+last[i+1]]
next[i+1] = rule[4*last[i]+2*last[i+1]+last[0]]
return result

with this using Numerical Python:

def makeArray2(matrix):
result = matrix 
result[0,w/2] = 1
for n in range(h-1):
r = result[n]
r2 = result[n+1]
r2[1:-1] = choose(4*r[:-2]+2*r[1:-1]+r[2:],rule)
r2[0] = rule[4*r[-1]+2*r[0]+r[1]]
r2[-1] = rule[4*r[-2]+2*r[-1]+r[0]] 
return result

It finally clicked that instead of a sliding window, I could just add 4*row 
+ 2*row + row, each offset by one, and that would give the same results 
using Numpy as stepping through three elements at a time. It's not pretty 
looking, but it works. 

This is about 6x faster overall. The bottleneck is in choose, where each 
element gets looked up and replaced. I'm not sure how to do it any faster 
tho. Choose is much faster than a for loop, which is where the 6x speedup is 
really coming from. Numpy just adding and multiplying the row is more like 
20x faster than stepping through an element at time with a three element 
window :) 

As a sidenote that may be helpful to someone, makeArray1() is 22x faster 
using psyco, and 4x faster than the Numpy solution. Psyco doesn't speed up 
the Numpy calculations at all, which isn't surprsing, since it's mostly 
written in C. If you only use x86, that might be ok. Numpy is a lot more 
elegant of a solution it seems. 

I'm positive I could bring those closer together, if I could somehow not use 
a lookup table to convert binary numbers to integers and back to binary 

Normally I wouldn't care a whit about optimisation, but number crunching 
through a million items and suddenly Numpy seemed pretty cool. I know this 
is the first time I understood the power of being able to perform a 
calculation on an entire array, and stacking slices. Multiplying and adding 
three rows is a lot faster than stepping through a single row three elements 
at a time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/tutor/attachments/20050920/b6e14bc9/attachment.html

More information about the Tutor mailing list