[Tutor] Any suggestions for optimizing this code?
grouchy
grouch at gmail.com
Tue Sep 20 07:01:43 CEST 2005
Replying to myself, I got some speedups by replacing:
def makeArray1(matrix):
result = matrix
result[0][w/2] = 1
for row in range(h-1):
last = result[row]
next = result[row+1]
for i in range(w-1):
next[i] = rule[4*last[i-1]+2*last[i]+last[i+1]]
next[i+1] = rule[4*last[i]+2*last[i+1]+last[0]]
return result
with this using Numerical Python:
def makeArray2(matrix):
result = matrix
result[0,w/2] = 1
for n in range(h-1):
r = result[n]
r2 = result[n+1]
r2[1:-1] = choose(4*r[:-2]+2*r[1:-1]+r[2:],rule)
r2[0] = rule[4*r[-1]+2*r[0]+r[1]]
r2[-1] = rule[4*r[-2]+2*r[-1]+r[0]]
return result
It finally clicked that instead of a sliding window, I could just add 4*row
+ 2*row + row, each offset by one, and that would give the same results
using Numpy as stepping through three elements at a time. It's not pretty
looking, but it works.
This is about 6x faster overall. The bottleneck is in choose, where each
element gets looked up and replaced. I'm not sure how to do it any faster
tho. Choose is much faster than a for loop, which is where the 6x speedup is
really coming from. Numpy just adding and multiplying the row is more like
20x faster than stepping through an element at time with a three element
window :)
As a sidenote that may be helpful to someone, makeArray1() is 22x faster
using psyco, and 4x faster than the Numpy solution. Psyco doesn't speed up
the Numpy calculations at all, which isn't surprsing, since it's mostly
written in C. If you only use x86, that might be ok. Numpy is a lot more
elegant of a solution it seems.
I'm positive I could bring those closer together, if I could somehow not use
a lookup table to convert binary numbers to integers and back to binary
numbers.
Normally I wouldn't care a whit about optimisation, but number crunching
through a million items and suddenly Numpy seemed pretty cool. I know this
is the first time I understood the power of being able to perform a
calculation on an entire array, and stacking slices. Multiplying and adding
three rows is a lot faster than stepping through a single row three elements
at a time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/tutor/attachments/20050920/b6e14bc9/attachment.html
More information about the Tutor
mailing list