Replying to myself, I got some speedups by replacing:
<br>
<br>def makeArray1(matrix):
<br>    result = matrix
<br>    result[0][w//2] = 1
<br>    for row in range(h-1):
<br>        last = result[row]
<br>        next = result[row+1]
<br>        for i in range(w-1):
<br>            next[i] = rule[4*last[i-1]+2*last[i]+last[i+1]]
<br>        # the last cell wraps around to the first
<br>        next[i+1] = rule[4*last[i]+2*last[i+1]+last[0]]
<br>    return result
<br>
<br>with this using Numerical Python:
<br>
<br>def makeArray2(matrix):
<br>    result = matrix
<br>    result[0,w//2] = 1
<br>    for n in range(h-1):
<br>        r = result[n]
<br>        r2 = result[n+1]
<br>        # interior cells, all at once
<br>        r2[1:-1] = choose(4*r[:-2]+2*r[1:-1]+r[2:],rule)
<br>        # the two edge cells wrap around
<br>        r2[0] = rule[4*r[-1]+2*r[0]+r[1]]
<br>        r2[-1] = rule[4*r[-2]+2*r[-1]+r[0]]
<br>    return result
<br>
<br>
<br>It finally clicked that instead of sliding a window along the row, I could just add 4*row + 2*row + row, with the three copies each offset by one cell, and Numpy would give the same results as stepping through three elements at a time. It's not pretty, but it works.
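<br><br>To make the "three offset copies" idea concrete, here is a minimal sketch of a single step written both ways. It assumes modern NumPy rather than the Numerical Python used above, uses plain array indexing in place of choose, and assumes the rule table follows the usual Wolfram numbering; `rule_table`, `step_loop` and `step_slices` are hypothetical names, not from the original code.

```python
import numpy as np

def rule_table(number):
    # Hypothetical helper: entry k is the new cell value for the
    # neighborhood value k = 4*left + 2*center + right
    # (Wolfram numbering is an assumption here).
    return np.array([(number >> k) & 1 for k in range(8)])

def step_loop(row, rule):
    # Sliding three-cell window, one cell at a time, wrapping at the edges.
    w = len(row)
    out = np.empty_like(row)
    for i in range(w):
        out[i] = rule[4 * row[i - 1] + 2 * row[i] + row[(i + 1) % w]]
    return out

def step_slices(row, rule):
    # Same computation on whole slices: three views of the row, each
    # offset by one cell, weighted 4, 2 and 1, then summed.
    out = np.empty_like(row)
    out[1:-1] = rule[4 * row[:-2] + 2 * row[1:-1] + row[2:]]
    # The two edge cells wrap around and are handled separately.
    out[0] = rule[4 * row[-1] + 2 * row[0] + row[1]]
    out[-1] = rule[4 * row[-2] + 2 * row[-1] + row[0]]
    return out
```

<br><br>Both functions should return identical rows for any input of width 3 or more, for example `step_slices(np.array([0, 0, 0, 1, 0, 0, 0]), rule_table(30))`.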
<br><br>This is about 6x faster overall. The bottleneck is in choose, where each element gets looked up and replaced. I'm not sure how to do it any faster, though. Choose is still much faster than a for loop, which is where the 6x speedup is really coming from. Numpy just adding and multiplying the rows is more like 20x faster than stepping through an element at a time with a three-element window :)
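<br><br>For anyone who wants to measure the lookup step on their own machine, the following is a rough sketch, assuming modern NumPy. It sets up three equivalent ways of doing the table lookup on a whole array and times them with `timeit`. The names `lookups` and `time_lookups` and the array size are illustrative assumptions, and no particular ranking is implied; the numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

# Assumed rule table: entry k is the output for the neighborhood value
# k = 4*left + 2*center + right (this one is rule 30 in Wolfram numbering).
rule = np.array([0, 1, 1, 1, 1, 0, 0, 0])

# Stand-in for the summed row 4*r[:-2] + 2*r[1:-1] + r[2:]: values 0..7.
rng = np.random.default_rng(0)
idx = rng.integers(0, 8, size=100_000)

# Three ways to perform the same table lookup on a whole array.
lookups = {
    "choose": lambda: np.choose(idx, rule),
    "index": lambda: rule[idx],
    "take": lambda: np.take(rule, idx),
}

def time_lookups(number=20):
    # Total seconds for `number` calls of each lookup.
    return {name: timeit.timeit(fn, number=number) for name, fn in lookups.items()}
```

<br><br>Calling `time_lookups()` returns a dict of timings keyed by method name, which can then be compared directly.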
<br><br>As a side note that may be helpful to someone, makeArray1() is 22x faster using psyco, which makes it about 4x faster than the Numpy solution. Psyco doesn't speed up the Numpy calculations at all, which isn't surprising, since Numpy is mostly written in C. Psyco only works on x86, so if that's all you use, that might be ok. Numpy seems like a much more elegant solution, though.
<br><br>I'm positive I could bring those two closer together if I could somehow avoid the lookup table, which converts three binary digits to an integer and then back to a binary digit.<br><br>Normally I wouldn't care a whit about optimisation, but after number crunching through a million items, Numpy suddenly seemed pretty cool. This is the first time I understood the power of being able to perform a calculation on an entire array at once by stacking slices. Multiplying and adding three rows is a lot faster than stepping through a single row three elements at a time.
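<br><br>One possible way to drop the lookup table, sketched here under assumptions: if the rule table follows Wolfram numbering, entry k of the table is just bit k of the rule number, so the lookup can be replaced by a shift and a mask. For a single known rule, the integer step can be skipped entirely with a boolean formula; rule 30 is used as the example. This assumes modern NumPy, `step_shift` and `step_rule30` are hypothetical names, and whether either variant is actually faster has not been measured here.

```python
import numpy as np

def step_shift(row, rule_number):
    # Neighbors with wraparound: left[i] == row[i-1], right[i] == row[i+1].
    left = np.roll(row, 1)
    right = np.roll(row, -1)
    idx = 4 * left + 2 * row + right
    # Bit idx of the rule number is the new cell value, so no table is needed
    # (assumes Wolfram numbering of the rule).
    return (rule_number >> idx) & 1

def step_rule30(row):
    # Rule 30 only: new cell = left XOR (center OR right), no integers at all.
    left = np.roll(row, 1)
    right = np.roll(row, -1)
    return left ^ (row | right)
```

<br><br>For rule 30 the two functions should agree on any row of 0s and 1s, for example `step_shift(row, 30)` against `step_rule30(row)`.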
<br><br>