algorithm, optimization, or other problem?
Hello, I am trying to translate some Matlab/mex code to Python, for doing neural simulations. This application is definitely computing-time limited, and I need to optimize at least one inner loop of the code, or perhaps even rethink the algorithm. The procedure is very simple, after initializing any variables: 1) select a random input vector, which I will call "x". right now I have it as an array, and I choose columns from that array randomly. in other cases, I may need to take an image, select a patch, and then make that a column vector. 2) calculate an output value, which is the dot product of the "x" and a weight vector, "w", so y=dot(x,w) 3) modify the weight vector based on a matrix equation, like: w=w+ eta * (y*x - y**2*w) ^ | +---- learning rate constant 4) repeat steps 1-3 many times I've organized it like: for e in 100: # outer loop for i in 1000: # inner loop (steps 1-3) display things. so that the bulk of the computation is in the inner loop, and is amenable to converting to a faster language. This is my issue: straight python, in the example posted below for 250000 inner-loop steps, takes 20 seconds for each outer-loop step. I tried Pyrex, which should work very fast on such a problem, takes about 8.5 seconds per outer-loop step. The same code as a C-mex file in matlab takes 1.5 seconds per outer-loop step. Given the huge difference between the Pyrex and the Mex, I feel that there is something I am doing wrong, because the C-code for both should run comparably. Perhaps the approach is wrong? I'm willing to take any suggestions! I don't mind coding some in C, but the Python API seemed a bit challenging to me. One note: I am using the Numeric package, not numpy, only because I want to be able to use the Enthought version for Windows. I develop on Linux, and haven't had a chance to see if I can compile numpy using the Enthought Python for Windows. If there is anything else anyone needs to know, I'll post it. I put the main script, and a dohebb.pyx code below. thanks! Brian Blais -- ----------------- bblais@bryant.edu http://web.bryant.edu/~bblais # Main script: from dohebb import * import pylab as p from Numeric import * from RandomArray import * import time x=random((100,1000)) # 1000 input vectors numpats=x.shape[0] w=random((numpats,1)); th=random((1,1)) params={} params['eta']=0.001; params['tau']=100.0; old_mx=0; for e in range(100): rnd=randint(0,numpats,250000) t1=time.time() if 0: # straight python for i in range(len(rnd)): pat=rnd[i] xx=reshape(x[:,pat],(1,-1)) y=matrixmultiply(xx,w) w=w+params['eta']*(y*transpose(xx)-y**2*w); th=th+(1.0/params['tau'])*(y**2-th); else: # pyrex dohebb(params,w,th,x,rnd) print time.time()-t1 p.plot(w,'o-') p.xlabel('weights') p.show() #============================================= # dohebb.pyx cdef extern from "Numeric/arrayobject.h": struct PyArray_Descr: int type_num, elsize char type ctypedef class Numeric.ArrayType [object PyArrayObject]: cdef char *data cdef int nd cdef int *dimensions, *strides cdef object base cdef PyArray_Descr *descr cdef int flags def dohebb(params,ArrayType w,ArrayType th,ArrayType X,ArrayType rnd): cdef int num_iterations cdef int num_inputs cdef int offset cdef double *wp,*xp,*thp cdef int *rndp cdef double eta,tau eta=params['eta'] # learning rate tau=params['tau'] # used for variance estimate cdef double y num_iterations=rnd.dimensions[0] num_inputs=w.dimensions[0] # get the pointers wp=<double *>w.data xp=<double *>X.data rndp=<int *>rnd.data thp=<double *>th.data for it from 0 <= it < num_iterations: offset=rndp[it]*num_inputs # calculate the output y=0.0 for i from 0 <= i < num_inputs: y=y+wp[i]*xp[i+offset] # change in the weights for i from 0 <= i < num_inputs: wp[i]=wp[i]+eta*(y*xp[i+offset] - y*y*wp[i]) # estimate the variance thp[0]=thp[0]+(1.0/tau)*(y**2-thp[0])
Hi Brian, Brian Blais wrote:
Given the huge difference between the Pyrex and the Mex, I feel that there is something I am doing wrong, because the C-code for both should run comparably. Perhaps the approach is wrong? I'm willing to take any suggestions! I don't mind coding some in C, but the Python API seemed a bit challenging to me.
Have you tried weave.inline? I'm using it a lot. Here's a nice article about its performance: http://old.scipy.org/documentation/weave/weaveperformance.html Regards,Christian
Brian Blais wrote:
Given the huge difference between the Pyrex and the Mex, I feel that there is something I am doing wrong, because the C-code for both should run comparably. Perhaps the approach is wrong? I'm willing to take any suggestions! I don't mind coding some in C, but the Python API seemed a bit challenging to me.
Dear Brian, The answer is that most of your code is still using the (slow) Python API. Take a look at the c files that pyrex produces, particularly for your time critical loop. I suspect you'll be horrified to see how much stuff like the following there is. __pyx_3 = PyObject_CallObject(__pyx_12, __pyx_1); if (!__pyx_3) {__pyx_fi\lename = __pyx_f[0]; __pyx_lineno = 469; goto __pyx_L1;} These are calling the Python interpreter, which is quite expensive, as you've noticed. Fortunately, you'll be able to get rid of 99% of this stuff by clever use of Pyrex. Still, I'm actually fairly impressed that you got as much of a speedup as you did. What you need to do is convert your code to raw C array-element access to bypass the Python interpreter as much as possible. See the http://scipy.org/Wiki/Cookbook/Pyrex_and_NumPy demo for an example, particularly the print_elements() function. You can reduce the complexity of that code considerably if you deal only with contiguous, 1D, single dtype arrays. Finally, it's possible that implementing this in weave or some other wrapping solution would more straightforward. I prefer Pyrex for its general Python/C binding properties, not necessarily for any potential ease-of-array-manipulation. (Plus it has a syntax I like! :) For that matter, you could implement your function in pure C and use Pyrex (or whatever) simply to interface it.
I really like fortran and f2py for speeding up the guts of some time comsuming for loop (but there is the steep learning or relearning curve of FORTRAN). Ryan On 2/21/06, Andrew Straw <strawman@astraw.com> wrote:
Brian Blais wrote:
Given the huge difference between the Pyrex and the Mex, I feel that there is something I am doing wrong, because the C-code for both should run comparably. Perhaps the approach is wrong? I'm willing to take any suggestions! I don't mind coding some in C, but the Python API seemed a bit challenging to me.
Dear Brian,
The answer is that most of your code is still using the (slow) Python API. Take a look at the c files that pyrex produces, particularly for your time critical loop. I suspect you'll be horrified to see how much stuff like the following there is.
__pyx_3 = PyObject_CallObject(__pyx_12, __pyx_1); if (!__pyx_3) {__pyx_fi\lename = __pyx_f[0]; __pyx_lineno = 469; goto __pyx_L1;}
These are calling the Python interpreter, which is quite expensive, as you've noticed. Fortunately, you'll be able to get rid of 99% of this stuff by clever use of Pyrex. Still, I'm actually fairly impressed that you got as much of a speedup as you did.
What you need to do is convert your code to raw C array-element access to bypass the Python interpreter as much as possible. See the http://scipy.org/Wiki/Cookbook/Pyrex_and_NumPy demo for an example, particularly the print_elements() function. You can reduce the complexity of that code considerably if you deal only with contiguous, 1D, single dtype arrays.
Finally, it's possible that implementing this in weave or some other wrapping solution would more straightforward. I prefer Pyrex for its general Python/C binding properties, not necessarily for any potential ease-of-array-manipulation. (Plus it has a syntax I like! :) For that matter, you could implement your function in pure C and use Pyrex (or whatever) simply to interface it.
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.net http://www.scipy.net/mailman/listinfo/scipy-user
On 21 Feb 2006, at 17:11, Ryan Krauss wrote:
I really like fortran and f2py for speeding up the guts of some time comsuming for loop (but there is the steep learning or relearning curve of FORTRAN).
Ryan
And f2py is absurdly easy to use:) -George.
participants (5)
-
Andrew Straw -
Brian Blais -
Christian Kristukat -
George Nurser -
Ryan Krauss