On 13 July 2014 14:28, Sai Rajeshwar <rajsai24@gmail.com> wrote:

2)
for i in xrange(pooled_shape[1]):
    for j in xrange(pooled_shape[2]):
        for k in xrange(pooled_shape[3]):
            for l in xrange(pooled_shape[4]):
                pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j])

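You should get a speed-up by indexing the arrays with a single tuple (a[0, i, j]) instead of chained lookups (a[0][i][j]), since every chained [] creates an intermediate array object before the next lookup: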

pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0+b[i, j])
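A small sketch (random data, arbitrary shape standing in for conv_out) showing that the two indexing styles read the same values, plus a rough timeit comparison. No particular speed-up figure is claimed here, since it depends on the machine and NumPy version.

```python
import timeit

import numpy

# Small random array standing in for conv_out (shape chosen arbitrarily).
a = numpy.random.rand(1, 2, 3, 6, 6)

# Chained indexing: a[0] is already a new ndarray (a view), and every further []
# builds another one before the final element is fetched.
step = a[0]
assert step.shape == (2, 3, 6, 6)

# Both styles read the same element and the same slice.
assert a[0][1][2][3][4] == a[0, 1, 2, 3, 4]
assert numpy.array_equal(a[0][1][2][3][0:3], a[0, 1, 2, 3, 0:3])

# Rough timing; absolute numbers depend on the machine and NumPy version.
t_chained = timeit.timeit(lambda: a[0][1][2][3][4], number=100000)
t_tuple = timeit.timeit(lambda: a[0, 1, 2, 3, 4], number=100000)
print("chained: %.3f s, tuple: %.3f s" % (t_chained, t_tuple))
```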

In fact:

numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3])

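is equivalent to the following (Python slices exclude the stop index, so covering rows k*3 and k*3+1 needs a stop of k*3 + 2):

numpy.sum(conv_out[0, i, j, k*3:k*3 + 2, l*3:(l+1)*3])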
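Because slices exclude their stop index, it is easy to be off by one here. A quick check on a toy array (the 6x6 `block` and the values of `k` and `l` are made up for illustration):

```python
import numpy

# A 6x6 "image" whose values encode their position, so slices are easy to read.
block = numpy.arange(36).reshape(6, 6)
k, l = 1, 1

# k*3:k*3+1 selects a single row...
assert block[k*3:k*3 + 1].shape == (1, 6)

# ...while k*3:k*3+2 covers rows k*3 and k*3+1, matching two separate row sums.
two_rows = numpy.sum(block[k*3:k*3 + 2, l*3:(l+1)*3])
separate = (numpy.sum(block[k*3, l*3:(l+1)*3])
            + numpy.sum(block[k*3 + 1, l*3:(l+1)*3]))
assert two_rows == separate
```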

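To include the third row as well, extend the slice to k*3 + 3, put parentheses around the whole sum before dividing by 9.0, and keep the bias and tanh from the original:

vec = numpy.sum(conv_out[0, i, j, k*3:k*3 + 3, l*3:(l+1)*3], axis=-1)
pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2]) / 9.0 + b[i, j])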
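A sketch checking that summing one three-row slice reproduces the question's three-sum expression. The shapes of `conv_out` and `b` and the chosen indices are assumptions, picked only so that the indices are valid.

```python
import math

import numpy

conv_out = numpy.random.rand(1, 2, 3, 6, 6)
b = numpy.random.rand(2, 3)
i, j, k, l = 1, 2, 1, 0

# The question's expression: three row sums, then average, add bias, apply tanh.
original = math.tanh((numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3])
                      + numpy.sum(conv_out[0, i, j, k*3 + 1, l*3:(l+1)*3])
                      + numpy.sum(conv_out[0, i, j, k*3 + 2, l*3:(l+1)*3])) / 9.0
                     + b[i, j])

# One 3x3 slice summed along the last axis gives the three row sums at once.
vec = numpy.sum(conv_out[0, i, j, k*3:k*3 + 3, l*3:(l+1)*3], axis=-1)
# Parentheses matter: without them only vec[2] would be divided by 9.0.
fixed = math.tanh((vec[0] + vec[1] + vec[2]) / 9.0 + b[i, j])

# Summation order differs, so compare with a tolerance rather than ==.
assert numpy.isclose(original, fixed)
```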

And you can probably get rid of the i and j loops altogether by slicing over those axes as well. Something like this should work (untested):

for k in range(pooled_shape[3]):
    for l in range(pooled_shape[4]):
        # Average every 3x3 window for all (i, j) at once; output has shape (I, J)
        output = numpy.sum(conv_out[0, :, :, k*3:k*3 + 3, l*3:(l+1)*3], axis=(-2, -1)) / 9.0
        output += b
        pooled[0, :, :, k, l] = numpy.tanh(output)
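Going one step further (this is a different technique from the loops above: reshaping into windows), the k and l loops can also be removed by viewing each 3x3 window as two extra axes and averaging over them. A minimal sketch, assuming pooled_shape[0] == 1, b has shape (pooled_shape[1], pooled_shape[2]), and the last two axes of conv_out are at least 3*K and 3*L long. The function names `pool_loops` and `pool_reshape` are hypothetical, and the loop version serves only as a reference to compare against:

```python
import math

import numpy


def pool_loops(conv_out, b, pooled_shape):
    """Reference version: the question's four nested loops (index form tidied)."""
    pooled = numpy.zeros(pooled_shape)
    for i in range(pooled_shape[1]):
        for j in range(pooled_shape[2]):
            for k in range(pooled_shape[3]):
                for l in range(pooled_shape[4]):
                    window = conv_out[0, i, j, k*3:k*3 + 3, l*3:(l+1)*3]
                    pooled[0, i, j, k, l] = math.tanh(numpy.sum(window) / 9.0 + b[i, j])
    return pooled


def pool_reshape(conv_out, b, pooled_shape):
    """Loop-free version: view each 3x3 window as two extra axes and average them."""
    _, I, J, K, L = pooled_shape
    # Crop to whole windows, then split rows into (K, 3) and columns into (L, 3).
    windows = conv_out[0, :, :, :K*3, :L*3].reshape(I, J, K, 3, L, 3)
    means = windows.mean(axis=(3, 5))  # shape (I, J, K, L)
    # b is assumed to have shape (I, J); add two axes so it broadcasts over (K, L).
    return numpy.tanh(means + b[:, :, None, None])[None]  # shape (1, I, J, K, L)


# Usage on made-up shapes: a 9x12 map pools down to 3x4.
conv_out = numpy.random.rand(1, 2, 3, 9, 12)
b = numpy.random.rand(2, 3)
pooled_shape = (1, 2, 3, 3, 4)
slow = pool_loops(conv_out, b, pooled_shape)
fast = pool_reshape(conv_out, b, pooled_shape)
print(numpy.allclose(slow, fast))
```

The reshape relies on C-order layout: row index k*3 + r splits into axes (k, r), and likewise for columns, so no data is reordered.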

In this case, either of the remaining loops seems a great target for parallelisation, since each (k, l) iteration writes to its own slice of pooled. Also, Cython should help reduce the loop overhead.