SciPy: improving performance by parallelizing
Hi all,

I'm trying to optimise Python code that spends a huge amount of time in SciPy functions such as scipy.signal.convolve. Here are some of my questions about this; it would be great to hear from you. Thanks.

1) Can SciPy take advantage of multiple cores? If so, how?
2) What are the ways to improve the performance of scipy/numpy functions, e.g. using OpenMP, MPI, etc.?
3) If SciPy internally uses BLAS/MKL libraries, can we enable parallelism through those?

It looks like I will have to work on the internals of SciPy. Thanks a lot.

With regards,
M. Sai Rajeswar
M-tech Computer Technology, IIT Delhi
Cogito Ergo Sum
Specifically about convolution: there is a faster implementation in Theano: http://deeplearning.net/software/theano/library/tensor/nnet/conv.html

It allows you to do multiple convolutions at the same time. There is also a parallel implementation, but sometimes it speeds things up and other times it slows things down.

Fred

p.s. I'm a Theano developer.
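For reference, an illustrative sketch (not from the original message) of what batching "multiple convolutions at the same time" can look like. It assumes a Theano version that exposes theano.tensor.nnet.conv2d, and the shapes are made up:

import numpy
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

# Symbolic 4-D inputs: (batch, channels, rows, cols) and
# filters: (n_filters, channels, filter_rows, filter_cols).
inputs = T.tensor4('inputs')
filters = T.tensor4('filters')

# One graph node convolves every image with every filter at once,
# instead of looping over scipy.signal.convolve in Python.
out = conv2d(inputs, filters, border_mode='valid')
f = theano.function([inputs, filters], out)

x = numpy.random.rand(8, 1, 60, 80)
w = numpy.random.rand(4, 1, 5, 5)
print(f(x, w).shape)   # (8, 4, 56, 76)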
Hi Frederic,

Thanks. Actually I'm trying to implement a 3D convolutional neural network, as you can see in the snippet. So you mean to say:

1) Instead of using scipy.signal.convolve I should import Theano and use signal.conv2d <http://deeplearning.net/software/theano/library/tensor/signal/conv.html#thea...>? If so, is signal.conv2d the right one, or is there another function better suited to my need?

2) Also, any hints on speeding up the numpy.sum in:

pooled[0][i][j][k][l] = math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0 + b[i][j])

Thanks a lot. Also, I have seen your name somewhere in Pylearn2; are you a Pylearn2 developer too?

With regards,
M. Sai Rajeswar
On Mon, Jul 14, 2014 at 10:53 AM, Sai Rajeshwar <rajsai24@gmail.com> wrote:
> 1) Instead of using scipy.signal.convolve I should import Theano and use signal.conv2d? If so, is signal.conv2d the right one, or is there another function better suited to my need?

We have a special conv3d for neural networks: http://deeplearning.net/software/theano/library/tensor/nnet/conv.html. Maybe it suits what you want better. But to be useful, you will need medium/big convolutions, not tiny videos.

> 2) Also, any hints on speeding up the numpy.sum in:
> pooled[0][i][j][k][l] = math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0 + b[i][j])

Someone else replied with information on doing less indexing and summing over bigger chunks of data each time. That could speed up your code.

> Also, I have seen your name somewhere in Pylearn2; are you a Pylearn2 developer too?

Yes and no. I'm in the same lab as the main Pylearn2 developers and I make small contributions from time to time (mostly related to optimization or Theano), but I wouldn't call myself a Pylearn2 core dev.

Fred
Hi Frederic,

Following your advice I tried to rewrite my code using Theano conv3d; basically I'm implementing a convolutional neural network. The problem with my Theano version is that the error percentage does not decrease across epochs. I don't know if the problem is with my use of conv3d. I attach my code here. Thanks a lot in advance.

With regards,
M. Sai Rajeswar
On Tue, Jul 15, 2014 at 12:53 AM, Sai Rajeshwar <rajsai24@gmail.com> wrote:
> 2) Also, any hints on speeding up the numpy.sum in:
> pooled[0][i][j][k][l] = math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0 + b[i][j])

Is it actually the individual sums that are slow, or the whole loop? It's a bit hard to read, but it looks like you could vectorise the addition and then sum. Not sure if that would help much, but it's worth a go maybe.
Hey Sai,

I'm no expert, so I'll just share a few links to start this discussion. You definitely want to look at Cython <http://cython.org/> if you're computing with NumPy arrays. If you're familiar with the MPI programming model, check out mpi4py <http://mpi4py.scipy.org/>. And if you have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA <http://mathema.tician.de/software/pycuda/>.

Thanks,
Ashwin
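To make the mpi4py suggestion concrete, here is a small hedged sketch (an illustration, not from the thread) that spreads independent 2-D convolutions over MPI ranks; the array sizes and script name are made up:

# Run with e.g.: mpirun -n 4 python conv_mpi.py
from mpi4py import MPI
import numpy
from scipy import signal

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank builds (or would load) its own share of the images.
numpy.random.seed(rank)
kernel = numpy.ones((5, 5)) / 25.0
my_images = [numpy.random.rand(60, 80) for _ in range(4)]

# The 2-D convolutions are independent, so each rank runs its own batch.
my_results = [signal.convolve(img, kernel, mode='valid') for img in my_images]

# Collect the per-rank result lists on rank 0.
all_results = comm.gather(my_results, root=0)
if rank == 0:
    total = sum(len(chunk) for chunk in all_results)
    print('collected %d convolved images' % total)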
If your operations are using the BLAS functions a lot, you get SMP parallelisation very cheaply by linking to the multithreaded MKL or ACML versions and setting OMP_NUM_THREADS/MKL_NUM_THREADS to the number of available cores.

Cheers,
Derek
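In practice that amounts to something like the sketch below (an illustration, not from the thread); the variable names are the standard MKL/OpenMP ones and must be set before the process starts:

# In the shell, before launching Python:
#   export MKL_NUM_THREADS=16
#   export OMP_NUM_THREADS=16
import os
import numpy

numpy.__config__.show()   # shows which BLAS/LAPACK libraries NumPy links to
print(os.environ.get('MKL_NUM_THREADS'), os.environ.get('OMP_NUM_THREADS'))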
Hi, thanks for the suggestions.

Actually I am running my code on Stampede at TACC, where numpy and scipy are built against the MKL libraries for optimal performance. My observations are as follows:

1) Setting OMP_NUM_THREADS to different values did not change the runtimes.
2) The code took the same time as it did on a Mac Pro with the Accelerate framework for BLAS and LAPACK.

So is MKL not being helpful, or is it not getting configured to use multiple threads?

The statements taking a lot of time are like the following:

1) for i in xrange(conv_out_shape[1]):
       conv_out[0][i] = scipy.signal.convolve(self.input[0][i%self.image_shape[1]], numpy.rot90(self.W[0][i/self.image_shape[1]], 2), mode='valid')

2) for i in xrange(pooled_shape[1]):
       for j in xrange(pooled_shape[2]):
           for k in xrange(pooled_shape[3]):
               for l in xrange(pooled_shape[4]):
                   pooled[0][i][j][k][l] = math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0 + b[i][j])

Thanks.

With regards,
M. Sai Rajeswar
On 13 July 2014 14:28, Sai Rajeshwar <rajsai24@gmail.com> wrote:
> 2) for i in xrange(pooled_shape[1]):
>        for j in xrange(pooled_shape[2]):
>            for k in xrange(pooled_shape[3]):
>                for l in xrange(pooled_shape[4]):
>                    pooled[0][i][j][k][l] = math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3]) + numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0 + b[i][j])

You should get a speed-up by accessing the arrays in a more efficient way:

pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0 + b[i, j])

In fact:

numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3])

seems equivalent to:

numpy.sum(conv_out[0, i, j, k*3: k*3 + 1, l*3:(l+1)*3])

To take the last one into account:

vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1)
pooled[0, i, j, k, l] = vec[0] + vec[1] + vec[2] / 9.0

And you can probably get rid of the i and j indexes altogether. Something like this should work (untested):

for k in ...:
    for l in ...:
        output = numpy.sum(conv_out[0, :, :, k*3: k*3 + 1, l*3:(l+1)*3], axis=-1)
        output += numpy.sum(conv_out[0, :, :, k*3 + 2, l*3:(l+1)*3], axis=-1)/9.0
        output += b
        pooled[0, :, :, k, l] = numpy.tanh(output)

In this case, one of the loops looks like a great target for parallelisation. Also, Cython should help reduce the loop overhead.
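Building on that idea, here is a hedged, self-contained sketch (an illustration, not from the thread) that vectorises the whole 3x3 pooling plus tanh in one shot. It assumes conv_out has shape (1, n_i, n_j, 3*n_k, 3*n_l) and b has shape (n_i, n_j); note the slice bounds differ slightly from the untested snippet quoted above:

import numpy

def pool_tanh(conv_out, b):
    # conv_out: (1, n_i, n_j, 3*n_k, 3*n_l), b: (n_i, n_j)
    _, n_i, n_j, rows, cols = conv_out.shape
    n_k, n_l = rows // 3, cols // 3
    # Give every non-overlapping 3x3 window its own pair of axes,
    # then average the 9 values with one vectorised sum.
    windows = conv_out[0, :, :, :n_k * 3, :n_l * 3].reshape(n_i, n_j, n_k, 3, n_l, 3)
    pooled = numpy.tanh(windows.sum(axis=(3, 5)) / 9.0 + b[:, :, None, None])
    return pooled[None]   # restore the leading singleton axis

conv_out = numpy.random.rand(1, 2, 4, 9, 9)
b = numpy.random.rand(2, 4)
print(pool_tanh(conv_out, b).shape)   # (1, 2, 4, 3, 3)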
Thanks, that's great to start with. Any hints about the scipy.signal.convolve call, which is the real bottleneck? How can I speed it up?

With regards,
M. Sai Rajeswar
Hi David,

I tried as you suggested:

for i in xrange(self.pooled_shape[1]):
    for j in xrange(self.pooled_shape[2]):
        for k in xrange(self.pooled_shape[3]):
            for l in xrange(self.pooled_shape[4]):
                # commented out:
                # self.pooled[0][i][j][k][l] = math.tanh((numpy.sum(self.conv_out[0][i][j][k*3][l*3:(l+1)*3]) + numpy.sum(self.conv_out[0][i][j][k*3+1][l*3:(l+1)*3]) + numpy.sum(self.conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0 + self.b[i][j])
                vec = numpy.sum(self.conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1)
                self.pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2])/9.0 + self.b[i][j])

but it gave the following error:

Traceback (most recent call last):
  File "3dcnn_test.py", line 401, in <module>
    check()
  File "3dcnn_test.py", line 392, in check
    layer1.change_input(numpy.reshape(test_set_x[i],(1,1,9,60,80)))
  File "3dcnn_test.py", line 77, in change_input
    self.pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2])/9.0 + self.b[i][j])
IndexError: index out of bounds

With regards,
M. Sai Rajeswar
OK, I guess the axis=-1 option is not required; that solved it.

With regards,
M. Sai Rajeswar
SciPy does not call the MKL convolution functions, so that isn't surprising. I've had good success writing my own Cython wrapper around the Intel IPP convolution functions.
OK Luke, thanks. Can you shed some light on the Cython wrapper for the IPP convolution functions? How should I go about it to start with? A bit of detail would be helpful. Thanks.

With regards,
M. Sai Rajeswar
Hi,

About PyCUDA: the scikits.cuda <http://scikits.appspot.com/cuda> package uses PyCUDA to provide high-level functions similar to those in numpy. Maybe you should check it out (at least for examples)!

[]'s
--
Dayvid Victor R. de Oliveira
PhD Candidate in Computer Science at Federal University of Pernambuco (UFPE)
MSc in Computer Science at Federal University of Pernambuco (UFPE)
BSc in Computer Engineering at Federal University of Pernambuco (UFPE)
OK, thanks. For SciPy especially, is there any way we can speed it up? I'm currently using scipy.signal.convolve, which is taking a huge amount of time. Can I leverage OpenMP/MPI/CUDA?

With regards,
M. Sai Rajeswar
For simple convolutions there is also np.convolve. Compared to SciPy it releases the GIL, so you can use normal Python threads for parallelisation if you need to compute many independent convolutions and not just one.

That said, SciPy should probably release the GIL too; it's probably a bug that it doesn't.
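A small sketch of how that can look (an illustration, not from the thread), assuming Python 3 or the futures backport; each thread runs one independent 1-D convolution while numpy.convolve releases the GIL:

from concurrent.futures import ThreadPoolExecutor
import numpy

signals = [numpy.random.rand(100000) for _ in range(16)]
kernel = numpy.random.rand(128)

def conv(sig):
    # numpy.convolve drops the GIL, so these calls can overlap across cores.
    return numpy.convolve(sig, kernel, mode='valid')

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(conv, signals))

print(len(results), results[0].shape)   # 16 (99873,)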
Hi Julian, thanks. But when I use numpy.convolve I get this error:

ValueError: object too deep for desired array

Does numpy.convolve work for 2D or 3D convolution? Thanks.

With regards,
M. Sai Rajeswar
There are also convolution functions in scipy.ndimage. For simple, smallish 1-D convolutions, ndimage is much, much faster than scipy.signal and somewhat faster than numpy.convolve.
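For illustration, a small hedged sketch (not from the thread) of the ndimage routines; note that scipy.ndimage.convolve also accepts N-dimensional input, although the speed claim above is about the 1-D case:

import numpy
from scipy import ndimage

x = numpy.random.rand(100000)
w = numpy.ones(9) / 9.0

# 1-D convolution; mode='constant' zero-pads, like numpy.convolve's 'same'.
y_ndimage = ndimage.convolve1d(x, w, mode='constant')
y_numpy = numpy.convolve(x, w, mode='same')
print(numpy.allclose(y_ndimage, y_numpy))   # True for this centred, symmetric kernel

# ndimage.convolve also works on N-D arrays, e.g. a small 3-D volume.
vol = numpy.random.rand(9, 60, 80)
k3 = numpy.ones((3, 3, 3)) / 27.0
print(ndimage.convolve(vol, k3, mode='constant').shape)   # (9, 60, 80)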
OK, and what about 2-D or 3-D convolution? Does it perform better there? Thanks.

With regards,
M. Sai Rajeswar
Hi Sai,

> but when i use numpy.convolve i get this error ValueError: object too deep for desired array
> does numpy.convolve work for 2D or 3D convolution?

No, it works on linear (1-D) arrays only, as you will find in the documentation. The best optimisation strategy for your case would depend on how many individual convolutions of what size arrays it involves. For large arrays, as Sturla has suggested, scipy.signal.fftconvolve, which does operate on multi-dimensional arrays, could be the best (or at least initially easiest) way to go.

HTH,
Derek
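As a concrete illustration of that suggestion (a sketch with made-up shapes, not code from the thread), scipy.signal.fftconvolve accepts arrays of any matching dimensionality:

import numpy
from scipy import signal

volume = numpy.random.rand(9, 60, 80)    # e.g. (frames, rows, cols)
kernel = numpy.random.rand(3, 5, 5)

out_direct = signal.convolve(volume, kernel, mode='valid')
out_fft = signal.fftconvolve(volume, kernel, mode='valid')

print(out_fft.shape)                        # (7, 56, 76)
print(numpy.allclose(out_direct, out_fft))  # agree up to floating-point error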
On 10.07.2014 at 10:19, Ashwin Srinath wrote:
> If you have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA <http://mathema.tician.de/software/pycuda/>.

Just stopping by to mention PyOpenCL [1] as a possible, non-NVIDIA-specific (in fact not-GPU-specific) alternative to PyCUDA.

[1] http://pypi.python.org/pypi/pyopencl

Andreas
participants (12)
- Andreas Kloeckner
- Ashwin Srinath
- Dayvid Victor
- Daπid
- Derek Homeier
- Eric Moore
- Frédéric Bastien
- Julian Taylor
- Luke Pfister
- Padarn Wilson
- Sai Rajeshwar
- Sturla Molden