Matrix dot product over an axis (for a 3D array / list of matrices)
Hello, I have a list of 4x4 transformation matrices that I want to "dot with" another list of the same size (elementwise). A for loop that calculates each dot product is extremely slow; I suspect that's because I have thousands of matrices, it's a Python for loop, and so there's a lot of Python overhead. I do something like this:
for a,b in izip(Rot,Trans): c.append(numpy.dot(a,b))
Is there a way to do this in one instruction? Or is there a way to do this all using weave.inline? -- Emmanuel
Could you place all of the Rot matrices into one array and all of the Trans matrices into another? If you have the first index of each array refer to which matrix it is, numpy.dot should work fine, since numpy.dot does the dot product over the second-to-last and last indexes. http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

John
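(A minimal sketch of the stacking John suggests, assuming Rot and Trans are plain Python lists of 4x4 NumPy arrays as in the original post; the list contents below are just placeholders:)

import numpy

# stand-ins for the Rot and Trans lists from the original post
Rot = [numpy.eye(4) for _ in range(1000)]
Trans = [numpy.eye(4) for _ in range(1000)]

# pack each list into a single (N, 4, 4) array; the first index says which matrix
Rot_a = numpy.array(Rot)
Trans_a = numpy.array(Trans)
print(Rot_a.shape)   # (1000, 4, 4)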
Could you place all of the Rot matrices into one array and all of the Trans matrices into another?

Well I guess, since they're all the same size. I would just have to do array(a). But the result of the dot product of two 3D arrays is most unexpected:

a = numpy.ones((10,4,4))
b = numpy.ones((10,4,4))
c = numpy.dot(a,b)
c.shape
(10, 4, 10, 4)  # Hmm, not what a newbie expects D:
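(For what it's worth, that shape comes from how numpy.dot treats multidimensional inputs: it forms a sum-product over the last axis of the first array and the second-to-last axis of the second, for every pair of stacked matrices, so every a[i] gets dotted with every b[k]. The per-matrix products the loop computes sit on the "diagonal" of that result. A small sketch with throwaway random arrays:)

import numpy

a = numpy.random.rand(10, 4, 4)
b = numpy.random.rand(10, 4, 4)
c = numpy.dot(a, b)   # shape (10, 4, 10, 4): every a[i] against every b[k]

# the result the original loop wants is the i == k slice
print(numpy.allclose(c[3, :, 3, :], numpy.dot(a[3], b[3])))   # True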
Yes, there is a trick for this using a multiply with properly placed newaxis followed by a sum. It uses more memory, but for stacks of small arrays that shouldn't matter. See the post here: http://thread.gmane.org/gmane.comp.python.numeric.general/20360/focus=21033

Chuck
Hmm, I'm not sure I understand what is being done there.
-- Emmanuel
It's just matrix multiply considered as a sum of the outer products of column and row vectors, i.e., the outer product of the first column in the first matrix and the first row in the second matrix, plus the outer product of the second column in the first matrix and the second row in the second matrix, etc. That form is easily adapted to stacks of matrices and is sometimes used on vector architectures, see Golub and Van Loan.

In [6]: a = ones((10,4,4))

In [7]: b = ones((10,4,4))

In [8]: sum(a[...,:,:,newaxis]*b[...,newaxis,:,:], axis=-2)
Out[8]:
array([[[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       ...

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]]])

Chuck
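(A quick sanity check, under the assumption of random (10, 4, 4) stacks rather than the all-ones arrays above, that the newaxis-and-sum form matches a plain per-matrix loop over numpy.dot:)

import numpy

a = numpy.random.rand(10, 4, 4)
b = numpy.random.rand(10, 4, 4)

# the trick: broadcasted outer products, then sum over the contracted axis
c = numpy.sum(a[..., :, :, numpy.newaxis] * b[..., numpy.newaxis, :, :], axis=-2)

# the per-matrix loop from the original post
expected = numpy.array([numpy.dot(x, y) for x, y in zip(a, b)])
print(numpy.allclose(c, expected))   # True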
Ok, I get it. Thanks!

NumPy syntax that works for me:

numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:], axis=-2)
-- Emmanuel
The leading "..." gives the same thing, but iterates over all the leading indices in case you want multidimensional arrays of matrices ;) You can also use the sum method, which might be a bit more economical:

(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:]).sum(axis=-2)

How do the execution times compare?

Chuck
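(For anyone who wants to reproduce the comparison, a rough timing sketch using the standard timeit module; the stack size is made up and the numbers will of course vary by machine:)

import timeit

setup = """
import numpy
a = numpy.random.rand(10000, 4, 4)
b = numpy.random.rand(10000, 4, 4)
"""

candidates = [
    ("loop over numpy.dot", "[numpy.dot(x, y) for x, y in zip(a, b)]"),
    ("numpy.sum form", "numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:], axis=-2)"),
    (".sum method form", "(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:]).sum(axis=-2)"),
]

for name, stmt in candidates:
    t = timeit.timeit(stmt, setup=setup, number=10)
    print("%-20s %.4f s per call" % (name, t / 10))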
I get about 60% of the original execution time, for just about any size of stack.
-- Emmanuel
On Thu, Jul 15, 2010 at 9:45 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
How about using a list comprehension? And setting dot = numpy.dot. Would initializing the c list first help?
Doesn't buy much:
def forloop(a, b):
    c = []
    for x, y in izip(a, b):
        c.append(np.dot(x, y))
    return c

a = [np.random.rand(4,4) for i in range(10000)]
b = [np.random.rand(4,4) for i in range(10000)]

timeit forloop(a, b)
10 loops, best of 3: 21.2 ms per loop

dot = np.dot
timeit [dot(x, y) for x, y in izip(a, b)]
100 loops, best of 3: 19.2 ms per loop
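(On the "initializing the c list first" question above: one way to avoid growing a Python list is to preallocate the output array and fill it in place. A sketch, not timed here; whether it actually helps would need measuring:)

import numpy
from itertools import izip   # Python 2, as in the original loop

a = [numpy.random.rand(4, 4) for i in range(10000)]
b = [numpy.random.rand(4, 4) for i in range(10000)]

# preallocate the result instead of appending to a list
c = numpy.empty((len(a), 4, 4))
for i, (x, y) in enumerate(izip(a, b)):
    c[i] = numpy.dot(x, y)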
If you need/want more speed than the solution Chuck proposed, you should check out Cython and Tokyo. Cython lets you write loops that execute at C speed, whereas Tokyo provides a Cython level wrapper for BLAS (no need to go through Python code to call NumPy). Tokyo was designed for exactly your use case: lots of matrix multiplies with relatively small matrices, where you start noticing the Python overhead. David
It occurred to me I neglected to provide a link (cursed iPhone): http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-mat... David
For speed I'd go straight to C and avoid BLAS, since the matrices are so small. There might also be a cache advantage to copying the non-contiguous columns of the rhs to the stack.

Chuck
participants (5)

- Charles R Harris
- David Warde-Farley
- Emmanuel Bengio
- John Salvatier
- Keith Goodman