Matrix dot product over an axis (for a 3D array / list of matrices)
Hello, I have a list of 4x4 transformation matrices that I want to "dot with" another list of the same size (elementwise). A for loop that calculates each dot product is extremely slow; I suspect that's because I have thousands of matrices, it's a Python for loop, and so there's a lot of Python overhead. I do something like this:
for a,b in izip(Rot,Trans): c.append(numpy.dot(a,b))
Is there a way to do this in one instruction? Or is there a way to do this all using weave.inline? -- Emmanuel
Could you place all of the Rot matrices into one array and all of the Trans matrices into another? If you have the first index of each array refer to which matrix it is, numpy.dot should work fine, since numpy.dot does the dot product over the second-to-last and last indexes. http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

John
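(A minimal sketch of the stacking John suggests, assuming Rot and Trans are plain Python lists of 4x4 NumPy arrays as in the original post; the list contents below are just placeholders:)

import numpy

# stand-ins for the Rot and Trans lists from the original post
Rot = [numpy.eye(4) for _ in range(1000)]
Trans = [numpy.eye(4) for _ in range(1000)]

# pack each list into a single (N, 4, 4) array; the first index says which matrix
Rot_a = numpy.array(Rot)
Trans_a = numpy.array(Trans)
print(Rot_a.shape)   # (1000, 4, 4)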
Could you place all of the Rot matrices into one array and all of the Trans matrices into another?

Well I guess, since they're all the same size. I would just have to do array(a). But the result of the dot product of two 3D arrays is most unexpected:

a = numpy.ones((10,4,4))
b = numpy.ones((10,4,4))
c = numpy.dot(a,b)
c.shape
(10, 4, 10, 4)  # Hmm, not what a newbie expects D:
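(For what it's worth, that shape comes from how numpy.dot treats multidimensional inputs: it forms a sum-product over the last axis of the first array and the second-to-last axis of the second, for every pair of stacked matrices, so every a[i] gets dotted with every b[k]. The per-matrix products the loop computes sit on the "diagonal" of that result. A small sketch with throwaway random arrays:)

import numpy

a = numpy.random.rand(10, 4, 4)
b = numpy.random.rand(10, 4, 4)
c = numpy.dot(a, b)   # shape (10, 4, 10, 4): every a[i] against every b[k]

# the result the original loop wants is the i == k slice
print(numpy.allclose(c[3, :, 3, :], numpy.dot(a[3], b[3])))   # True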
Yes, there is a trick for this using a multiply with properly placed newaxis followed by a sum. It uses more memory, but for stacks of small arrays that shouldn't matter. See the post here: http://thread.gmane.org/gmane.comp.python.numeric.general/20360/focus=21033

Chuck
Hmm, I'm not sure I understand what is being done there.
-- Emmanuel
It's just matrix multiply considered as a sum of the outer products of column and row vectors, i.e., the outer product of the first column in the first matrix and the first row in the second matrix, plus the outer product of the second column in the first matrix and the second row in the second matrix, etc. That form is easily adapted to stacks of matrices and is sometimes used on vector architectures, see Golub and Van Loan.

In [6]: a = ones((10,4,4))

In [7]: b = ones((10,4,4))

In [8]: sum(a[...,:,:,newaxis]*b[...,newaxis,:,:], axis=-2)
Out[8]:
array([[[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       ...

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]]])

Chuck
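(A quick sanity check, under the assumption of random (10, 4, 4) stacks rather than the all-ones arrays above, that the newaxis-and-sum form matches a plain per-matrix loop over numpy.dot:)

import numpy

a = numpy.random.rand(10, 4, 4)
b = numpy.random.rand(10, 4, 4)

# the trick: broadcasted outer products, then sum over the contracted axis
c = numpy.sum(a[..., :, :, numpy.newaxis] * b[..., numpy.newaxis, :, :], axis=-2)

# the per-matrix loop from the original post
expected = numpy.array([numpy.dot(x, y) for x, y in zip(a, b)])
print(numpy.allclose(c, expected))   # True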
Ok, I get it. Thanks!

NumPy syntax that works for me:

numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:], axis=-2)
-- Emmanuel
The leading "..." gives the same thing, but iterates over all the leading indices in case you want multidimensional arrays of matrices ;) You can also use the sum method, which might be a bit more economical:

(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:]).sum(axis=-2)

How do the execution times compare?

Chuck
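(For anyone who wants to reproduce the comparison, a rough timing sketch using the standard timeit module; the stack size is made up and the numbers will of course vary by machine:)

import timeit

setup = """
import numpy
a = numpy.random.rand(10000, 4, 4)
b = numpy.random.rand(10000, 4, 4)
"""

candidates = [
    ("loop over numpy.dot", "[numpy.dot(x, y) for x, y in zip(a, b)]"),
    ("numpy.sum form", "numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:], axis=-2)"),
    (".sum method form", "(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:]).sum(axis=-2)"),
]

for name, stmt in candidates:
    t = timeit.timeit(stmt, setup=setup, number=10)
    print("%-20s %.4f s per call" % (name, t / 10))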
I get about 60% of the original execution time, for just about any size of stack.
-- Emmanuel
On Thu, Jul 15, 2010 at 9:45 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
How about using a list comprehension? And setting dot = numpy.dot. Would initializing the c list first help?
Doesn't buy much:
def forloop(a, b):
    c = []
    for x, y in izip(a, b):
        c.append(np.dot(x, y))
    return c

a = [np.random.rand(4,4) for i in range(10000)]
b = [np.random.rand(4,4) for i in range(10000)]

timeit forloop(a, b)
10 loops, best of 3: 21.2 ms per loop

dot = np.dot
timeit [dot(x, y) for x, y in izip(a, b)]
100 loops, best of 3: 19.2 ms per loop
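(On the "initializing the c list first" question above: one way to avoid growing a Python list is to preallocate the output array and fill it in place. A sketch, not timed here; whether it actually helps would need measuring:)

import numpy
from itertools import izip   # Python 2, as in the original loop

a = [numpy.random.rand(4, 4) for i in range(10000)]
b = [numpy.random.rand(4, 4) for i in range(10000)]

# preallocate the result instead of appending to a list
c = numpy.empty((len(a), 4, 4))
for i, (x, y) in enumerate(izip(a, b)):
    c[i] = numpy.dot(x, y)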
If you need/want more speed than the solution Chuck proposed, you should check out Cython and Tokyo. Cython lets you write loops that execute at C speed, whereas Tokyo provides a Cython level wrapper for BLAS (no need to go through Python code to call NumPy). Tokyo was designed for exactly your use case: lots of matrix multiplies with relatively small matrices, where you start noticing the Python overhead. David
It occurred to me I neglected to provide a link (cursed iPhone): http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-mat... David
For speed I'd go straight to C and avoid BLAS, since the matrices are so small. There might also be a cache advantage to copying the non-contiguous columns of the rhs to the stack.

Chuck
participants (5)

- Charles R Harris
- David Warde-Farley
- Emmanuel Bengio
- John Salvatier
- Keith Goodman