matmul needs some clarification.

Hi All, The problem arises when multiplying a stack of matrices times a vector. PEP465 defines this as appending a '1' to the dimensions of the vector and doing the defined stacked matrix multiply, then removing the last dimension from the result. Note that in the middle step we have a stack of matrices and after removing the last dimension we will still have a stack of matrices. What we want is a stack of vectors, but we can't have those with our conventions. This makes the result somewhat unexpected. How should we resolve this? Chuck

On Sat, May 30, 2015 at 6:23 PM, Charles R Harris <charlesr.harris@gmail.com
wrote:
The problem arises when multiplying a stack of matrices times a vector. PEP465 defines this as appending a '1' to the dimensions of the vector and doing the defined stacked matrix multiply, then removing the last dimension from the result. Note that in the middle step we have a stack of matrices and after removing the last dimension we will still have a stack of matrices. What we want is a stack of vectors, but we can't have those with our conventions. This makes the result somewhat unexpected. How should we resolve this?
I think that before tackling the @ operator, we should implement the pure dot of stacks of matrices and dot of stacks of vectors generalized ufuncs. The first will have a 2d "core" and the second - 1d. Let's tentatively call them matmul and vecmul. Hopefully matrix vector product can be reduced to the vecmul, but I have not fully figured this out. If not - we may need the third ufunc. Once we have these ufuncs, we can decide what @ operator should do in terms of them and possibly some axes manipulation.

On Wed, Jun 3, 2015 at 1:38 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Sat, May 30, 2015 at 6:23 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
The problem arises when multiplying a stack of matrices times a vector. PEP465 defines this as appending a '1' to the dimensions of the vector and doing the defined stacked matrix multiply, then removing the last dimension from the result. Note that in the middle step we have a stack of matrices and after removing the last dimension we will still have a stack of matrices. What we want is a stack of vectors, but we can't have those with our conventions. This makes the result somewhat unexpected. How should we resolve this?
I think that before tackling the @ operator, we should implement the pure dot of stacks of matrices and dot of stacks of vectors generalized ufuncs. The first will have a 2d "core" and the second - 1d. Let's tentatively call them matmul and vecmul. Hopefully matrix vector product can be reduced to the vecmul, but I have not fully figured this out. If not - we may need the third ufunc.
The `@` operator is done. I originally started with four ufuncs, mulvecvec, mulmatvec, etc, but decided to wait on that until we merged the ufunc and multiarray packages and did some other ufunc work. The matmul function can certainly be upgraded in the future, but is as good as dot right now except it doesn't handle object arrays.
Once we have these ufuncs, we can decide what @ operator should do in terms of them and possibly some axes manipulation.
Chuck

On Thu, Jun 4, 2015 at 6:50 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Wed, Jun 3, 2015 at 5:12 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
but is as good as dot right now except it doesn't handle object arrays.
This is a fairly low standard. :-(
Meaning as fast. I expect ufuncs to have more call overhead and they need to use blas to be competitive for float et al. Chuck

On Sat, May 30, 2015 at 3:23 PM, Charles R Harris <charlesr.harris@gmail.com
wrote:
The problem arises when multiplying a stack of matrices times a vector. PEP465 defines this as appending a '1' to the dimensions of the vector and doing the defined stacked matrix multiply, then removing the last dimension from the result. Note that in the middle step we have a stack of matrices and after removing the last dimension we will still have a stack of matrices. What we want is a stack of vectors, but we can't have those with our conventions. This makes the result somewhat unexpected. How should we resolve this?
I'm afraid I don't quite understand the issue. Maybe a more specific example of the shapes you have in mind would help? Here's my attempt. Suppose we have two arrays: a with shape (i, j, k) b with shape (k,) Following the logic you describe from PEP465, for a @ b we have shapes transform like so: (i, j, k,) @ (k, 1) -> (i, j, 1) -> (i, j) This makes sense to me as a stack of vectors, as long as you are imagining the original stack of matrices as along the first dimension. Which I'll note is the default behavior for the new np.stack ( https://github.com/numpy/numpy/pull/5605).

On Wed, Jun 3, 2015 at 2:25 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
On Sat, May 30, 2015 at 3:23 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
The problem arises when multiplying a stack of matrices times a vector. PEP465 defines this as appending a '1' to the dimensions of the vector and doing the defined stacked matrix multiply, then removing the last dimension from the result. Note that in the middle step we have a stack of matrices and after removing the last dimension we will still have a stack of matrices. What we want is a stack of vectors, but we can't have those with our conventions. This makes the result somewhat unexpected. How should we resolve this?
I'm afraid I don't quite understand the issue. Maybe a more specific example of the shapes you have in mind would help? Here's my attempt.
Suppose we have two arrays: a with shape (i, j, k) b with shape (k,)
Following the logic you describe from PEP465, for a @ b we have shapes transform like so: (i, j, k,) @ (k, 1) -> (i, j, 1) -> (i, j)
This makes sense to me as a stack of vectors, as long as you are imagining the original stack of matrices as along the first dimension. Which I'll note is the default behavior for the new np.stack ( https://github.com/numpy/numpy/pull/5605).
Yes, you end up with a stack of vectors, but matmul will interpret them as a stack of matrices. I decided there is nothing to be done there and just documented it as a potential gotcha. The other possibility would be to prohibit or warn on stacked matrices and vectors for the `@` operator and that might limit what some folks want to do. Chuck
participants (3)
-
Alexander Belopolsky
-
Charles R Harris
-
Stephan Hoyer