I have made progress on resolving the issue that matmul, the operation which implements `a @ b`, is not a ufunc [2]. This is what prevents the __array_ufunc__ mechanism from being used to override matmul on subclasses of ndarray. Discussion on the issue yielded two approaches:

- create a wrapper that can convince the ufunc mechanism to call __array_ufunc__ even on functions that are not true ufuncs
- expand the semantics of core signatures so that a single matmul ufunc can implement matrix-matrix, vector-matrix, matrix-vector, and vector-vector multiplication

I have put up prototypes of both approaches as PR 11061 [0] and PR 11133 [1]; they are WIP to prove the concept and are a bit rough. Either approach can be made to work. Which is preferable? What are the criteria we should use to judge the relative merits (backward compatibility, efficiency, code clarity, enabling further enhancements, ...) of the approaches?

Matti

[0] https://github.com/numpy/numpy/pull/11061
[1] https://github.com/numpy/numpy/pull/11133
[2] https://github.com/numpy/numpy/issues/9028
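For readers following along, here is a minimal sketch of what either approach would enable: a subclass intercepting `a @ b` through __array_ufunc__ once np.matmul dispatches like a ufunc. The `LoggedArray` class is hypothetical, purely for illustration:

```python
import numpy as np

class LoggedArray(np.ndarray):
    """Hypothetical subclass that logs every ufunc call, including matmul."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        print(f"intercepted {ufunc.__name__}.{method}")
        # Defer to the default implementation on plain ndarrays.
        inputs = tuple(np.asarray(x) for x in inputs)
        return getattr(ufunc, method)(*inputs, **kwargs)

a = np.eye(3).view(LoggedArray)
b = np.arange(3.0)
# With matmul dispatching as a (g)ufunc -- the goal of both PRs --
# this would print "intercepted matmul.__call__":
a @ b
```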
On Mon, May 21, 2018 at 5:42 PM Matti Picus <matti.picus@gmail.com> wrote:
> - create a wrapper that can convince the ufunc mechanism to call __array_ufunc__ even on functions that are not true ufuncs
I am somewhat opposed to this approach, because __array_ufunc__ is about overloading ufuncs, and as soon as we relax this guarantee the set of invariants __array_ufunc__ implementors rely on becomes much more limited. We really should have another mechanism for arbitrary function overloading in NumPy (NEP to follow shortly!).
I was initially concerned that adding optional dimensions for gufuncs would introduce additional complexity for only the benefit of a single function (matmul), but I'm now convinced that it makes sense:

1. All other arithmetic overloads use __array_ufunc__, and it would be nice to keep @/matmul in the same place.
2. There's a common family of gufuncs for which optional dimensions like np.matmul's make sense: matrix functions where 1D arrays should be treated as 2D row- or column-vectors.

One example of this class of behavior would be np.linalg.solve, which could support vectors like Ax=b and matrices like Ax=B with the signature (m,m),(m,n?)->(m,n?). We couldn't immediately make np.linalg.solve a gufunc, since it uses a subtly different dispatching rule, but it's the same use-case. Another example would be the "matrix transpose" function that has been occasionally proposed, to swap the last two dimensions of an array. It could have the signature (m?,n)->(n,m?), which ensures that it is still well defined (as the identity) on 1d arrays.
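To make the optional-dimension semantics concrete, here is a small shape check. It is runnable as-is, since np.matmul already implements these rules; a signature with optional dimensions such as (n?,k),(k,m?)->(n?,m?) would merely encode them in the gufunc machinery:

```python
import numpy as np

A = np.ones((3, 4))   # matrix
B = np.ones((4, 5))   # matrix
u = np.ones(3)        # 1d vector
v = np.ones(4)        # 1d vector

print((A @ B).shape)  # (3, 5): both optional dimensions present
print((A @ v).shape)  # (3,): m? omitted, v treated as a column vector
print((u @ A).shape)  # (4,): n? omitted, u treated as a row vector
print(v @ v)          # 4.0: both omitted, a vector-vector dot product
```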
On Mon, May 28, 2018 at 4:26 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Specifically, np.linalg.solve uses a unique rule where solve(a, b) assumes that b is a stack of vectors if (a.ndim - 1 == b.ndim), and otherwise assumes that it's a stack of matrices. This is pretty confusing. You'd think that solve(a, b) should be equivalent to (inv(a) @ b), but it isn't.

Say a.shape == (10, 3, 3) and b.shape == (3,). Then inv(a) @ b works, and does what you'd expect: for each of the ten 3x3 matrices in a, it computes the inverse and multiplies it by the 1-d vector in b (treated as a column vector). But solve(a, b) is an error, because the dimensions aren't lined up to trigger the special handling for 1-d vectors.

Or, say a.shape == (10, 3, 3) and b.shape == (3, 3). Then again inv(a) @ b works, and does what you'd expect: for each of the ten 3x3 matrices in a, it computes the inverse and multiplies it by the 3x3 matrix in b. But again solve(a, b) is an error -- this time because the special handling for 1-d vectors *does* kick in, even though it doesn't make sense: it tries to match up the ten 3x3 matrices in a against the three one-dimensional vectors in b, and 10 != 3 so the broadcasting fails.

This also points to even more confusing possibilities: if a.shape == (3, 3) or (3, 3, 3, 3) and b.shape == (3, 3), then inv(a) @ b and solve(a, b) both work and do the same thing. But if a.shape == (3, 3, 3), then inv(a) @ b and solve(a, b) both work, and do totally *different* things.

I wonder if we should deprecate these corner cases, and eventually migrate to making inv(a) @ b and solve(a, b) the same in all situations. If we did, then solve(a, b) would actually be a gufunc with signature (m,m),(m,n?)->(m,n?). I think the cases that would need to be changed are those where (a.ndim - 1 == b.ndim and b.ndim > 1). My guess is that this happens very rarely in existing code, especially since (IIRC) this behavior was only added a few years ago, when we gufunc-ified numpy.linalg.
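A short script reproducing the first divergence described above. The solve behavior shown follows the dispatch rule as described in this thread; newer NumPy versions have since revisited this rule, so the output may differ:

```python
import numpy as np

a = np.ones((10, 3, 3)) + np.eye(3)  # stack of ten invertible 3x3 matrices
b = np.ones(3)

# inv(a) @ b broadcasts as you'd expect: (10, 3, 3) @ (3,) -> (10, 3)
print((np.linalg.inv(a) @ b).shape)

# Under the dispatch rule described above, a.ndim - 1 != b.ndim, so b is
# treated as a (malformed) stack of matrices and solve raises instead.
try:
    print(np.linalg.solve(a, b).shape)
except ValueError as exc:
    print("solve(a, b) failed:", exc)
```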
Unfortunately I don't think we could make "broadcasting matrix transpose" be literally a gufunc, since it should return a view. But I guess there'd still be some value in having the notation available just when talking about it, so we could say "this operation is *like* a gufunc with signature (m?,n)->(n,m?), except that it returns a view".

-n

--
Nathaniel J. Smith -- https://vorpus.org
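For illustration, a minimal sketch of such an operation outside the gufunc machinery: a helper that returns a view and treats 1d input as the identity, per the hypothetical (m?,n)->(n,m?) signature. The name `matrix_transpose` is made up here:

```python
import numpy as np

def matrix_transpose(a):
    """Swap the last two axes, returning a view; identity on 1d input."""
    a = np.asarray(a)
    if a.ndim == 0:
        raise ValueError("matrix transpose is undefined for 0d arrays")
    if a.ndim == 1:
        return a  # the m? core dimension is absent: behave as the identity
    return a.swapaxes(-2, -1)

x = np.zeros((2, 3))
print(matrix_transpose(x).shape)       # (3, 2)
print(matrix_transpose(x).base is x)   # True: a view, not a copy
print(matrix_transpose(np.arange(3)))  # [0 1 2]: identity on 1d
```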
> which ensures that it is still well defined (as the identity) on 1d arrays.

This strikes me as a bad idea. There’s already enough confusion from beginners that array_1d.T is a no-op. If we introduce a matrix-transpose, it should either error on <1d inputs with a useful message, or insert the extra dimension. I’d favor the former.

Eric
On Mon, May 28, 2018 at 7:36 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
To be clear: matrix transpose is an example use-case rather than a serious proposal in this discussion. But given that idiomatic NumPy code uses 1D arrays in favor of explicit row/column vectors with shapes (1,n) and (n,1), I do think it makes sense for matrix transpose on 1D arrays to be the identity, because matrix transpose should convert back and forth between row- and column-vector representations. Certainly, matrix transpose should error on 0d arrays, because it doesn't make sense to transpose a scalar.
On Tue, May 29, 2018 at 5:40 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
Apologies for the probably academic nitpick, but if idiomatic code uses 1d arrays as vectors, then shouldn't scalars be compatible with matrices of dimension (in the mathematical sense) 1? Since the matrix product of shapes (1,n) and (n,1) is (1,1), but the same product for shapes (n,) and (n,) is (), it might make sense after all for the matrix transpose to be the identity on scalars.

I'm aware that this is tangential to the primary discussion, but I'm also wondering if I'm being confused about the subject (wouldn't be the first time that I got confused about numpy scalars).

András
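The shape claim is easy to verify:

```python
import numpy as np

row = np.ones((1, 4))
col = np.ones((4, 1))
vec = np.ones(4)

print((row @ col).shape)  # (1, 1): explicit 2d vectors give a 1x1 matrix
print((vec @ vec).shape)  # (): 1d vectors give a 0d scalar
```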
On 29 May 2018 at 05:40, Stephan Hoyer <shoyer@gmail.com> wrote:
When doing algebra on paper, I like the braket notation. It makes abundantly clear the shape of the outputs, without having to remember on which side the transpose goes: <u|v> is a scalar, |u><v| is a matrix. I don't have a good way of translating this to numpy, but maybe someone else has an idea.

> Certainly, matrix transpose should error on 0d arrays, because it doesn't make sense to transpose a scalar.

Unless the scalar is 8, in which case the transpose is np.inf... Right now, np.int(8).T throws an error, but np.transpose(np.int(8)) gives a 0-d array. On one hand, it is nice to be able to use the same code for scalars as for vectors, but on the other, you may be making a mistake.

/David.
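As an aside, the shapes of the two braket forms do map onto existing functions, at least for real vectors (a sketch, not a notation proposal):

```python
import numpy as np

u = np.ones(3)
v = np.arange(3.0)

print(np.vdot(u, v))         # <u|v>: a scalar (vdot conjugates u)
print(np.outer(u, v).shape)  # |u><v|: a (3, 3) matrix
```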
On Tue, May 29, 2018 at 12:16 PM, Daπid <davidmenhur@gmail.com> wrote:
> Right now, np.int(8).T throws an error, but np.transpose(np.int(8)) gives a 0-d array. On one hand, it is nice to be able to use the same code for scalars as for vectors, but on the other, you may be making a mistake.
`np.int` is just python `int`! What you mean is `np.int64(8).T` which works fine, so does `np.array(8).T`.
Apart from the math-validity discussion: in my experience, errors are used a bit too generously in the not-allowed ops. No-ops are fine once you learn more about them, such as transpose on 1D arrays (whether that's good or bad is another discussion). But raising errors bloats the computational code too much. "Is it a scalar? Oh, then do this. Is it 1D? Oh, make this one. Is it 2D? Then do something else." That type of coding is really making life difficult. Most of my time in numerical code is spent trying to catch scalars and 1D arrays and writing exceptions, because I can't predict what the user would do or what the result should be after certain operations. Quite unwillingly, I've started making everything 2D whether it is required or not, because then I can just avoid the following:

    np.eye(4)[:, 1]                      # 1d
    np.eye(4)[:, 1:2]                    # 2d
    np.eye(4)[:, [1]]                    # 2d
    np.eye(4)[:, [1]] @ 5                # error
    np.eye(4)[:, [1]] @ np.array(5)      # error
    np.eye(4)[:, [1]] @ np.array([5])    # result is 1d
    np.eye(4)[:, [1]] @ np.array([[5]])  # result is 2d

So imagine I'm trying to write a simple multiply_these function (see the sketch below): I already have quite some cases to consider for the function to be "Pythonic". If the second argument is an int or float, do *-mult; if it is a numpy array but has no dimensions, again *-mult; if it is 1d, keep dims; and if it is 2d, do @-mult. Add broadcasting rules on top of this and it gets a pretty wordy function.

Hence, what I would suggest is to also include the use cases while deciding the behavior of a single functionality. So indeed it doesn't make sense to transpose a 0d array, but as an array object it would now start to have a lot of Wat! moments. https://www.destroyallsoftware.com/talks/wat
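A sketch of the kind of dispatch described above. The name `multiply_these` and the exact rules are hypothetical, not a proposal:

```python
import numpy as np

def multiply_these(a, b):
    """Hypothetical helper: *-multiply by scalars, @-multiply by arrays."""
    a = np.asarray(a)
    if np.isscalar(b):
        return a * b                 # plain int/float: elementwise
    b = np.asarray(b)
    if b.ndim == 0:
        return a * b                 # 0d array: still elementwise
    if b.ndim == 1:
        return a @ b[:, np.newaxis]  # 1d: promote to a column to keep dims
    return a @ b                     # 2d (or stacked): matrix product

col = np.eye(4)[:, [1]]                          # shape (4, 1)
print(multiply_these(col, 5).shape)              # (4, 1)
print(multiply_these(col, np.array([5])).shape)  # (4, 1): dims kept
```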
On Mon, May 28, 2018, 20:41 Stephan Hoyer <shoyer@gmail.com> wrote:
More concretely, I think the idea is that if you write code like a.T @ a, then it's nice if that automatically works for both 2d and 1d arrays -- especially, say, if this is embedded inside a larger function, so you have some responsibility to your users to handle different inputs appropriately, and your users expect that to include both 2d matrices and 1d vectors. It reduces special cases. But, on the other hand, if you write a @ a.T, then you'll be in for a surprise... So maybe it's not a great idea after all.

(Note that here I'm using .T as a placeholder for a hypothetical "broadcasting matrix transpose". I don't think anyone proposes that .T itself should be changed to do this; I just needed some notation.)

-n
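A quick demonstration of the asymmetry, with `mt` as a stand-in for the hypothetical broadcasting transpose (identity on 1d):

```python
import numpy as np

def mt(a):  # hypothetical broadcasting matrix transpose, identity on 1d
    return a if a.ndim == 1 else np.swapaxes(a, -2, -1)

m = np.ones((5, 2))
v = np.ones(5)

print((mt(m) @ m).shape)  # (2, 2): the Gram matrix, as expected
print(mt(v) @ v)          # 5.0: the 1d case degrades gracefully to a dot product

print((m @ mt(m)).shape)  # (5, 5)
print(v @ mt(v))          # 5.0: surprise -- not a (5, 5) outer product
```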
participants (7)

- Andras Deak
- Daπid
- Eric Wieser
- Ilhan Polat
- Matti Picus
- Nathaniel Smith
- Stephan Hoyer