[Numpy-discussion] matmul as a ufunc

Nathaniel Smith njs at pobox.com
Mon May 28 20:11:25 EDT 2018


On Mon, May 28, 2018 at 4:26 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Mon, May 21, 2018 at 5:42 PM Matti Picus <matti.picus at gmail.com> wrote:
>>
>> - create a wrapper that can convince the ufunc mechanism to call
>> __array_ufunc__ even on functions that are not true ufuncs
>
>
> I am somewhat opposed to this approach, because __array_ufunc__ is about
> overloading ufuncs, and as soon as we relax this guarantee the set of
> invariants __array_ufunc__ implementors rely on becomes much more limited.
>
> We really should have another mechanism for arbitrary function overloading
> in NumPy (NEP to follow shortly!).
>
>>
>> - expand the semantics of core signatures so that a single matmul ufunc
>> can implement matrix-matrix, vector-matrix, matrix-vector, and
>> vector-vector multiplication.
>
>
> I was initially concerned that adding optional dimensions for gufuncs would
> introduce additional complexity for only the benefit of a single function
> (matmul), but I'm now convinced that it makes sense:
> 1. All other arithmetic overloads use __array_ufunc__, and it would be nice
> to keep @/matmul in the same place.
> 2. There's a common family of gufuncs for which optional dimensions like
> np.matmul make sense: matrix functions where 1D arrays should be treated as
> 2D row- or column-vectors.
>
> One example of this class of behavior would be np.linalg.solve, which could
> support vectors like Ax=b and matrices like Ax=B with the signature
> (m,m),(m,n?)->(m,n?). We couldn't immediately make np.linalg.solve a gufunc
> since it uses a subtly different dispatching rule, but it's the same
> use-case.

Specifically, np.linalg.solve uses a unique rule where

   solve(a, b)

assumes that b is a stack of vectors if (a.ndim - 1 == b.ndim), and
otherwise assumes that it's a stack of matrices. This is pretty
confusing. You'd think that solve(a, b) should be equivalent to
(inv(a) @ b), but it isn't.
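
To spell the rule out, here's a rough sketch of the dispatch (just an
illustration of the rule described above, not the actual code inside
numpy.linalg):

    import numpy as np

    def which_solve_path(a, b):
        # Sketch of how solve(a, b) currently picks its code path.
        a, b = np.asarray(a), np.asarray(b)
        if a.ndim - 1 == b.ndim:
            # b is interpreted as a (stack of) 1-d vector(s):
            # effectively gufunc signature (m,m),(m)->(m)
            return "vector path"
        else:
            # b is interpreted as a (stack of) matrices:
            # effectively gufunc signature (m,m),(m,n)->(m,n)
            return "matrix path"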

Say a.shape == (10, 3, 3) and b.shape == (3,). Then inv(a) @ b works,
and does what you'd expect: for each of the ten 3x3 matrices in a, it
computes the inverse and multiplies it by the 1-d vector in b (treated
as a column vector). But solve(a, b) is an error, because the
dimensions aren't lined up to trigger the special handling for 1-d
vectors.
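
For concreteness, a small runnable sketch of that case (this assumes
the dispatch rule described above, i.e. np.linalg.solve as of this
thread; the exact exception, and whether later releases still error,
may vary):

    import numpy as np

    a = np.random.rand(10, 3, 3)   # a stack of ten 3x3 matrices
    b = np.random.rand(3)          # a single 1-d vector

    # matmul broadcasts: each of the ten inverses times the vector b
    print((np.linalg.inv(a) @ b).shape)    # (10, 3)

    # solve: a.ndim - 1 == 2 but b.ndim == 1, so the 1-d-vector path
    # is not taken and b is rejected as a malformed "matrix"
    try:
        print(np.linalg.solve(a, b).shape)
    except Exception as exc:
        print("solve raised:", exc)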

Or, say a.shape == (10, 3, 3) and b.shape == (3, 3). Then again inv(a)
@ b works, and does what you'd expect: for each of the ten 3x3
matrices in a, it computes the inverse and multiplies it by the 3x3
matrix in b. But again solve(a, b) is an error -- this time because
the special handling for 1-d vectors *does* kick in, even though it
doesn't make sense: it tries to match up the ten 3x3 matrices in a
against the three one-dimensional vectors in b, and 10 != 3 so the
broadcasting fails.
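
Again as a runnable sketch (same caveats as above):

    import numpy as np

    a = np.random.rand(10, 3, 3)   # a stack of ten 3x3 matrices
    b = np.random.rand(3, 3)       # a single 3x3 matrix

    # matmul broadcasts the single matrix b against all ten inverses
    print((np.linalg.inv(a) @ b).shape)    # (10, 3, 3)

    # solve: a.ndim - 1 == b.ndim, so the 1-d-vector path kicks in and
    # tries to broadcast ten matrices against three vectors
    try:
        print(np.linalg.solve(a, b).shape)
    except Exception as exc:
        print("solve raised:", exc)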

This also points to even more confusing possibilities: if a.shape ==
(3, 3) or (3, 3, 3, 3) and b.shape == (3, 3), then inv(a) @ b and
solve(a, b) both work and do the same thing. But if a.shape == (3, 3,
3), then inv(a) @ b and solve(a, b) both work, and do totally
*different* things.
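
The (3, 3, 3) case is easy to see in code -- the output shapes are the
giveaway:

    import numpy as np

    a = np.random.rand(3, 3, 3)
    b = np.random.rand(3, 3)

    x1 = np.linalg.inv(a) @ b    # b treated as one 3x3 matrix, broadcast
    x2 = np.linalg.solve(a, b)   # b treated as three 1-d vectors

    print(x1.shape)   # (3, 3, 3)
    print(x2.shape)   # (3, 3) -- different shape, different meaning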

I wonder if we should deprecate these corner cases, and eventually
migrate to making inv(a) @ b and solve(a, b) the same in all
situations. If we did, then solve(a, b) would actually be a gufunc
with signature (m,m),(m,n?)->(m,n?).
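
As a rough illustration of those semantics, here's a hypothetical
wrapper (the name solve_proposed and the use of an explicit inverse
are just for the sketch; a real implementation would be an LU-based
gufunc):

    import numpy as np

    def solve_proposed(a, b):
        # Proposed rule (m,m),(m,n?)->(m,n?): b is a vector iff it is
        # exactly 1-d, otherwise a (stack of) matrices.  By
        # construction this matches inv(a) @ b in every case.
        a, b = np.asarray(a), np.asarray(b)
        if b.ndim == 1:
            # optional core dimension omitted: (m,m),(m)->(m)
            return (np.linalg.inv(a) @ b[..., np.newaxis])[..., 0]
        # optional core dimension present: (m,m),(m,n)->(m,n)
        return np.linalg.inv(a) @ b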

I think the cases that would need to be changed are those where
(a.ndim - 1 == b.ndim and b.ndim > 1). My guess is that this happens
very rarely in existing code, especially since (IIRC) this behavior
was only added a few years ago, when we gufunc-ified numpy.linalg.

> Another example would be the "matrix transpose" function that has been
> occasionally proposed, to swap the last two dimensions of an array. It could
> have the signature (m?,n)->(n,m?), which ensures that it is still well
> defined (as the identity) on 1d arrays.

Unfortunately I don't think we could make "broadcasting matrix
transpose" be literally a gufunc, since it should return a view. But I
guess there'd still be some value in having the notation available
just when talking about it, so we could say "this operation is *like*
a gufunc with signature (m?,n)->(n,m?), except that it returns a
view".

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

