Extending ufunc signature syntax for matmul, frozen dimensions
In looking to solve issue #9028 "no way to override matmul/@ if __array_ufunc__ is set", it seems there is consensus around the idea of making matmul a true gufunc, but matmul can behave differently for different combinations of array and vector:
(n,k),(k,m)->(n,m)
(n,k),(k)->(n)
(k),(k,m)->(m)
Currently there is no way to express that in the ufunc signature. The proposed solution to issue #9029 is to extend the meaning of a signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m are optional dimensions; if missing in the input, they're treated as 1, and then dropped from the output"
Additionally, there is an open pull request #5015 "Add frozen dimensions to gufunc signatures" to allow signatures like '(3),(3)->(3)'.
I would like to extend ufunc signature handling to implement both these ideas, in a way that would be backward-compatible with the publicly exposed PyUFuncObject. PyUFunc_FromFuncAndDataAndSignature is used to allocate and initialize a PyUFuncObject. Are there downstream projects that allocate their own PyUFuncObject without going through PyUFunc_FromFuncAndDataAndSignature? If so, we could use one of the "reserved" fields, or extend the meaning of the "identity" field to allow version detection.
Any thoughts? Any other thoughts about extending the signature syntax?
Thanks,
Matti
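To make the proposed rule concrete, here is a minimal pure-Python sketch of how a signature with optional (`?`) and frozen (integer) dimensions could be matched against input shapes. The helper names `parse_signature` and `output_core_shape` are hypothetical and not part of NumPy. The sketch ignores loop (broadcast) dimensions, handles a single output only, and assumes that an operand with too few axes drops all of its optional dimensions.

```python
import re


def parse_signature(sig):
    """Return (inputs, outputs): one list of core-dimension names per operand."""
    def groups(part):
        return [[d for d in g.split(",") if d]
                for g in re.findall(r"\(([^)]*)\)", part)]
    inputs, outputs = sig.replace(" ", "").split("->")
    return groups(inputs), groups(outputs)


def output_core_shape(sig, shapes):
    """Core shape of the single output for the given input shapes."""
    in_dims, out_dims = parse_signature(sig)
    missing = set()
    # Assumption: an operand with too few axes drops all its optional dims.
    for dims, shape in zip(in_dims, shapes):
        if len(shape) < len(dims):
            missing.update(d.rstrip("?") for d in dims if d.endswith("?"))
    sizes = {}
    for dims, shape in zip(in_dims, shapes):
        present = [d.rstrip("?") for d in dims if d.rstrip("?") not in missing]
        if len(shape) < len(present):
            raise ValueError(f"shape {shape} has too few axes for {dims}")
        core = shape[len(shape) - len(present):]
        for name, size in zip(present, core):
            # A frozen dimension such as "3" fixes the size; a named one is
            # bound on first use and must agree on every later use.
            expected = int(name) if name.isdigit() else sizes.setdefault(name, size)
            if size != expected:
                raise ValueError(f"size mismatch for dimension {name!r}")
    out = [d.rstrip("?") for d in out_dims[0]]
    return tuple(int(d) if d.isdigit() else sizes[d]
                 for d in out if d not in missing)
```

Under these assumptions, `output_core_shape("(n?,k),(k,m?)->(n?,m?)", [(2, 3), (3,)])` gives `(2,)`, and a frozen signature such as `'(3),(3)->(3)'` rejects inputs whose last axis is not of length 3.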
Hi Matti, This sounds great. For completeness, you omitted the vector-vector case for matmul '(k),(k)->()' - but the suggested new signature for `matmul` would cover that case as well, so not a problem. All the best, Marten
I thought a bit further about this proposal: a disadvantage for matmul specifically is that it does not solve the need for `matvec`, `vecmat`, and `vecvec` gufuncs. That said, it might make sense to implement those as "pseudo-ufuncs" that just add a 1 in the right place and call `matmul`... -- Marten
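As an illustration of the "add a 1 in the right place and call `matmul`" idea, a minimal sketch follows. The function names are hypothetical wrappers chosen for this example, not existing NumPy functions; they are plain Python functions rather than ufuncs, and they assume array inputs that support `np.newaxis` indexing.

```python
import numpy as np


def matvec_via_matmul(m, v):
    """(n,k),(k)->(n): append a length-1 axis to v, multiply, drop it again."""
    return np.matmul(m, v[..., np.newaxis])[..., 0]


def vecmat_via_matmul(v, m):
    """(k),(k,m)->(m): insert a length-1 axis before the last axis of v."""
    return np.matmul(v[..., np.newaxis, :], m)[..., 0, :]


def vecvec_via_matmul(a, b):
    """(k),(k)->(): treat a as a row vector and b as a column vector."""
    return np.matmul(a[..., np.newaxis, :], b[..., np.newaxis])[..., 0, 0]


# A stack of five 2x3 matrices applied to a stack of five 3-vectors.
stack_of_matrices = np.arange(30.0).reshape(5, 2, 3)
stack_of_vectors = np.arange(15.0).reshape(5, 3)
result = matvec_via_matmul(stack_of_matrices, stack_of_vectors)
```

The explicit axis insertion is what lets a stack of vectors broadcast against a stack of matrices; passing the `(5, 3)` array to `np.matmul` directly would treat it as a single 5x3 matrix.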
On Sun, Apr 29, 2018 at 2:48 AM Matti Picus <matti.picus@gmail.com> wrote:
The proposed solution to issue #9029 is to extend the meaning of a signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m are optional dimensions; if missing in the input, they're treated as 1, and then dropped from the output"
I agree that this is an elegant fix for matmul, but are there other use-cases for "optional dimensions" in gufuncs? It feels a little wrong to add gufunc features if we can only think of one function that can use them.
I think I’m -1 on this - this just makes things harder on the implementers of __array_ufunc__, who now might have to work out which signature matches. I’d prefer the solution where np.matmul is a wrapper around one of three gufuncs (or maybe just around one with axis insertion) - this is similar to how np.linalg already works.
Eric
I agree with Eric here. As one of the users of __array_ufunc__, I'd much rather have three separate gufuncs or a single one with axis insertion and removal.
On 01/05/18 00:38, Eric Wieser wrote:
I think I’m -1 on this - this just makes things harder on the implementers of __array_ufunc__ who now might have to work out which signature matches. I’d prefer the solution where np.matmul is a wrapper around one of three gufuncs (or maybe just around one with axis insertion) - this is similar to how np.linalg already works.
Eric
I will try to prototype this solution and put it up for comment, alongside the multi-signature one. Matti
Just for completeness: there are *four* gufuncs (matmat, matvec, vecmat, and vecvec). I remain torn about the best way forward. The main argument against using them inside matmul is that in order to decide which of the four to use, matmul has to have access to the `shape` of the arguments. This means that `__array_ufunc__` cannot be used to override `matmul` (or `@`) for any object which does not have a shape.
From that perspective, multiple signatures is definitely a more elegant solution.
An advantage of the separate solution is that they are useful independently of whether they are used internally in `matmul`; though, then again, with a multi-signature matmul, these would be trivially created as convenience functions. -- Marten
There is always the option of any downstream object overriding matmul, and I fail to see which objects won't have a shape. - Hameer
On Wed, May 2, 2018 at 6:24 AM, Hameer Abbasi <einstein.edison@gmail.com> wrote:
There is always the option of any downstream object overriding matmul, and I fail to see which objects won't have a shape. - Hameer
I think we should not decide too readily on what is "reasonable" to expect for a ufunc input. For instance, I'm currently writing a chained-ufunc class which uses __array_ufunc__ to help make a chain (something like `chained_ufunc = np.sin(np.multiply(Input(), Input()))`). Here, my `Input` class defines `__array_ufunc__` but definitely does not have a shape, and I would like to be able to override `np.matmul` just like every other ufunc. -- Marten
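A minimal sketch of such a shapeless operand is shown below. It is not the actual chained-ufunc class mentioned above; the `Node`, `Input`, and `Call` classes and the `evaluate` method are hypothetical names invented for illustration. It only relies on NumPy passing `(ufunc, method, *inputs, **kwargs)` to `__array_ufunc__`.

```python
import numpy as np


class Node:
    """Hypothetical expression node: records ufunc calls instead of running them."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        return Call(ufunc, inputs)


class Input(Node):
    """Placeholder argument; deliberately has no `shape` attribute."""


class Call(Node):
    """A recorded ufunc call whose arguments may themselves be nodes."""

    def __init__(self, ufunc, args):
        self.ufunc = ufunc
        self.args = args

    def evaluate(self, values):
        """Run the chain, taking one value from the iterator per Input."""
        resolved = []
        for arg in self.args:
            if isinstance(arg, Call):
                resolved.append(arg.evaluate(values))
            elif isinstance(arg, Input):
                resolved.append(next(values))
            else:
                resolved.append(arg)
        return self.ufunc(*resolved)


# Building the chain never inspects a shape; values arrive only at evaluation.
chained_ufunc = np.sin(np.multiply(Input(), Input()))
result = chained_ufunc.evaluate(iter([2.0, 3.0]))
```

Because nothing here ever asks an `Input` for its shape, any override mechanism that needs `shape` to pick a signature before dispatching would not work for this kind of object.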
On Wed, May 2, 2018 at 8:39 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I think we should not decide too readily on what is "reasonable" to expect for a ufunc input.
I agree strongly with this. I can think of a couple of other use-cases off hand:
- xarray.Dataset is a dict-like container of multiple arrays. Matrix-multiplication with a numpy array could make sense (just map over all the contained arrays), but xarray.Dataset itself is not an array and thus does not define shape.
- tensorflow.Tensor can have a dynamic shape that is only known when computation is explicitly run, not when computation is defined in Python.
The problem is even bigger for np.matmul because NumPy also wants to use the same logic for overriding @, and Python's built-in operators definitely should not have such restrictions.
On 01/05/18 21:08, Marten van Kerkwijk wrote:
Just for completeness: there are *four* gufuncs (matmat, matvec, vecmat, and vecvec).
I remain torn about the best way forward. The main argument against using them inside matmul is that in order to decide which of the four to use, matmul has to have access to the `shape` of the arguments. This means that `__array_ufunc__` cannot be used to override `matmul` (or `@`) for any object which does not have a shape. From that perspective, multiple signatures is definitely a more elegant solution.
An advantage of the separate solution is that they are useful independently of whether they are used internally in `matmul`; though, then again, with a multi-signature matmul, these would be trivially created as convenience functions.
-- Marten
My goal is to solve issue #9028, "no way to override matmul/@ if __array_ufunc__ is set on other". Maybe I am too focused on that, but it seems shape does not come into play here.
Given a call to matmul(self, other), it appears to me that the decision to commit to self.matmul or to call other.__array_ufunc__(self.matmul, "__call__", ...) is independent of the shapes and needs only nin and nout. In other words, the implementation of matmul becomes (simplified):
(matmul(self, other) called) -> (use __array_ufunc__ and nin and nout to decide whether to defer to other's __array_ufunc__ via PyUFunc_CheckOverride, which implements NEP 13) -> (yes: call other.__array_ufunc__ as for any other ufunc), (no: call matmul like we currently do, no more __array_ufunc__ testing needed)
So the two avenues I am trying are:
1) make matmul a gufunc, and then it will automatically use the __array_ufunc__ machinery without any added changes, but this requires expanding the meaning of a signature to allow dispatch
2) generalize the __array_ufunc__ machinery to handle some kind of wrapped function; the wrapper knows about nin and nout and calls PyUFunc_CheckOverride, which would allow matmul to work unchanged and might support other functions as well.
The issue of whether matmat, vecmat, matvec, and vecvec are functions, gufuncs accessible from userland, or not defined at all is secondary to the current issue of overriding matmul; we can decide that in the future. If we do create ufuncs for these variants, calling a.vecmat(other), for instance, will still resolve to other's __array_ufunc__ without needing to explore other's shape.
I probably misunderstood what you were driving at because I am so focused on this particular issue.
Matti
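The decision flow described above can be sketched in pure Python. This is only a rough model of the idea, not NumPy's PyUFunc_CheckOverride: `call_with_override`, `Shapeless`, and `plain_matmul` are hypothetical names, arguments are tried strictly left to right, and the subclass-first ordering and `out` handling of NEP 13 are left out.

```python
def call_with_override(func, *args):
    """Defer to an argument's __array_ufunc__ if any argument defines one.

    The decision uses only the arguments themselves, never their shapes.
    """
    saw_override = False
    for arg in args:
        override = getattr(type(arg), "__array_ufunc__", None)
        if override is None:
            continue
        saw_override = True
        result = override(arg, func, "__call__", *args)
        if result is not NotImplemented:
            return result
    if saw_override:
        raise TypeError("all __array_ufunc__ overrides returned NotImplemented")
    # No override found: fall through to the existing code path.
    return func(*args)


class Shapeless:
    """Hypothetical operand that defines __array_ufunc__ but has no shape."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return ("deferred", ufunc.__name__, method, len(inputs))


def plain_matmul(a, b):
    """Stand-in for the current matmul code path, using lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]
```

Calling `call_with_override(plain_matmul, [[1, 2]], Shapeless())` defers to the `Shapeless` operand, while two plain lists fall through to `plain_matmul`.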
Hi Matti, In the original implementation of what was then __numpy_ufunc__, we had overrides for both `np.dot` and `np.matmul` that worked exactly as your option (2), but we decided in the end that those really are not true ufuncs and we should not include ufunc mimics in the mix as someone using `__array_ufunc__` should be able to count on being passed a ufunc, including all its properties. Perhaps this needs revisiting, and we should have some UFuncABC... But my own feeling remains that matmul is close enough to a (set of) gufunc that making it fit the gufunc mold is the way to go... All the best, Marten
On 04/29/2018 05:46 AM, Matti Picus wrote:
In looking to solve issue #9028 "no way to override matmul/@ if __array_ufunc__ is set", it seems there is consensus around the idea of making matmul a true gufunc, but matmul can behave differently for different combinations of array and vector:
(n,k),(k,m)->(n,m)
(n,k),(k)->(n)
(k),(k,m)->(m)
Currently there is no way to express that in the ufunc signature. The proposed solution to issue #9029 is to extend the meaning of a signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m are optional dimensions; if missing in the input, they're treated as 1, and then dropped from the output" Additionally, there is an open pull request #5015 "Add frozen dimensions to gufunc signatures" to allow signatures like '(3),(3)->(3)'.
How much harder would it be to implement multiple-dispatch for gufunc signatures, instead of modifying the signature to include `?`?
There was some discussion of this last year:
http://numpy-discussion.10968.n7.nabble.com/Changes-to-generalized-ufunc-cor...
That sounded like a clean solution to me, although I'm a bit ignorant of the gufunc internals and the compatibility constraints.
I assume gufuncs already have code to match the signature to the array dims, so it sounds fairly straightforward (I say without looking at any code) to do this in a loop over alternate signatures until one works.
Allan
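The "loop over alternate signatures until one works" idea can be sketched as follows. This is a pure-Python illustration under simplifying assumptions, not gufunc internals: `split`, `try_match`, and `dispatch` are hypothetical helpers, only core dimensions are checked (loop dimensions are not broadcast), and the first signature that fits wins.

```python
SIGNATURES = [
    "(n,k),(k,m)->(n,m)",
    "(n,k),(k)->(n)",
    "(k),(k,m)->(m)",
    "(k),(k)->()",
]


def split(sig):
    """Parse '(n,k),(k)->(n)' into ([['n', 'k'], ['k']], ['n'])."""
    lhs, rhs = sig.replace(" ", "").split("->")
    ins = [g.split(",") if g else [] for g in lhs.strip("()").split("),(")]
    out = [d for d in rhs.strip("()").split(",") if d]
    return ins, out


def try_match(sig, shapes):
    """Return the output core shape if `shapes` fit `sig`, else None."""
    ins, out = split(sig)
    sizes = {}
    for dims, shape in zip(ins, shapes):
        if len(shape) < len(dims):
            return None  # not enough axes for this operand's core dims
        core = shape[len(shape) - len(dims):]
        for name, size in zip(dims, core):
            if sizes.setdefault(name, size) != size:
                return None  # same name bound to two different sizes
    return tuple(sizes[d] for d in out)


def dispatch(signatures, shapes):
    """Try each signature in order and return the first that matches."""
    for sig in signatures:
        result = try_match(sig, shapes)
        if result is not None:
            return sig, result
    raise TypeError(f"no signature matches shapes {shapes}")
```

One design consequence of trying signatures in order is that the list order becomes part of the semantics, which a real implementation would need to document.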
On 01/05/18 01:45, Allan Haldane wrote:
How much harder would it be to implement multiple-dispatch for gufunc signatures, instead of modifying the signature to include `?` ?
There was some discussion of this last year:
http://numpy-discussion.10968.n7.nabble.com/Changes-to-generalized-ufunc-cor...
That sounded like a clean solution to me, although I'm a bit ignorant of the gufunc internals and the compatibility constraints.
I assume gufuncs already have code to match the signature to the array dims, so it sounds fairly straightforward (I say without looking at any code) to do this in a loop over alternate signatures until one works.
Allan
I will take a look at multiple-dispatch for gufuncs. The discussion also suggests using an axis kwarg when calling a gufunc, for which there is PR #11018 (https://github.com/numpy/numpy/pull/11018).
Matti
participants (6)
- Allan Haldane
- Eric Wieser
- Hameer Abbasi
- Marten van Kerkwijk
- Matti Picus
- Stephan Hoyer