Allowing broadcasting of core dimensions in generalized ufuncs
Hi All,

Following on a PR combining the ability to provide fixed and flexible dimensions [1] (useful for, e.g., 3-vector input with a signature like `(3),(3)->(3)`, and for `matmul`, respectively; based on earlier PRs by Jaime [2] and Matt (Picus) [3]), I've now made a PR with a further enhancement, which allows one to indicate that a core dimension can be broadcast [4].

A particular use case is `all_equal`, a new function suggested in a stalled PR by Matt (Harrigan) [5], which compares two arrays axis-by-axis, but short-circuits if a non-equality is found (unlike what happens if one does `(a==b).all(axis)`). One thing that would be obviously useful for a routine like `all_equal` is to be able to provide an array as one argument and a constant as another, i.e., to have the core dimensions be broadcast if needed, just like they are in `(a==b).all(axis)`. This is currently not possible: with its signature of `(n),(n)->()`, the two arrays have to have the same trailing size.

My PR provides the ability to indicate in the signature that a core dimension can be broadcast, by using a suffix of "|1". Thus, the signature of `all_equal` would become:
```
(n|1),(n|1)->()
```
Comments most welcome (yes, even on the notation - though I think it is fairly self-explanatory)!

Marten

p.s. There are some similarities to the new "flexible" dimensions implemented for `matmul` [1], but also differences. In particular, for a signature of `(n?),(n?)->()`, one could also pass in an array of trailing size n and a constant, but it would not be possible to pass in an array with trailing size 1: the dimensions with the same name have to be either present and the same, or absent. In contrast, for broadcasting, dimensions with the same name can have trailing size n, size 1, or be absent (in which case they count as 1). For broadcasting, any output dimensions with the same name are never affected, while for flexible dimensions those are removed.

[1] https://github.com/numpy/numpy/pull/11175
[2] https://github.com/numpy/numpy/pull/5015
[3] https://github.com/numpy/numpy/pull/11132
[4] https://github.com/numpy/numpy/pull/11179
[5] https://github.com/numpy/numpy/pull/8528
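For concreteness, a minimal pure-NumPy emulation of the intended `(n|1),(n|1)->()` semantics (without the short-circuiting that motivates the gufunc; `all_equal` here is a plain helper, not an existing NumPy function):

```
import numpy as np

def all_equal(a, b):
    # Broadcast the core (last) dimensions against each other, then
    # reduce over them; loop dimensions broadcast as usual.
    return (np.asarray(a) == np.asarray(b)).all(axis=-1)

a = np.arange(12).reshape(4, 3)
all_equal(a, a[0])                  # same trailing size: result shape (4,)
all_equal(a, 5)                     # constant: counts as core size 1
all_equal(a, [[0], [3], [6], [9]])  # trailing size 1, broadcast against n
```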
On Wed, May 30, 2018 at 11:15 AM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
I read this as "dimensions may have size n or 1", which would exclude the possibility of scalars. For all_equal, I think you could also use a signature like "(m?),(n?)->()", with a short-cut to automatically return False if m != n.
"short-cut to automatically return False if m != n", that seems like a silent bug AFAICT there are 3 possibilities: 1) current behavior 2) a scalar or size 1 array may be substituted, ie a constant 3) a scalar or array with shape[-1] == 1 may be substituted and broadcasted I am fond of using "n^" to signify #3 since I think of broadcasting as increasing the size of the array. Although a stretch, "n#" might work for #2, as it reminds me of #define'ing constants in C. To generalize a bit, most elementwise operations have obvious broadcasting cases and reduce operations have a core dimension. Fusing any two, ie sumproduct, would result in a gufunc which would benefit from this ability. On Wed, May 30, 2018 at 7:22 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Hi Stephan, Matt,

My `n|1` was indeed meant to be read as `n or 1`, but with the (implicit) understanding that any array can have as many ones prepended as needed.

The signature `(n?),(n?)->()` is now set aside for flexible dimensions: this would allow the constant, but not the trailing shape of 1 (at least as we currently have implemented it). I do think that is more consistent with the visual suggestion that the thing may be absent. It also has implications for output: `(m?,n),(n,p?)->(m?,p?)` is meant to indicate that if a dimension is absent, it should be absent in the output as well. In contrast, for broadcasting, I'd envisage `(n|1),(n|1)->(n)` to indicate that the output dimension will always be present and be of length n.

I'm not sure I'm sold on `n^` - I don't think it gives an immediate hint of what it would do...

All the best,

Marten
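The `(n|1),(n|1)->(n)` output behavior described here can be sketched in terms of ordinary broadcasting (`np.broadcast` is used only to illustrate the shape logic):

```
import numpy as np

x = np.arange(3.0)        # core shape (3,)
y = np.array([10.0])      # core shape (1,), broadcast against n == 3
np.broadcast(x, y).shape  # (3,): the output core dimension is always n
(x + y).shape             # (3,), as a (n|1),(n|1)->(n) gufunc would return
```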
On Wed, May 30, 2018 at 5:01 PM Matthew Harrigan <harrigan.matthew@gmail.com> wrote:
"short-cut to automatically return False if m != n", that seems like a silent bug
I guess it depends on the use-cases. This is how np.array_equal() works: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html We could even imagine incorporating this hypothetical "equality along some axes with broadcasting" functionality into axis/axes arguments for array_equal() if we choose this behavior.
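The shape-mismatch behavior of np.array_equal referred to here:

```
import numpy as np

# Shapes differ, so this is simply False rather than an error:
np.array_equal(np.zeros((3, 3)), np.zeros((2, 2)))  # False
np.array_equal([1, 2, 3], [1, 2, 3])                # True
```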
Stephan, good point about use cases. I think it's still an odd fit. For example, I think np.array_equal(np.zeros((3,3)), np.zeros((2,2))) or np.array_equal([1], ['foo']) would be difficult or impossible to replicate with a potential all_equal gufunc. On Thu, May 31, 2018 at 2:00 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Hi Nathaniel,

On Matt's prompting, I added release notes to the frozen/flexible PR [1]; see text attached below. Having done that, I felt the examples actually justified the frozen dimensions quite well. Given that you're the one who expressed most doubts about them, could you have a look? Ideally, I'd avoid having to write a NEP for this, and the examples do seem to make it quite obvious that this change to the signature is the way to go, as its meaning is dead obvious. And the implementation is super-straightforward...

For the broadcast core dimensions, I do agree the case is less strong and the meaning perhaps less obvious (the implementation is relatively simple), and I think a short NEP may be called for (unless others on the list have super-convincing use cases...). I will add here, though, that even if we implement `all_equal` as a method on `equal`, it would still be useful to have a signature that can actually describe it.

-- Marten

[1] https://github.com/numpy/numpy/pull/11175/files

Generalized ufunc signatures now allow fixed-size dimensions
------------------------------------------------------------
By using a numerical value in the signature of a generalized ufunc, one can indicate that the given function requires input or output to have dimensions with the given size. E.g., the signature of a function that converts a polar angle to a two-dimensional cartesian unit vector would be ``()->(2)``; that for one that converts two spherical angles to a three-dimensional unit vector would be ``(),()->(3)``; and that for the cross product of two three-dimensional vectors would be ``(3),(3)->(3)``. Note that to the elementary function these dimensions are not treated any differently from variable ones indicated with a letter; the loop is still passed the corresponding size, but it can now count on that size being equal to the fixed one given in the signature.

Generalized ufunc signatures now allow flexible dimensions
----------------------------------------------------------
Some functions, in particular numpy's implementation of ``@`` as ``matmul``, are very similar to generalized ufuncs in that they operate over core dimensions, but they could not be presented as such because they are able to deal with inputs in which a dimension is missing. To support this, it is now allowed to postfix a dimension name with a question mark to indicate that the dimension does not necessarily have to be present.

With this addition, the signature for ``matmul`` can be expressed as ``(m?,n),(n,p?)->(m?,p?)``. This indicates that if, e.g., the second operand has only one dimension, for the purposes of the elementary function it will be treated as if that input has core shape ``(n, 1)``, and the output has the corresponding core shape of ``(m, 1)``. The actual output array, however, has the flexible dimension removed, i.e., it will have shape ``(..., m)``. Similarly, if both arguments have only a single dimension, the inputs will be presented as having shapes ``(1, n)`` and ``(n, 1)`` to the elementary function, and the output as ``(1, 1)``, while the actual output array returned will have shape ``()``. In this way, the signature allows one to use a single elementary function for four related but different signatures: ``(m,n),(n,p)->(m,p)``, ``(n),(n,p)->(p)``, ``(m,n),(n)->(m)`` and ``(n),(n)->()``.
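For reference, the four shapes that the ``matmul`` signature unifies, as produced by the existing np.matmul:

```
import numpy as np

A = np.ones((2, 3))    # (m, n)
B = np.ones((3, 4))    # (n, p)
v = np.ones(3)         # (n,)

np.matmul(A, B).shape  # (2, 4): (m,n),(n,p)->(m,p)
np.matmul(v, B).shape  # (4,):   (n),(n,p)->(p)
np.matmul(A, v).shape  # (2,):   (m,n),(n)->(m)
np.matmul(v, v).shape  # ():     (n),(n)->()
```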
On Fri, Jun 1, 2018 at 2:42 PM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
I do think it would be valuable to have a brief NEP on this, especially on the solution for matmul. NEPs don't have to be long, and don't need to go into the full detail of implementations. But they are a nice place to summarize design discussions. In fact, I would say the text you have below is nearly enough for one or two NEPs. The parts that are missing would be valuable to add anyway:
- A brief discussion (a sentence or two) of potential broader use-cases for optional dimensions (ufuncs that act on row/column vectors and matrices).
- A brief discussion of rejected alternatives (only a few sentences for each alternative).
For the flexible dimensions, that would be up to Nathaniel -- it's his idea ;-) And happily that means that I don't have to spend time looking up how this NEP business actually works, but can just copy & paste... -- Marten
OK, I spent my Sunday morning writing a NEP. I hope this can lead to some closure... See https://github.com/numpy/numpy/pull/11297 -- Marten
Rendered here: https://github.com/mhvk/numpy/blob/nep-gufunc-signature-enhancement/doc/neps... Eric On Sun, 10 Jun 2018 at 09:37 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Thanks for the writeup Marten,

Nathaniel:

> Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes?

This hits on an interesting alternative to frozen dimensions - np.cross could just become a regular ufunc with signature np.dtype((float64, 3)), np.dtype((float64, 3)) → np.dtype((float64, 3)).

> Furthermore, the expansion quickly becomes cumbersome. For instance, for the all_equal signature of (n|1),(n|1)->() ...

I think this is only a good argument when used in conjunction with the broadcasting syntax. I don't think it's a reason for matmul not to have multiple signatures. Having multiple signatures is a disincentive to introduce too many overloads of the same function, which seems like a good thing to me.

Summarizing my overall opinions:

- I'm +0.5 on frozen dimensions. The use-cases seem reasonable, and it seems like an easy-ish way to get them. Allowing ufuncs to natively support subarray types might be a tidier solution, but that could come down the road.
- I'm -1 on optional dimensions: they seem to legitimize creating many overloads of gufuncs. I'm already not a fan of how matmul has special cases for lower dimensions that don't generalize well. To me, the best way to handle matmul would be to use the proposed __array_function__ to handle the shape-based special-case dispatching, either by:
  - Inserting dimensions, and calling the true gufunc np.linalg.matmul_2d (which is a function I'd like direct access to anyway).
  - Dispatching to one of four ufuncs.
- Broadcasting dimensions:
  - I know you're not suggesting this, but: enabling broadcasting unconditionally for all gufuncs would be a bad idea, masking linalg bugs (although einsum does support broadcasting...).
  - Does it really need a per-dimension flag, rather than a global one? Can you give a case where that's useful?
  - If we'd already made all_equal a gufunc, I'd be +1 on adding broadcasting support to it.
  - I'm -0.5 on the all_equal path in the first place. I think we either should have a more generic approach to combined ufuncs, or just declare them numba's job.
  - Can you come up with a broadcasting use-case that isn't just chaining a reduction with a broadcasting ufunc?

Eric

On Sun, 10 Jun 2018 at 16:02 Eric Wieser <wieser.eric+numpy@gmail.com> wrote: Rendered here:
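The subarray dtype alluded to here does exist in NumPy today, though on array construction its shape is absorbed into the array shape:

```
import numpy as np

vec3 = np.dtype((np.float64, 3))  # a (3,)-subarray dtype

# The subarray shape folds into the array shape, so the element
# dtype ends up as plain float64:
a = np.zeros(5, dtype=vec3)
a.shape   # (5, 3)
a.dtype   # dtype('float64')
```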
On Sun, Jun 10, 2018 at 4:31 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Thanks for the writeup Marten,
Indeed, thank you Marten!
That said, I still think frozen dimensions are a better proposal than either of these.
My concern with either inserting dimensions or dispatching to one of four ufuncs is that some objects (e.g., xarray.DataArray) define matrix multiplication, but in an incompatible way with NumPy (e.g., xarray sums over axes with the same name, instead of last / second-to-last axes). NumPy really ought to provide a way to overload the operation itself, without either inserting/removing dummy dimensions or inspecting input shapes to dispatch to other gufuncs.

That said, if you don't want to make np.matmul a gufunc, then I would much rather use Python's standard overloading rules with __matmul__/__rmatmul__ than use __array_function__, for two reasons:

1. You *already* need to use __matmul__/__rmatmul__ if you want to support matrix multiplication with @ on your class, so __array_function__ would be additional and redundant. __array_function__ is really intended as a fall-back, for cases where there is no other alternative.
2. With the current __array_function__ proposal, this would imply that calling other unimplemented NumPy functions on your object would raise TypeError rather than doing coercion. This sort of additional coupled behavior is probably not what an implementor of operator.matmul/@ is looking for.

In summary, I would either support:

1. (This proposal) Adding additional optional dimensions to gufuncs for np.matmul/operator.matmul, or
2. Making operator.matmul a special case for mathematical operators that always checks overloads with __matmul__/__rmatmul__ even if __array_ufunc__ is defined.

Either way, matrix multiplication becomes somewhat of a special case. It's just a matter of whether it's a special case for gufuncs (using optional dimensions) or a special case for arithmetic overloads in NumPy (not using __array_ufunc__). Given that I think optional dimensions have other conceivable uses in gufuncs (for row/column vectors), I think that's the better option.

I would not support either expanding dimensions or dispatching to multiple gufuncs in NumPy's implementation of operator.matmul (i.e., ndarray.__matmul__). We could potentially only do this for numpy.matmul rather than operator.matmul/@, but that opens the door to potential inconsistency between the NumPy version of an operator and Python's version of an operator, which is something we tried very hard to avoid with __array_ufunc__.
I don't understand your alternative here. If we overload np.matmul using __array_function__, then it would not use *either* of these options for writing the operation in terms of other gufuncs. It would simply look for an __array_function__ attribute, and call that method instead.

Let me explain that suggestion a little more clearly.

1. There'd be a linalg.matmul2d that performs the real matrix case, which would be easy to make as a ufunc right now.
2. __matmul__ and __rmatmul__ would just call np.matmul, as they currently do (for consistency between np.matmul and operator.matmul, needed in python pre-@-operator).
3. np.matmul would be implemented as:

```
@do_array_function_overrides
def matmul(a, b):
    if a.ndim != 1 and b.ndim != 1:
        return matmul2d(a, b)
    elif a.ndim != 1:
        return matmul2d(a, b[:, None])[..., 0]
    elif b.ndim != 1:
        return matmul2d(a[None, :], b)[..., 0, :]
    else:
        # this one probably deserves its own ufunc
        return matmul2d(a[None, :], b[:, None])[0, 0]
```

4. Quantity can just override __array_ufunc__ as with any other ufunc.
5. DataArray, knowing the above doesn't work, would implement something like:

```
@matmul.register_array_function(DataArray)
def __array_function__(a, b):
    if a.ndim != 1 and b.ndim != 1:
        return matmul2d(a, b)
    else:
        # either:
        # - add/remove dummy dimensions in a dataarray-specific way
        # - downcast to ndarray and do the dimension juggling there
        ...
```

Advantages of this approach:

- Neither the ufunc machinery, nor __array_ufunc__, nor the inner loop, need to know about optional dimensions.
- We get a matmul2d ufunc, that all subclasses support out of the box if they support matmul.

Eric
On Mon, Jun 11, 2018 at 11:59 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
OK, this sounds pretty reasonable to me -- assuming we manage to figure out the __array_function__ proposal! There's one additional ingredient we would need to make this work well: some way to guarantee that "ndim" and indexing operations are available without casting to a base numpy array. For now, np.asanyarray() would probably suffice, but that isn't quite right (e.g., this would fail for np.matrix). In the long term, I think we need a new coercion protocol for "duck" arrays. Nathaniel Smith and I started writing a NEP on this, but it isn't quite ready yet.
Hi All, Matti asked me to make a PR accepting my own NEP - https://github.com/numpy/numpy/pull/11429 Any objections? As noted in my earlier summary of the discussion, in principle we can choose to accept only parts, although I think it became clear that the most contentious is also the one arguably most needed, the flexible dimensions for matmul. Moving forward has the advantage that in 1.16 we will actually be able to deal with matmul. All the best, Marten On Fri, Jun 15, 2018 at 2:17 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Hi All, In case it was missed because people have tuned out of the thread: Matti and I proposed last Tuesday to accept NEP 20 (on coming Tuesday, as per NEP 0), which introduces notation for generalized ufuncs allowing fixed, flexible and broadcastable core dimensions. For one thing, this will allow Matti to finish his work on making matmul a gufunc. See http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html All the best, Marten
On Sat, Jun 30, 2018 at 6:51 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
So I still have some of the same concerns as before...

For the possibly missing dimensions: matmul is really important, and making it a gufunc solves the problem of making it overridable by duck arrays (via __array_ufunc__). Also, it will help later when we rework dtypes: new dtypes will be able to implement matmul by the normal ufunc loop registration mechanism, which is much nicer than the current system where every dtype has a special-case method just for handling matmul. The ? proposal isn't the most elegant idea ever, but we've been tossing around ideas for solving these problems for a while, and so far this seems to be the least-bad one, so... sure, let's do it.

For the fixed-size dimensions: this makes me nervous. It's aimed at a real use case, which is a major point in its favor. But a few things make me wary. For input dimensions, it's sugar - the gufunc loop can already raise an error if it doesn't like the size it gets. For output dimensions, it does solve a real problem. But... only part of it. It's awkward that right now you only have a few limited ways to choose output dimensions, but this just extends the list of special cases, rather than solving the underlying problem. For example, 'np.linalg.qr' needs a much more generic mechanism to choose output shape, and parametrized dtypes will need a much more generic mechanism to choose output dtype, so we're definitely going to end up with some phase where arbitrary code gets to describe the output array. Are we going to look back on fixed-size dimensions as a quirky, redundant thing?

Also, as currently proposed, it seems to rule out the possibility of using name-based axis specification in the future, right? (See https://github.com/numpy/numpy/pull/8819#issuecomment-366329325) Are we sure we want to do that?

If everyone else is comfortable with all these things then I won't block it though.

For broadcasting: I'm sorry, but I think I'm -1 on this. I feel like it falls into a classic anti-pattern in numpy, where someone sees a cool thing they could do and then goes looking for problems to justify it. (A red flag for me is that "it's easy to implement" keeps being mentioned as justification for doing it.) The all_equal and weighted_mean examples both feel pretty artificial - traditionally we've always implemented these kinds of functions as regular functions that use (g)ufuncs internally, and it's worked fine (cf. np.allclose, ndarray.mean). In fact, in some sense the whole point of numpy is to help people implement functions like this, without having to write their own gufuncs. Is there some reason these need to be gufuncs? And if there is, are these the only things that need to be gufuncs, or is there a broader class we're missing? The design just doesn't feel well-justified to me.

And in the past, when we've implemented things like this, where the use cases are thin but hey, why not, it's easy to do, it's ended up causing two problems: first people start trying to force it into cases where it doesn't quite work, which makes everyone unhappy... and then when we eventually do try to solve the problem properly, we end up having to do elaborate workarounds to keep the old not-quite-working use cases from breaking.

I'm pretty sure we're going to end up rewriting most of the ufunc code over the next few years as we ramp up duck array and user dtype support, and it's already going to be very difficult, both to design in the first place and then to implement while carefully keeping shims to keep all the old stuff working. Adding features has a very real cost, because it adds extra constraints that all this future work will have to work around. I don't think this meets the bar.

By the way, I also think we're getting well past the point where we should be switching from a string-based DSL to a more structured representation. (This is another trap that numpy tends to fall into... the dtype "language" is also a major offender.) This isn't really a commentary on any part of this in particular, but just something that I've been noticing and wanted to mention :-).

-n

-- Nathaniel J. Smith -- https://vorpus.org
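One possible shape such a structured representation could take (purely illustrative; `CoreDim` and `Signature` are hypothetical names, not a proposed NumPy API):

```
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class CoreDim:
    name: Optional[str] = None    # None for a frozen (fixed-size) dimension
    size: Optional[int] = None    # e.g. 3 for the '(3)' in '(3),(3)->(3)'
    flexible: bool = False        # the '?' modifier
    broadcastable: bool = False   # the '|1' modifier

@dataclass(frozen=True)
class Signature:
    inputs: Tuple[Tuple[CoreDim, ...], ...]
    outputs: Tuple[Tuple[CoreDim, ...], ...]

# '(m?,n),(n,p?)->(m?,p?)' would then be written as:
matmul_sig = Signature(
    inputs=((CoreDim('m', flexible=True), CoreDim('n')),
            (CoreDim('n'), CoreDim('p', flexible=True))),
    outputs=((CoreDim('m', flexible=True), CoreDim('p', flexible=True)),),
)
```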
Hi Nathaniel, Thanks for the detailed thoughts. On Tue, Jul 3, 2018 at 4:27 AM, Nathaniel Smith <njs@pobox.com> wrote:
Indeed, I only became recently aware of the ->dot member of the dtype struct. Pretty ugly!
Yes, I think this ugliness is the price we pay for matmul not just meaning matrix multiplication but also no-broadcast vector-matrix, matrix-vector, and vector-vector. (Of course, my after-the-fact annoyance with that design does make me more sensitive to the argument that one should not try to shoehorn gufuncs into cases they are not meant for.)
Can it? I would never have thought that the inner loop would need to do any checking; it is certainly not obvious from the code or from the documentation. Does the iterator itself check for errors every iteration? If so, that might be one place for a quick speed-up for ufunc.at...
I think this is a much rarer case, which will indeed need some type of hook no matter what. (I have been thinking about solving that by moving the call to the type resolver earlier. That one gets passed `op`, so it can create an output array if it doesn't exist; it doesn't need any of the sizes.)
For this particular case, I find the signature so much clearer that that in itself is a reason to do it (readability counts and all that).
I think it only excludes having the choice of keying the dict with the default axis number *or* with its dimension name, which I think would in fact be a good idea: if one uses a dict to describe axes entries, the keys should be the names of the axes, where a name can also be one of the fixed numbers. (Actually, strictly, since the default axis number is always negative, one can even have both. But that would be awful.) Should add that I don't think the scheme will work all that well anyway - there are quite a few cases where one duplicates a name (say, a square matrix (n,n)), for which keying by "n" would be less than useful.
Those are all fair points. For all_equal one cannot really write a separate function easily given the absence of short-circuiting. But possibly that just argues for the need to short-circuit...
Given my sentiments about the multiple meanings of `@`, I'm sensitive to this argument. But as Matti pointed out, we do not have to accept the whole NEP. Indeed, the broadcasting is in a separate PR.
I think by now it is clear that moving incrementally is the way to go; the ufunc code is in fact being rewritten, if slowly.
Yet, dtype('(3,4)f8') is really clear, unlike all the other forms... It is similar to the string formatting mini-language. Anyway, that is a bit off topic. Overall, would one way to move forward be to merge the first PR (flexible and frozen) and defer the broadcastable dimensions? All the best, Marten p.s. I'm amused that the broadcastable dimensions were in fact the default originally. At some point, I should try to find out why that default was changed.
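What the '(3,4)f8' string referred to above produces:

```
import numpy as np

dt = np.dtype('(3,4)f8')   # a (3, 4) sub-array of float64 per element
a = np.zeros(2, dtype=dt)
a.shape                    # (2, 3, 4): the sub-array shape is absorbed
a.dtype                    # dtype('float64')
```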
On Tue, Jul 3, 2018 at 7:13 AM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Overall, would one way to move forward be to merge the first PR (flexible and frozen) and defer the broadcastable dimensions?
This would have my support. I have similar misgivings about broadcastable dimensions to those raised by Nathaniel. In particular, I wonder if there is some way to make use of this functionality internally in NumPy for functions like all_equal() without exposing it as part of the external gufunc API.
On Tue, Jul 3, 2018 at 12:11 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
OK, so let me explicitly ask whether there are any objections to going forward with flexible and frozen dimensions, but deferring on broadcastable ones until more compelling use cases have been identified? Thanks, Marten p.s. I adjusted the "acceptance PR" to reflect this: https://github.com/numpy/numpy/pull/11429
As you note further down, the present proposal of just using numbers has the advantage of being clear and easy. Another (small?) advantage is that I can use `axis` to tell where my three coordinates are, rather than be stuck with having them as the last dimension. Indeed, in my trials for wrapping the Standards of Fundamental Astronomy routines, I started with just making every 3-vector and 3x3-matrix a structured array with the relevant single sub-array entry. That worked, but I ended up disliking the casting to and fro.
But implementation for matmul is actually considerably trickier, since the internal loop now has to check the number of distinct dimensions.
Otherwise, I agree with Stephan about optional dimensions being the least-bad solution. Aside: I do agree we should think about how to expose the `linalg` gufuncs.
dimensions... Also, it has the benefit of being clear what the function can handle by inspection of the signature, i.e., it self-documents better (one of my main arguments in favour of frozen dimensions...).
be to auto-create an inner loop that calls all the chained ufunc loops in turn). Not sure that short-circuiting will be all that easy. I actually quite like the all_equal ufunc, but it is in part because I remember discovering how painfully slow (a==b).all() was (and I still have a place where I would use it if it existed). And it does fit in the (admittedly vague) plans to try to make `.reduce` a gufunc.
such functions... Absent a mechanism to chain ufuncs, more complicated gufuncs are currently the easiest way to get fast more complicated algebra. But perhaps a putative weighted_mean(y, sigma) -> mean, sigma_mean is a decent example? Its signature would be (n),(n)->(),(), but then you're forced to give individual sigmas for each point. With (n|1),(n|1)->(),() you are no longer forced to do that (though the case of all y being the same is less than useful here... I did at some point have an implementation that worked by core dimension of each argument, but ended up feeling it was not worth the extra complication). -- Marten
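A sketch of that putative weighted_mean as a plain function (inverse-variance weighting assumed); with the (n|1),(n|1)->(),() signature, sigma could additionally be passed as a scalar or trailing-size-1 array:

```
import numpy as np

def weighted_mean(y, sigma):
    # what the '|1' modifier would do per core dimension:
    y, sigma = np.broadcast_arrays(y, sigma)
    w = 1.0 / sigma**2
    mean = (w * y).sum(axis=-1) / w.sum(axis=-1)
    sigma_mean = np.sqrt(1.0 / w.sum(axis=-1))
    return mean, sigma_mean

weighted_mean([1.0, 2.0, 3.0], 0.5)  # a single sigma for all points
```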
Frozen dimensions:

> I started with just making every 3-vector and 3x3-matrix structured arrays with the relevant single sub-array entry

I was actually suggesting omitting the structured dtype (i.e., field names) altogether, and just using the subarray dtypes (which exist alone, but not in arrays).

> Another (small?) advantage is that I can use `axis`

This is a fair argument against my proposal - at any rate, I think we'd need a better story for subarray dtypes before trying to add support to them for ufuncs.

Broadcasting dimensions:

> But perhaps a putative weighted_mean ... is a decent example

That's fairly convincing as a non-chained ufunc case. Can you add an example like that to the NEP?

> Also, it has the benefit of being clear what the function can handle by inspection of the signature

Is broadcasting (n),(n)->(),() less clear than (n|1),(n|1)->(),()? Can you come up with an example where only some dimensions make sense to broadcast?

Eric

On Mon, 11 Jun 2018 at 08:04 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
On Tue, Jun 12, 2018 at 2:35 AM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Yes, I've been wondering about the point of the sub-arrays... They seem interesting but in arrays just disappear. Their possible use would be to change the shape as seen by the outside world (as happens if one does define that sub-array as a 1-part structured array). Anyway, for another discussion!
Done.
Not a super-convincing one, though I guess one could think of a similar function for 3-vectors (which somehow must care about those being three-dimensional, because, say, it calculates the average direction of the cross product in spherical angles...); then, in the signature `(n,3),(n,3)->(),(),(),()`, one would like to indicate that the `n` could be broadcast, but the `3` could not. As I now write in the NEP, part of the reason for doing it per distinct dimension is that I already need a flag for flexible, so it is easy to add one for broadcastable; similarly, in the actual code, there is quite a bit of shared stuff. -- Marten
Hi All,

The discussion on the gufunc signature enhancements seems to have stalled a bit, but while it was going I've tried to update the NEP correspondingly. The NEP is now merged, so it can be viewed more easily, at http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html

My own quite possibly biased summary of the discussion so far is that:
1) Frozen dimensions are generally seen as a good idea; other implementations may be possible, but are not as clear.
2) Flexible dimensions have little use beyond matmul; the main discussion is whether there is a better way. In my opinion, the main benefit of the current proposal is that it allows operator overrides to all work the same way (via __array_ufunc__), independent of any assumptions about the object that does the override (such as that it has a shape).
3) Broadcastable dimensions had less support, but mostly for lack of examples; there now is one beyond all_equal, for which a gufunc is more clearly the proper route: a weighted average (which has obvious extensions).

A general benefit of course is that there is actual code for all three; it would certainly be nice if we could fully support `matmul` and `@` in 1.16.

So, the question would seem to be whether the NEP should be accepted or rejected (with partial acceptance of course being possible, though I note that flexible and broadcastable share a lot of implementation, so in my opinion it is somewhat pointless to do just one of them).

All the best,

Marten
On Wed, May 30, 2018 at 11:14 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
I'm currently -0.5 on both fixed dimensions and this broadcasting dimension idea. My reasoning is:

- The use cases seem fairly esoteric. For fixed dimensions, I guess the motivating example is cross-product (are there any others?). But would it be so bad for a cross-product gufunc to raise an error if it receives the wrong number of dimensions? For this broadcasting case... well, obviously we've survived this long without all_equal :-). And there's something funny about all_equal, since it's really smushing together two conceptually separate gufuncs for efficiency. Should we also have all_less_than, sum_square, ...? If this is a big problem, then wouldn't it be better to solve it in a general way, like dask or Numba or numexpr do?

To be clear, I'm not saying these features are necessarily *bad* ideas, in isolation - just that the benefits aren't very convincing, and there are trade-offs, like:

- When it comes to the core ufunc machinery, we have a limited complexity budget. I'm nervous that if we add too many bells and whistles, we'll end up writing ourselves into a corner where we have trouble maintaining it, where it becomes difficult to predict how different features interact, and where it becomes increasingly difficult for third parties to handle all the different features in their __array_ufunc__ methods...

- And, we have a lot of other demands on the core ufunc machinery, that might be better places to spend our limited complexity budget. For example, can we come up with an extension to make np.sort a gufunc? That seems like a much higher priority than figuring out how to make all_equal a gufunc. What about refactoring the ufunc machinery to support user-defined dtypes? That'll need some serious work, and again, it's probably higher priority than supporting cross-product or all_equal directly (or at least it seems that way to me).

Maybe there are more compelling use cases that I'm missing, but as it is, I feel like trying to add too many features to the current ufunc machinery is pretty risky for future maintainability, and we shouldn't do it without really solid use cases.

-n

-- Nathaniel J. Smith -- https://vorpus.org
Hi Nathaniel,

I think the case for frozen dimensions is much more solid than just `cross1d` - there are many operations that work on size-3 vectors. Indeed, as I noted in the PR, I have just been wrapping a Standards-of-Fundamental-Astronomy library in gufuncs, and many of its functions require size-3 vectors or 3x3 matrices [1]. Of course, I can put checks on the sizes, and I've now done that in a custom type resolver (which I needed anyway since, as you say, user dtypes are currently not easy), but there is a real problem for functions that take scalars and produce vectors: with a signature like `(),()->(n)`, I am forced to pass in an output with size 3, which is very inconvenient (especially if I then also want to override with `__array_ufunc__` - now my Quantity implementation also has to start changing an output already put in). So, having frozen dimensions is definitely helpful for developers of new gufuncs.

Furthermore, with frozen dimensions, the signature is not just immediately clear, `(),()->(3)` for the example above, it is also better in telling users about what a function does. Indeed, I think this addition has much more justification than the `?`, which is much more complex than the fixed size, yet neither particularly clear nor useful beyond the single purpose of matmul. (It is just that that single purpose has fairly high weight...)

As for broadcasting, well, with the flexible dimensions defined, the *additional* complexity is very small. I have no ready example other than all_equal, though I will say that I currently have code that does `if a[0] == x and a[1] == x and a[2] == x and np.all(a[3:] == x):` just because the short-circuiting is well worth the time (it is unlikely in this context that all of a equals x). So, there is at least one example of an actual need for this function.

All the best,

Marten

[1] https://github.com/astropy/astropy/pull/7502
On Thu, May 31, 2018 at 4:21 AM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
I agree that the use-cases for frozen dimensions are well motivated. It's not as common as writing code that supports arbitrary dimensions, but given that the real world is three dimensional it comes up with some regularity. Certainly for these use-cases it would add significant value (not requiring pre-allocation of output arrays).

> Furthermore, with frozen dimensions, the signature is not just immediately clear, `(),()->(3)` for the example above, it is also better in telling users about what a function does.
Yes, frozen dimensions really do feel like a natural fit. There is no ambiguity about what an integer means in a gufunc signature, so the complexity of the gufunc model (for users and __array_ufunc__ implementors) would remain roughly fixed. In contrast, broadcasting would certainly increase the complexity of the model, as evidenced by the new syntax we would need. This may or may not be justified. Currently I am at -0.5 along with Nathaniel here.
Agreed, though at least in principle there is the slightly broader use of case of handling arguments that are either matrices or column/row vectors.
On Thu, May 31, 2018 at 4:20 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Ah, this does sound like I'm missing something. I suspect this is a situation where we have two problems:
- For some people the use cases are everyday and obvious; for others they're things we've never heard of (what's a "standard of astronomy"?).
- The discussion is scattered around mailing list posts, old comments on random github issues, etc. This makes it hard for everyone to be on the same page.

But this is exactly the situation where NEPs are useful. Maybe you could write up a short NEP for frozen dimensions? It doesn't need to be fancy or take long, but I think it'd be useful to have a single piece of text we can all look at that describes the use cases and how frozen dimensions help.

BTW, regarding output shape: as you hint, there's a similar problem with parametrized dtypes in general. Consider defining a loop for np.add that lets it concatenate strings. If the inputs are S4 and S5, then the output should be S9 - but how does the ufunc machinery know that? This suggests that when we do the big refactor to ufuncs to support user-defined and parametrized dtypes in general, one of the things we'll need is a way for an individual loop to select the output dtype. One natural way to do this would be to have two callbacks per loop: one that receives the input dtypes and returns the output dtypes, and then the other that's like the current loop callback that actually performs the operation.

Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes?

Maybe frozen dimensions are a good idea regardless, but just wanted to put that out there since it might be a more general solution.
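A sketch of that two-callback scheme for the string-concatenation loop (the callback names and calling convention are hypothetical, not a NumPy API):

```
import numpy as np

def string_add_resolver(in_dtypes):
    # S4 + S5 -> S9: the output itemsize is the sum of the inputs'
    n = sum(dt.itemsize for dt in in_dtypes)
    return (np.dtype('S%d' % n),)

def string_add_loop(a, b, out):
    # the actual elementwise work, done after resolution
    for i in range(len(out)):
        out[i] = a[i] + b[i]

string_add_resolver((np.dtype('S4'), np.dtype('S5')))  # (dtype('S9'),)
```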
Yeah, that's why I'm not 100% happy with '?' either (even though I proposed it in the first place :-)). But matmul is like, arguably the single most important operation in numpy, so it can justify a lot more... -n -- Nathaniel J. Smith -- https://vorpus.org
On 05/31/2018 12:10 AM, Nathaniel Smith wrote:
I have often wished numpy had these short-circuiting gufuncs, for a very long time. I specifically remember my fruitless searches for how to do it back in 2007. While "on average" short-circuiting only gives a speedup of 2x, in many situations you can arrange your algorithm so short-circuiting will happen early, e.g., usually in the first 10 elements of a 10^6 element array, giving enormous speedups.

Also, I do not imagine these as free-floating ufuncs; I think we can arrange them in a logical way in a gufunc ecosystem. There would be some "core ufuncs", with "associated gufuncs" accessible as attributes. For instance, any_less_than will be accessible as less.any. Binary "comparison" ufuncs would have attributes:

```
less.any
less.all
less.first   # returns first matching index
less.count   # counts matches without intermediate bool array
```

This adds on to the existing attributes; for instance, ufuncs already have:

```
add.reduce
add.accumulate
add.reduceat
add.outer
add.at
```

It is unfortunate that all ufuncs currently have these attributes even if they are unimplemented/inappropriate (e.g., np.sin.reduce). I would like to remove the inappropriate ones, so each core ufunc will only have the appropriate "associated gufuncs" as attributes.

Incidentally, once we make reduce/accumulate/... into "associated gufuncs", I propose completely removing the "method" argument of __array_ufunc__, since it is no longer needed and adds a lot of complexity which implementors of an __array_ufunc__ are forced to account for.

Cheers, Allan
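What the proposed attributes would correspond to today (the attribute API is hypothetical; each right-hand expression below materializes a full temporary boolean array, which the gufuncs would avoid):

```
import numpy as np

a, b = np.random.rand(10**6), np.random.rand(10**6)

(a < b).any()     # less.any(a, b)
(a < b).all()     # less.all(a, b)
np.argmax(a < b)  # less.first(a, b): argmax returns the first True
(a < b).sum()     # less.count(a, b)
```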
<snip>
So then, why is it a gufunc and not an attribute using a ufunc with binary output? I have asked this before, and even got arguments as to why it fits gufuncs better, but frankly I still do not really understand. If it is an associated gufunc, why a gufunc at all? We need any() and all() here, so that is not that many methods, right? And when it comes to buffering you have much more flexibility.

Say I have the operation `(float_arr > int_arr).all(axis=(1, 2))`, with int_arr being shaped (2, 1000, 1000) (i.e., large along the interesting axes). A normal gufunc IIRC will get the whole inner dimension as a float buffer. In other words, you gain practically nothing, because the whole int_arr will be cast to float anyway. If, however, you actually implement `np.greater.all(float_arr, int_arr, axis=(1, 2))` as a separate ufunc method, you would have the freedom to work in the typical cache-friendly buffersize chunks for each of the outer dimensions, one at a time. A gufunc would require you to say: please do not buffer for me, or to implement all possible type combinations to do this. (Of course there are memory layout subtleties, since you would have to optimize always for the "fast exit" case, potentially making the worst case scenario much worse -- unless you do seriously fancy stuff anyway.)

A more general question is actually whether we should rather focus on solving the same problem more generally. For example, if `numexpr` would implement all/any reductions, it may be able to pretty simply get the identical trade-offs with even more flexibility! (I have to admit, it may get tricky with multiple reduction dimensions, etc.)

- Sebastian
On 05/31/2018 09:53 AM, Sebastian Berg wrote:
Hmm, I hadn't known/considered the limitations of gufunc buffer sizes. I was just thinking of them as a standardized interface which handles the where/out/broadcasting for you. I'll have to read about it.

One thing I don't like about the ufunc-method strategy is that it easily pollutes all the ufuncs' namespaces and their implementations, so many ufuncs have to account for a new "all" method even if inappropriate, for example.

Cheers, Allan
<snip>
For Quantity at least I found it somewhat helpful to have the method separate from the ufunc, as how one deals with it still has similarities. Though it would probably have been similarly easy if `__array_ufunc__` had been passed `np.add.reduce` directly. Aside: I am not sure we can so easily remove `method` any more, but I guess we can at least start getting to a state where it always is `__call__`. Anyway, this does argue we should be rather careful with signatures... In particular, for `__array_function__` we really need to think carefully about `types` - an `__array_function__`-specific dict at the end may be more useful. All the best, Marten
Hi Sebastian, This is getting a bit far off-topic (the topic being whether it is a good idea to allow frozen dimensions and broadcasting), but on `all_equal`: I definitely see the point that a method might be better, but that needs work: to expand the normal ufunc mechanism to allow the inner loop to carry state (for which there is an issue [1]) and for it to tell the iterator to stop feeding it buffers. It is not quite clear to me that this is better/easier than expanding gufuncs to be able to deal with buffers... All the best, Marten [1] https://github.com/numpy/numpy/issues/8773
While "on average" short-circuiting only gives a speedup of 2x, in many situations you can arrange your algorithm so short circuiting will happen early, eg usually in the first 10 elements of a 10^6 element array, giving enormous speedups. Also, I do not imagine these as free-floating ufuncs, I think we can arrange them in a logical way in a gufunc ecosystem. There would be some "core ufuncs", with "associated gufuncs" accessible as attributes. For instance, any_less_than will be accessible as less.any binary "comparison" ufuncs would have attributes less.any less.all less.first # returns first matching index less.count # counts matches without intermediate bool array This adds on to the existing attributes, for instance ufuncs already have: add.reduce add.accumulate add.reduceat add.outer add.at It is unfortunate that all ufuncs currently have these attributes even if they are unimplemented/inappropriate (eg, np.sin.reduce), I would like to remove the inappropriate ones, so each core ufunc will only have the appropriate attribute "associated gufuncs". I’m definitely in favour of all this. It’d be great to have this, and it’d be an excellent ecosystem. I’ll add that composing ufuncs is something I’ve wanted, and that has come up from time to time. Incidentally, once we make reduce/accumuate/... into "associated gufuncs", I propose completely removing the "method" argument of __array_ufunc__, since it is no longer needed and adds a lot of complexity which implementors of an __array_ufunc__ are forced to account for. While removing ‘method’ is okay in my book, there should at least be a way to detect if something is e.g., a reduction, or an element-wise ufunc (this one is obvious, all shapes involved will be ()). We, for example, use this in pydata/sparse. As you can imagine, for sparse arrays, element-wise operations behave a certain way and there turns out to be a general way to do reductions that have certain properties as well. See my paper’s draft[1] for details. I don’t mind the __array_ufunc__ api changing, but I’d like it if there was a way to still access the information that was previously available. [1] https://github.com/scipy-conference/scipy_proceedings/pull/388 Regards, Hameer Abbasi Sent from Astro <https://www.helloastro.com> for Mac
On 05/31/2018 12:10 AM, Nathaniel Smith wrote:
Re: implementation complexity, I just want to bring up multiple-dispatch signatures again, where the new signature syntax would just be to join some signatures together with "|", and try them in order until one works. I'm not convinced it's better myself, I just wanted to make sure we are aware of it. The translation from the current proposed syntax would be:

```
Current syntax           Multiple-dispatch syntax

(n|1),(n|1)->()          <===>  (n),(n)->() | (n),()->() | (),(n)->()

(m?,n),(n,p?)->(m?,p?)   <===>  (m,n),(n,p)->(m,p) | (n),(n,p)->(p) |
                                (m,n),(n)->(m) | (n),(n)->()
```

Conceivably, multiple dispatch could reduce code complexity, because we don't need all the special flags like UFUNC_CORE_DIM_CAN_BROADCAST, and instead of handling special syntax for ? and | and any future syntax separately, we just need a split("|") and then a loop with the old signature-handling code.

On the other hand, the m-d signatures are much less concise and the intention is perhaps harder to read. Yet they more explicitly state which combinations are allowed, while with the '?' syntax you might have to puzzle it out.

Cheers, Allan
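A minimal sketch of the split-based parsing described here (hypothetical helper, not NumPy code; the alternatives themselves contain no '|' in this scheme, since the '(n|1)' syntax would not exist):

```
def parse_multi_signature(sig):
    # try each alternative in order until one matches the operands
    return [alt.strip() for alt in sig.split('|')]

parse_multi_signature("(n),(n)->() | (n),()->() | (),(n)->()")
# ['(n),(n)->()', '(n),()->()', '(),(n)->()']
```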
Hi Allan,

Seeing it written out like that, I quite like the multiple dispatch signature: perhaps verbose, but clear. It does mean a different way of changing the ufunc structure, but I actually think it has the advantage of being possible without extending the structure (though that might still be needed for frozen dimensions...): currently, the relevant parts of _tagPyUFuncObject are:

```
/* 0 for scalar ufunc; 1 for generalized ufunc */
int core_enabled;
/* number of distinct dimension names in signature */
int core_num_dim_ix;
/* numbers of core dimensions of each argument */
int *core_num_dims;
/* dimension indices in a flatted form */
int *core_dim_ixs;
/* positions of 1st core dimensions of each argument in core_dim_ixs */
int *core_offsets;
```

I think this could be changed without making any serious change if we slightly repurposed `core_enabled` as meaning `number of signatures`. Then, the pointers above could become

```
int core_xxx[num_sigs][max_core_num_dim_ixs];
```

Less easy is `core_num_dim_ix`, but that number is not really needed anyway (nor used much in the code) as it is implicitly encoded in the indices. It could quite logically become the `max_core_num_dim_ixs` above.

All the best,

Marten
p.s. While my test case of `cube_equal` in the PR is perhaps not super-realistic, I don't know if one really wants to do multiple dispatch on something like "(o|1,n|1,m|1),(o|1,n|1,m|1)->()"... -- Marten
p.p.s. Your multiple-dispatch signature for broadcast dimensions is actually not quite right: it should be

(n|1),(n|1)->() ===> (n),(n)->() | (n),(1)->() | (1),(n)->() | (n),()->() | (),(n)->()

This is becoming quite verbose... (and would perhaps become somewhat slow). Though arguably one could always allow missing dimensions to be 1 for this case, so that one could drop the last two.
On 31/05/18 08:23, Allan Haldane wrote:
I am -1 on multiple signatures. We may revisit this in time, but for now I find the minimal intrusiveness of the current changes appealing, especially as it requires few to no changes whatsoever to the inner loop function. Multiple dispatch could easily break that model by allowing very different signatures to be aggregated into a single ufunc, leading to unhandled edge cases and strange segfaults. It also seems to me that looping over all signatures might slow down ufunc calling, leading to endless variations of strategies of optimizing signature ordering. Matti
I had actually started trying Allan's suggestion [1], and at least parsing is not difficult. But I will stop now, as I think your point about the inner loop really needing a fixed set of sizes and strides is deadly for the whole idea. (Just goes to show I should think before writing code!) As is, all the proposed changes do is fiddle with size 1 axes (well, and defining a fixed size rather than letting the operand do it), which of course doesn't matter for the inner loop. -- Marten [1] https://github.com/numpy/numpy/compare/master...mhvk:gufunc-multiple-dispatc...
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Wed, May 30, 2018 at 11:15 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I read this as "dimensions may have size n or 1", which would exclude the possibility of scalars. For all_equal, I think you could also use a signature like "(m?),(n?)->()", with a short-cut to automatically return False if m != n.
![](https://secure.gravatar.com/avatar/5e44e61af952612c3c45385b62806c67.jpg?s=120&d=mm&r=g)
"short-cut to automatically return False if m != n", that seems like a silent bug AFAICT there are 3 possibilities: 1) current behavior 2) a scalar or size 1 array may be substituted, ie a constant 3) a scalar or array with shape[-1] == 1 may be substituted and broadcasted I am fond of using "n^" to signify #3 since I think of broadcasting as increasing the size of the array. Although a stretch, "n#" might work for #2, as it reminds me of #define'ing constants in C. To generalize a bit, most elementwise operations have obvious broadcasting cases and reduce operations have a core dimension. Fusing any two, ie sumproduct, would result in a gufunc which would benefit from this ability. On Wed, May 30, 2018 at 7:22 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Stephan, Matt, My `n|1` was indeed meant to be read as `n or 1`, but with the (implicit) understanding that any array can have as many ones pre-pended as needed. The signature `(n?),(n?)->()` is now set aside for flexible dimensions: this would allow the constant, but not the trailing shape of 1 (at least as we currently have implemented it). I do think that is more consistent with the visual suggestion, that the think may be absent. It also has implications for output: `(m?,n),(n,p?)->(m?,p?)` is meant to indicate that if a dimension is absent, it should be absent in the output as well. In contrast, for broadcasting, I'd envisage `(n|1),(n|1)->(n)` to indicate that the output dimension will always be present and be of length n. I'm not sure I'm sold on `n^` - I don't think it gives an immediate hint of what it would do... All the best, Marten
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Wed, May 30, 2018 at 5:01 PM Matthew Harrigan <harrigan.matthew@gmail.com> wrote:
"short-cut to automatically return False if m != n", that seems like a silent bug
I guess it depends on the use-cases. This is how np.array_equal() works: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html We could even imagine incorporating this hypothetical "equality along some axes with broadcasting" functionality into axis/axes arguments for array_equal() if we choose this behavior.
![](https://secure.gravatar.com/avatar/5e44e61af952612c3c45385b62806c67.jpg?s=120&d=mm&r=g)
Stephan, good point about use cases. I think it's still an odd fit. For example, I think np.array_equal(np.zeros((3,3)), np.zeros((2,2))) or np.array_equal([1], ['foo']) would be difficult or impossible to replicate with a potential all_equal gufunc. On Thu, May 31, 2018 at 2:00 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
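For reference, the two cases just mentioned behave as follows with the current np.array_equal (shapes and dtypes that a gufunc with a fixed `(n),(n)->()` style signature could not even accept):
```
import numpy as np

# The np.array_equal cases mentioned above: a shape mismatch and a dtype
# mismatch both simply yield False rather than raising an error.
print(np.array_equal(np.zeros((3, 3)), np.zeros((2, 2))))  # False
print(np.array_equal([1], ['foo']))                        # False
```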
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Nathaniel, On Matt's prompting, I added release notes to the frozen/flexible PR [1]; see text attached below. Having done that, I felt the examples actually justified the frozen dimensions quite well. Given that you're the one who expressed most doubts about them, could you have a look? Ideally, I'd avoid having to write a NEP for this, and the examples do seem to make it quite obvious that this change to the signature is the way to go, as its meaning is dead obvious. And the implementation is super-straightforward... For the broadcasted core dimensions, I do agree the case is less strong and the meaning perhaps less obvious (implementation is relatively simple), and I think a short NEP may be called for (unless others on the list have super-convincing use cases...). I will add here, though, that even if we implement `all_equal` as a method on `equal`, it would still be useful to have a signature that can actually describe it. -- Marten [1] https://github.com/numpy/numpy/pull/11175/files

Generalized ufunc signatures now allow fixed-size dimensions
------------------------------------------------------------
By using a numerical value in the signature of a generalized ufunc, one can indicate that the given function requires input or output to have dimensions with the given size. E.g., the signature of a function that converts a polar angle to a two-dimensional cartesian unit vector would be ``()->(2)``; that for one that converts two spherical angles to a three-dimensional unit vector would be ``(),()->(3)``; and that for the cross product of two three-dimensional vectors would be ``(3),(3)->(3)``. Note that to the elementary function these dimensions are not treated any differently from variable ones indicated with a letter; the loop still is passed the corresponding size, but it can now count on that being equal to the fixed size given in the signature.

Generalized ufunc signatures now allow flexible dimensions
----------------------------------------------------------
Some functions, in particular numpy's implementation of ``@`` as ``matmul``, are very similar to generalized ufuncs in that they operate over core dimensions, but one could not present them as such because they were able to deal with inputs in which a dimension is missing. To support this, it is now allowed to postfix a dimension name with a question mark to indicate that that dimension does not necessarily have to be present. With this addition, the signature for ``matmul`` can be expressed as ``(m?,n),(n,p?)->(m?,p?)``. This indicates that if, e.g., the second operand has only one dimension, for the purposes of the elementary function it will be treated as if that input has core shape ``(n, 1)``, and the output has the corresponding core shape of ``(m, 1)``. The actual output array, however, has the flexible dimension removed, i.e., it will have shape ``(..., m)``. Similarly, if both arguments have only a single dimension, the inputs will be presented as having shapes ``(1, n)`` and ``(n, 1)`` to the elementary function, and the output as ``(1, 1)``, while the actual output array returned will have shape ``()``. In this way, the signature thus allows one to use a single elementary function for four related but different signatures, ``(m,n),(n,p)->(m,p)``, ``(n),(n,p)->(p)``, ``(m,n),(n)->(m)`` and ``(n),(n)->()``.
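As a plain-Python illustration of the first note, here is a sketch of the computation a gufunc with signature ``(),()->(3)`` would perform (this is not code from the PR, just a reference for what the frozen output dimension means):
```
import numpy as np

# Reference for the (),()->(3) example above: two spherical angles map to
# a three-dimensional unit vector, so the output gains a frozen axis of 3.
def angles_to_unit_vector(theta, phi):
    theta, phi = np.broadcast_arrays(theta, phi)
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)          # shape (..., 3)

print(angles_to_unit_vector(np.pi / 2, 0.0))               # ~[1, 0, 0]
print(angles_to_unit_vector([0.0, np.pi / 2], 0.0).shape)  # (2, 3)
```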
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Fri, Jun 1, 2018 at 2:42 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I do think it would be valuable to have a brief NEP on this, especially on the solution for matmul. NEPs don't have to be long, and don't need to go into the full detail of implementations. But they are a nice place to summarize design discussions. In fact, I would say the text you have below is nearly enough for one or two NEPs. The parts that are missing would be valuable to add anyway:
- A brief discussion (a sentence or two) of potential broader use-cases for optional dimensions (ufuncs that act on row/column vectors and matrices).
- A brief discussion of rejected alternatives (only a few sentences for each alternative).
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
For the flexible dimensions, that would be up to Nathaniel -- it's his idea ;-) And happily that means that I don't have to spend time looking up how this NEP business actually works, but can just copy & paste... -- Marten
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
OK, I spent my Sunday morning writing a NEP. I hope this can lead to some closure... See https://github.com/numpy/numpy/pull/11297 -- Marten
![](https://secure.gravatar.com/avatar/209654202cde8ec709dee0a4d23c717d.jpg?s=120&d=mm&r=g)
Rendered here: https://github.com/mhvk/numpy/blob/nep-gufunc-signature-enhancement/doc/neps... Eric On Sun, 10 Jun 2018 at 09:37 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/209654202cde8ec709dee0a4d23c717d.jpg?s=120&d=mm&r=g)
Thanks for the writeup Marten,

Nathaniel: Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes?

This hits on an interesting alternative to frozen dimensions - np.cross could just become a regular ufunc with signature np.dtype((float64, 3)), np.dtype((float64, 3)) → np.dtype((float64, 3))

Furthermore, the expansion quickly becomes cumbersome. For instance, for the all_equal signature of (n|1),(n|1)->() …

I think this is only a good argument when used in conjunction with the broadcasting syntax. I don’t think it’s a reason for matmul not to have multiple signatures. Having multiple signatures is a disincentive to introducing too many overloads of the same function, which seems like a good thing to me.

Summarizing my overall opinions:

- I’m +0.5 on frozen dimensions. The use-cases seem reasonable, and it seems like an easy-ish way to get them. Allowing ufuncs to natively support subarray types might be a tidier solution, but that could come down the road.
- I’m -1 on optional dimensions: they seem to legitimize creating many overloads of gufuncs. I’m already not a fan of how matmul has special cases for lower dimensions that don’t generalize well. To me, the best way to handle matmul would be to use the proposed __array_function__ to handle the shape-based special-case dispatching, either by:
  - Inserting dimensions, and calling the true gufunc np.linalg.matmul_2d (which is a function I’d like direct access to anyway).
  - Dispatching to one of four ufuncs.
- Broadcasting dimensions:
  - I know you’re not suggesting this but: enabling broadcasting unconditionally for all gufuncs would be a bad idea, masking linalg bugs (although einsum does support broadcasting…).
  - Does it really need a per-dimension flag, rather than a global one? Can you give a case where that’s useful?
  - If we’d already made all_equal a gufunc, I’d be +1 on adding broadcasting support to it.
  - I’m -0.5 on the all_equal path in the first place. I think we either should have a more generic approach to combined ufuncs, or just declare them numba's job.
  - Can you come up with a broadcasting use-case that isn’t just chaining a reduction with a broadcasting ufunc?

Eric

On Sun, 10 Jun 2018 at 16:02 Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Sun, Jun 10, 2018 at 4:31 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Thanks for the writeup Marten,
Indeed, thank you Marten!
That said, I still think frozen dimensions are a better proposal than either of these.
My concern with either inserting dimensions or dispatching to one of four ufuncs is that some objects (e.g., xarray.DataArray) define matrix multiplication, but in an incompatible way with NumPy (e.g., xarray sums over axes with the same name, instead of last / second-to-last axes). NumPy really ought to provide a way to overload either operation, without either inserting/removing dummy dimensions or inspecting input shapes to dispatch to other gufuncs.

That said, if you don't want to make np.matmul a gufunc, then I would much rather use Python's standard overloading rules with __matmul__/__rmatmul__ than use __array_function__, for two reasons:

1. You *already* need to use __matmul__/__rmatmul__ if you want to support matrix multiplication with @ on your class, so __array_function__ would be additional and redundant. __array_function__ is really intended as a fall-back, for cases where there is no other alternative.
2. With the current __array_function__ proposal, this would imply that calling other unimplemented NumPy functions on your object would raise TypeError rather than doing coercion. This sort of additional coupled behavior is probably not what an implementor of operator.matmul/@ is looking for.

In summary, I would either support:

1. (This proposal) Adding additional optional dimensions to gufuncs for np.matmul/operator.matmul, or
2. Making operator.matmul a special case for mathematical operators that always checks overloads with __matmul__/__rmatmul__ even if __array_ufunc__ is defined.

Either way, matrix multiplication becomes somewhat of a special case. It's just a matter of whether it's a special case for gufuncs (using optional dimensions) or a special case for arithmetic overloads in NumPy (not using __array_ufunc__). Given that I think optional dimensions have other conceivable uses in gufuncs (for row/column vectors), I think that's the better option. I would not support either expanding dimensions or dispatching to multiple gufuncs in NumPy's implementation of operator.matmul (i.e., ndarray.__matmul__). We could potentially only do this for numpy.matmul rather than operator.matmul/@, but that opens the door to potential inconsistency between the NumPy version of an operator and Python's version of an operator, which is something we tried very hard to avoid with __array_ufunc__.
![](https://secure.gravatar.com/avatar/209654202cde8ec709dee0a4d23c717d.jpg?s=120&d=mm&r=g)
I don’t understand your alternative here. If we overload np.matmul using __array_function__, then it would not use *either* of these options for writing the operation in terms of other gufuncs. It would simply look for an __array_function__ attribute, and call that method instead. Let me explain that suggestion a little more clearly.

1. There’d be a linalg.matmul2d that performs the real matrix case, which would be easy to make as a gufunc right now.
2. __matmul__ and __rmatmul__ would just call np.matmul, as they currently do (for consistency between np.matmul and operator.matmul, needed in python pre-@-operator).
3. np.matmul would be implemented as:

```
@do_array_function_overrides
def matmul(a, b):
    if a.ndim != 1 and b.ndim != 1:
        return matmul2d(a, b)
    elif a.ndim != 1:
        return matmul2d(a, b[:, None])[..., 0]
    elif b.ndim != 1:
        return matmul2d(a[None, :], b)[..., 0, :]
    else:
        # this one probably deserves its own ufunc
        return matmul2d(a[None, :], b[:, None])[0, 0]
```

4. Quantity can just override __array_ufunc__ as with any other ufunc.
5. DataArray, knowing the above doesn’t work, would implement something like:

```
@matmul.register_array_function(DataArray)
def __array_function__(a, b):
    if a.ndim != 1 and b.ndim != 1:
        return matmul2d(a, b)
    else:
        # either:
        # - add/remove dummy dimensions in a dataarray-specific way
        # - downcast to ndarray and do the dimension juggling there
        ...
```

Advantages of this approach:

- Neither the ufunc machinery, nor __array_ufunc__, nor the inner loop, need to know about optional dimensions.
- We get a matmul2d ufunc, that all subclasses support out of the box if they support matmul.

Eric
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Mon, Jun 11, 2018 at 11:59 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
OK, this sounds pretty reasonable to me -- assuming we manage to figure out the __array_function__ proposal! There's one additional ingredient we would need to make this work well: some way to guarantee that "ndim" and indexing operations are available without casting to a base numpy array. For now, np.asanyarray() would probably suffice, but that isn't quite right (e.g., this would fail for np.matrix). In the long term, I think we need a new coercion protocol for "duck" arrays. Nathaniel Smith and I started writing a NEP on this, but it isn't quite ready yet.
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi All, Matti asked me to make a PR accepting my own NEP - https://github.com/numpy/numpy/pull/11429 Any objections? As noted in my earlier summary of the discussion, in principle we can choose to accept only parts, although I think it became clear that the most contentious is also the one arguably most needed, the flexible dimensions for matmul. Moving forward has the advantage that in 1.16 we will actually be able to deal with matmul. All the best, Marten On Fri, Jun 15, 2018 at 2:17 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi All, In case it was missed because people have tuned out of the thread: Matti and I proposed last Tuesday to accept NEP 20 (on coming Tuesday, as per NEP 0), which introduces notation for generalized ufuncs allowing fixed, flexible and broadcastable core dimensions. For one thing, this will allow Matti to finish his work on making matmul a gufunc. See http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html All the best, Marten
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sat, Jun 30, 2018 at 6:51 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
So I still have some of the same concerns as before... For the possibly missing dimensions: matmul is really important, and making it a gufunc solves the problem of making it overridable by duck arrays (via __array_ufunc__). Also, it will help later when we rework dtypes: new dtypes will be able to implement matmul by the normal ufunc loop registration mechanism, which is much nicer than the current system where every dtype has a special-case method just for handling matmul. The ? proposal isn't the most elegant idea ever, but we've been tossing around ideas for solving these problems for a while, and so far this seems to be the least-bad one, so... sure, let's do it. For the fixed-size dimensions: this makes me nervous. It's aimed at a real use case, which is a major point in its favor. But a few things make me wary. For input dimensions, it's sugar – the gufunc loop can already raise an error if it doesn't like the size it gets. For output dimensions, it does solve a real problem. But... only part of it. It's awkward that right now you only have a few limited ways to choose output dimensions, but this just extends the list of special cases, rather than solving the underlying problem. For example, 'np.linalg.qr' needs a much more generic mechanism to choose output shape, and parametrized dtypes will need a much more generic mechanism to choose output dtype, so we're definitely going to end up with some phase where arbitrary code gets to describe the output array. Are we going to look back on fixed-size dimensions as a quirky, redundant thing? Also, as currently proposed, it seems to rule out the possibility of using name-based axis specification in the future, right? (See https://github.com/numpy/numpy/pull/8819#issuecomment-366329325) Are we sure we want to do that? If everyone else is comfortable with all these things then I won't block it though. For broadcasting: I'm sorry, but I think I'm -1 on this. I feel like it falls into a classic anti-pattern in numpy, where someone sees a cool thing they could do and then goes looking for problems to justify it. (A red flag for me is that "it's easy to implement" keeps being mentioned as justification for doing it.) The all_equal and weighted_mean examples both feel pretty artificial -- traditionally we've always implemented these kinds of functions as regular functions that use (g)ufuncs internally, and it's worked fine (cf. np.allclose, ndarray.mean). In fact in some sense the whole point of numpy is to help people implement functions like this, without having to write their own gufuncs. Is there some reason these need to be gufuncs? And if there is, are these the only things that need to be gufuncs, or is there a broader class we're missing? The design just doesn't feel well-justified to me. And in the past, when we've implemented things like this, where the use cases are thin but hey why not it's easy to do, it's ended up causing two problems: first people start trying to force it into cases where it doesn't quite work, which makes everyone unhappy... and then when we eventually do try to solve the problem properly, we end up having to do elaborate workarounds to keep the old not-quite-working use cases from breaking. I'm pretty sure we're going to end up rewriting most of the ufunc code over the next few years as we ramp up duck array and user dtype support, and it's already going to be very difficult, both to design in the first place and then to implement while carefully keeping shims to keep all the old stuff working.
Adding features has a very real cost, because it adds extra constraints that all this future work will have to work around. I don't think this meets the bar. By the way, I also think we're getting well past the point where we should be switching from a string-based DSL to a more structured representation. (This is another trap that numpy tends to fall into... the dtype "language" is also a major offender.) This isn't really a commentary on any part of this in particular, but just something that I've been noticing and wanted to mention :-). -n -- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Nathaniel, Thanks for the detailed thoughts. On Tue, Jul 3, 2018 at 4:27 AM, Nathaniel Smith <njs@pobox.com> wrote:
Indeed, I only became recently aware of the ->dot member of the dtype struct. Pretty ugly!
Yes, I think this ugliness is the price we pay for matmul not just meaning matrix multiplication but also no-broadcast vector-matrix, matrix-vector, and vector-vector. (Of course, my after-the-fact annoyance with that design does make me more sensitive to the argument that one should not try to shoehorn gufuncs into cases they are not meant for.)
Can it? I would never have thought that the inner loop would need to do any checking; it is certainly not obvious from the code or from the documentation. Does the iterator itself check for errors every iteration? If so, that might be one place for a quick speed-up for ufunc.at...
I think this is a much rarer case, which will indeed need some type of hook no matter what. (I have been thinking about solving that by moving the call to the type resolver earlier. That one gets passed `op`, so it can create an output array if it doesn't exist; it doesn't need any of the sizes.)
For this particular case, I find the signature so much clearer that that in itself is a reason to do it (readability counts and all that).
I think it only excludes having the choice of keying the dict with the default axis number *or* with its dimension name, which I think would in fact be a good idea: if one uses a dict to describe axes entries, the keys should be the names of the axes, where name can be one of the fixed numbers. (Actually, strictly, since the default axis number is always negative, one can even have both. But that would be awful.) Should add that I don't think the scheme will work all that well anyway - there are quite a few cases where one duplicates a name (say, a square matrix (n,n)), for which keying by "n" would be less than useful.
Those are all fair points. For all_equal one cannot really write a separate function easily given the absence of short-circuiting. But possibly that just argues for the need to short-circuit...
Given my sentiments about the multiple meanings of `@`, I'm sensitive to this argument. But as Matti pointed out, we do not have to accept the whole NEP. Indeed, the broadcasting is in a separate PR.
I think by now it is clear that moving incrementally is the way to go; the ufunc code is in fact being rewritten, if slowly.
Yet, dtype('(3,4)f8') is really clear, unlike all the other forms... It is similar to the string formatting mini-language. Anyway, that is a bit off topic. Overall, would one way to move forward be to merge the first PR (flexible and frozen) and defer the broadcastable dimensions? All the best, Marten p.s. I'm amused that the broadcastable dimensions were in fact the default originally. At some point, I should try to find out why that default was changed.
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Tue, Jul 3, 2018 at 7:13 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Overall, would one way to move forward be to merge the first PR (flexible and frozen) and defer the broadcastable dimensions?
This would have my support. I have similar misgivings about broadcastable dimensions to those raised by Nathaniel. In particular, I wonder if there is some way to make use of this functionality internally in NumPy for functions like all_equal() without exposing it as part of the external gufunc API.
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
On Tue, Jul 3, 2018 at 12:11 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
OK, so let me explicitly ask whether there are any objections to going forward with flexible and frozen dimensions, but deferring on broadcastable ones until more compelling use cases have been identified? Thanks, Marten p.s. I adjusted the "acceptance PR" to reflect this: https://github.com/numpy/numpy/pull/11429
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
As you note further down, the present proposal of just using numbers has the advantage of being clear and easy. Another (small?) advantage is that I can use `axis` to tell where my three coordinates are, rather than be stuck with having them as the last dimension. Indeed, in my trials for wrapping the Standards Of Fundamental Astronomy routines, I started with just making every 3-vector and 3x3-matrix structured arrays with the relevant single sub-array entry. That worked, but I ended up disliking the casting to and fro.
But implementation for matmul is actually considerably trickier, since the internal loop now has to check the number of distinct dimensions.
otherwise agree with Stephan on optional dimensions being the least-bad solution. Aside: do agree we should think about how to expose the `linalg` gufuncs.
dimensions... Also, it has the benefit of being clear what the function can handle by inspection of the signature, i.e., it self-documents better (one of my main arguments in favour of frozen dimensions...).
be to auto-create an inner loop that calls all the chained ufuncs loops in turn). Not sure that short-circuiting will be all that easy. I actually quite like the all_equal ufunc, but it is in part because I remember discovering how painfully slow (a==b).all() was (and still have a place where I would use it if it existed). And it does fit in the (admittedly vague) plans to try to make `.reduce` a gufunc.
such functions... Absent a mechanism to chain ufuncs, more complicated gufuncs are currently the easiest way to get fast, more complicated algebra. But perhaps a putative weighted_mean(y, sigma) -> mean, sigma_mean is a decent example? Its signature would be (n),(n)->(),() but then you're forced to give individual sigmas for each point. With (n|1),(n|1)->(),() you are no longer forced to do that (though the case of all y being the same is less than useful here... I did at some point have an implementation that worked by core dimension of each argument, but ended up feeling it was not worth the extra complication) -- Marten
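For concreteness, a plain-NumPy sketch of this putative weighted_mean (inverse-variance weighting assumed; np.broadcast_arrays stands in for what the relaxed signature would do):
```
import numpy as np

# Sketch of weighted_mean with signature (n),(n)->(),(); the broadcast
# below emulates the (n|1),(n|1)->(),() version, so sigma may be a scalar.
def weighted_mean(y, sigma, axis=-1):
    w = 1.0 / np.asarray(sigma) ** 2
    y, w = np.broadcast_arrays(y, w)
    wsum = w.sum(axis=axis)
    mean = (w * y).sum(axis=axis) / wsum
    return mean, np.sqrt(1.0 / wsum)

print(weighted_mean([1.0, 2.0, 3.0], 0.5))   # (2.0, ~0.289)
```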
![](https://secure.gravatar.com/avatar/209654202cde8ec709dee0a4d23c717d.jpg?s=120&d=mm&r=g)
Frozen dimensions:

I started with just making every 3-vector and 3x3-matrix structured arrays with the relevant single sub-array entry

I was actually suggesting omitting the structured dtype (i.e., field names) altogether, and just using the subarray dtypes (which exist alone, but not in arrays).

Another (small?) advantage is that I can use `axis

This is a fair argument against my proposal - at any rate, I think we’d need a better story for subarray dtypes before trying to add support for them in ufuncs.

------------------------------

Broadcasting dimensions:

But perhaps a putative weighted_mean … is a decent example

That’s fairly convincing as a non-chained ufunc case. Can you add an example like that to the NEP?

Also, it has the benefit of being clear what the function can handle by inspection of the signature

Is broadcasting (n),(n)->(),() less clear than (n|1),(n|1)->(),()? Can you come up with an example where only some dimensions make sense to broadcast?

------------------------------

Eric

On Mon, 11 Jun 2018 at 08:04 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
On Tue, Jun 12, 2018 at 2:35 AM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Yes, I've been wondering about the point of the sub-arrays... They seem interesting but in arrays just disappear. Their possible use would be to change the shape as seen by the outside world (as happens if one does define that sub-array as a 1-part structured array). Anyway, for another discussion!
Done.
Not a super-convincing one, though I guess one could think of a similar function for 3-vectors (which somehow must care about those being three-dimensional, because, say, it calculates the average direction of the cross product in spherical angles...), then, in the signature `(n,3),(n,3)->(),(),(),()` one would like to indicate that the `n` could be broadcast, but the `3` could not. As I now write in the NEP, part of the reason of doing it by distinct dimension is that I already need a flag for flexible, so it is easy to add one for broadcastable; similarly, in the actual code, there is quite a bit of shared stuff. -- Marten
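The "disappearing" sub-arrays mentioned above are easy to see in current NumPy:
```
import numpy as np

# A subarray dtype exists on its own, but is absorbed into the array shape:
dt = np.dtype('(3,4)f8')
print(dt.shape, dt.base)            # (3, 4) float64
print(np.zeros(2, dtype=dt).shape)  # (2, 3, 4) - the subarray "disappears"
```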
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi All, The discussion on the gufunc signature enhancements seems to have stalled a bit, but while it was going I've tried to update the NEP correspondingly. The NEP is now merged, so it can be viewed more easily, at http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html My own quite possibly biased summary of the discussion so far is that:

1) Frozen dimensions are generally seen as a good idea; other implementations may be possible, but are not as clear.
2) Flexible dimensions have little use beyond matmul; the main discussion is whether there is a better way. In my opinion, the main benefit of the current proposal is that it allows operator overrides to all work the same way (via __array_ufunc__), independent of any assumptions about the object that does the override (such as that it has a shape).
3) Broadcastable dimensions had less support, but mostly for lack of examples; there now is one beyond all_equal, for which a gufunc is more clearly the proper route: a weighted average (which has obvious extensions).

A general benefit of course is that there is actual code for all three; it would certainly be nice if we could fully support `matmul` and `@` in 1.16. So, the question would seem to be whether the NEP should be accepted or rejected (with partial acceptance of course being possible, though I note that flexible and broadcastable share a lot of implementation, so in my opinion it is somewhat pointless to do just one of them). All the best, Marten
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Wed, May 30, 2018 at 11:14 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
I'm currently -0.5 on both fixed dimensions and this broadcasting dimension idea. My reasoning is:

- The use cases seem fairly esoteric. For fixed dimensions, I guess the motivating example is cross-product (are there any others?). But would it be so bad for a cross-product gufunc to raise an error if it receives the wrong number of dimensions? For this broadcasting case... well, obviously we've survived this long without all_equal :-). And there's something funny about all_equal, since it's really smushing together two conceptually separate gufuncs for efficiency. Should we also have all_less_than, sum_square, ...? If this is a big problem, then wouldn't it be better to solve it in a general way, like dask or Numba or numexpr do?

To be clear, I'm not saying these features are necessarily *bad* ideas, in isolation -- just that the benefits aren't very convincing, and there are trade-offs, like:

- When it comes to the core ufunc machinery, we have a limited complexity budget. I'm nervous that if we add too many bells and whistles, we'll end up writing ourselves into a corner where we have trouble maintaining it, where it becomes difficult to predict how different features interact, and where it becomes increasingly difficult for third parties to handle all the different features in their __array_ufunc__ methods...

- And, we have a lot of other demands on the core ufunc machinery, that might be better places to spend our limited complexity budget. For example, can we come up with an extension to make np.sort a gufunc? That seems like a much higher priority than figuring out how to make all_equal a gufunc. What about refactoring the ufunc machinery to support user-defined dtypes? That'll need some serious work, and again, it's probably higher priority than supporting cross-product or all_equal directly (or at least it seems that way to me).

Maybe there are more compelling use cases that I'm missing, but as it is, I feel like trying to add too many features to the current ufunc machinery is pretty risky for future maintainability, and we shouldn't do it without really solid use cases. -n -- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Nathaniel, I think the case for frozen dimensions is much more solid than just `cross1d` - there are many operations that work on size-3 vectors. Indeed, as I noted in the PR, I have just been wrapping a Standards-of-Astronomy library in gufuncs, and many of its functions require size-3 vectors or 3x3 matrices [1]. Of course, I can put checks on the sizes, and I've now done that in a custom type resolver (which I needed anyway since, as you say, user dtypes are currently not easy), but there is a real problem for functions that take scalars and produce vectors: with a signature like `(),()->(n)`, I am forced to pass in an output with size 3, which is very inconvenient (especially if I then also want to override with `__array_ufunc__` - now my Quantity implementation also has to start changing an output already put in). So, having frozen dimensions is definitely helpful for developers of new gufuncs. Furthermore, with frozen dimensions, the signature is not just immediately clear, `(),()->(3)` for the example above, it is also better in telling users about what a function does. Indeed, I think this addition has much more justification than the `?`, which is much more complex than the fixed size, yet neither particularly clear nor useful beyond the single purpose of matmul. (It is just that that single purpose has fairly high weight...) As for broadcasting, well, with the flexible dimensions defined, the *additional* complexity is very small. I have no ready example other than all_equal, though I will say that I currently have code that does `if a[0] == x and a[1] == x and a[2] == x and np.all(a[3:] == x):` just because the short-circuiting is well worth the time (it is unlikely in this context that all a equals x). So, there is at least one example of an actual need for this function. All the best, Marten [1] https://github.com/astropy/astropy/pull/7502
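Written out, the short-circuit idiom just described looks like this (a sketch, assuming a 1-d array `a` and a scalar `x`):
```
import numpy as np

# Hand-rolled short-circuiting: check the first few elements cheaply
# before paying for the full vectorized comparison.
def all_equal_fast(a, x):
    for value in a[:3]:
        if value != x:                    # likely fast exit
            return False
    return bool(np.all(a[3:] == x))
```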
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Thu, May 31, 2018 at 4:21 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I agree that the use-cases for frozen dimensions are well motivated. It's not as common as writing code that supports arbitrary dimensions, but given that the real world is three-dimensional it comes up with some regularity. Certainly for these use-cases it would add significant value (not requiring pre-allocation of output arrays). Furthermore, with frozen dimensions, the signature is not just
immediately clear, `(),()->(3)` for the example above, it is also better in telling users about what a function does.
Yes, frozen dimensions really do feel like a natural fit. There is no ambiguity about what an integer means in a gufunc signature, so the complexity of the gufunc model (for users and __array_ufunc__ implementors) would remain roughly fixed. In contrast, broadcasting would certainly increase the complexity of the model, as evidenced by the new syntax we would need. This may or may not be justified. Currently I am at -0.5 along with Nathaniel here.
Agreed, though at least in principle there is the slightly broader use case of handling arguments that are either matrices or column/row vectors.
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Thu, May 31, 2018 at 4:20 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Ah, this does sound like I'm missing something. I suspect this is a situation where we have two problems:

- For some people the use cases are everyday and obvious; for others they're things we've never heard of (what's a "standard of astronomy"?).
- The discussion is scattered around mailing list posts, old comments on random github issues, etc. This makes it hard for everyone to be on the same page.

But this is exactly the situation where NEPs are useful. Maybe you could write up a short NEP for frozen dimensions? It doesn't need to be fancy or take long, but I think it'd be useful to have a single piece of text we can all look at that describes the use cases and how frozen dimensions help. BTW, regarding output shape: as you hint, there's a similar problem with parametrized dtypes in general. Consider defining a loop for np.add that lets it concatenate strings. If the inputs are S4 and S5, then the output should be S9 – but how does the ufunc machinery know that? This suggests that when we do the big refactor to ufuncs to support user-defined and parametrized dtypes in general, one of the things we'll need is a way for an individual loop to select the output dtype. One natural way to do this would be to have two callbacks per loop: one that receives the input dtypes, and returns the output dtypes, and then the other that's like the current loop callback that actually performs the operation. Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes? Maybe frozen dimensions are a good idea regardless, but just wanted to put that out there since it might be a more general solution.
Yeah, that's why I'm not 100% happy with '?' either (even though I proposed it in the first place :-)). But matmul is like, arguably the single most important operation in numpy, so it can justify a lot more... -n -- Nathaniel J. Smith -- https://vorpus.org
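For concreteness, a sketch of the two-callback idea from Nathaniel's message, using the S4 + S5 -> S9 example; all names here are invented for illustration and nothing like this exists in NumPy:
```
import numpy as np

# Hypothetical resolver callback for a string-concatenation loop of
# np.add: one callback picks the output dtype from the input dtypes,
# a second (not shown) would perform the actual operation.
def concat_resolver(in_dtypes):
    out_itemsize = sum(dt.itemsize for dt in in_dtypes)   # S4 + S5 -> S9
    return (np.dtype('S%d' % out_itemsize),)

print(concat_resolver((np.dtype('S4'), np.dtype('S5'))))  # (dtype('S9'),)
```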
![](https://secure.gravatar.com/avatar/71832763447894e7c7f3f64bfd19c13f.jpg?s=120&d=mm&r=g)
On 05/31/2018 12:10 AM, Nathaniel Smith wrote:
I have often wished numpy had these short-circuiting gufuncs, for a very long time. I specifically remember my fruitless searches for how to do it back to 2007. While "on average" short-circuiting only gives a speedup of 2x, in many situations you can arrange your algorithm so short-circuiting will happen early, eg usually in the first 10 elements of a 10^6 element array, giving enormous speedups. Also, I do not imagine these as free-floating ufuncs, I think we can arrange them in a logical way in a gufunc ecosystem. There would be some "core ufuncs", with "associated gufuncs" accessible as attributes. For instance, any_less_than will be accessible as less.any. Binary "comparison" ufuncs would have attributes

less.any
less.all
less.first   # returns first matching index
less.count   # counts matches without intermediate bool array

This adds on to the existing attributes; for instance, ufuncs already have:

add.reduce
add.accumulate
add.reduceat
add.outer
add.at

It is unfortunate that all ufuncs currently have these attributes even if they are unimplemented/inappropriate (eg, np.sin.reduce); I would like to remove the inappropriate ones, so each core ufunc will only have the appropriate "associated gufunc" attributes. Incidentally, once we make reduce/accumulate/... into "associated gufuncs", I propose completely removing the "method" argument of __array_ufunc__, since it is no longer needed and adds a lot of complexity which implementors of an __array_ufunc__ are forced to account for. Cheers, Allan
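In use, the proposed attributes would sit next to today's spellings roughly like this (the `np.less.*` attributes are hypothetical and shown commented out):
```
import numpy as np

a, b = np.arange(5), np.arange(5, 0, -1)

# Today: both build a temporary bool array before reducing.
print((a < b).any())
print((a < b).sum())          # counts matches via the temporary

# Proposed (hypothetical; none of these exist):
# np.less.any(a, b)           # short-circuits on the first True
# np.less.first(a, b)         # index of the first match
# np.less.count(a, b)         # count without the intermediate bool array
```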
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
<snip>
So then, why is it a gufunc and not an attribute using a ufunc with binary output? I have asked this before, and even got arguments as to why it fits gufuncs better, but frankly I still do not really understand. If it is an associated gufunc, why a gufunc at all? We need any() and all() here, so that is not that many methods, right? And when it comes to buffering you have much more flexibility. Say I have the operation: (float_arr > int_arr).all(axis=(1, 2)) with int_arr being shaped (2, 1000, 1000) (i.e. large along the interesting axes). A normal gufunc IIRC will get the whole inner dimension as a float buffer. In other words, you gain practically nothing, because the whole int_arr will be cast to float anyway. If, however, you actually implement np.greater_than.all(float_arr, int_arr, axis=(1, 2)) as a separate ufunc method, you would have the freedom to work in typical cache-friendly buffer-sized chunks for each of the outer dimensions, one at a time. A gufunc would require you to say: please do not buffer for me, or to implement all possible type combinations to do this. (Of course there are memory layout subtleties, since you would have to optimize always for the "fast exit" case, potentially making the worst-case scenario much worse -- unless you do seriously fancy stuff anyway.) A more general question is actually whether we should rather focus on solving the same problem more generally. For example, if `numexpr` would implement all/any reductions, it may be able to pretty simply get the identical trade-offs with even more flexibility! (I have to admit, it may get tricky with multiple reduction dimensions, etc.) - Sebastian
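A rough pure-Python sketch of the chunked strategy Sebastian describes (the blocking and the fast exit are the point; this is not a proposed API):
```
import numpy as np

# Evaluate (float_arr > int_arr).all(axis=(1, 2)) one cache-friendly
# block at a time, casting only a block of int_arr per comparison and
# exiting early on the first failing block.
def greater_all(float_arr, int_arr, block=4096):
    out = np.empty(len(float_arr), dtype=bool)
    for i, (f, g) in enumerate(zip(float_arr, int_arr)):
        f, g = f.ravel(), g.ravel()
        out[i] = True
        for start in range(0, f.size, block):
            stop = start + block
            if not (f[start:stop] > g[start:stop]).all():
                out[i] = False
                break                     # the fast exit
    return out

f = np.ones((2, 1000, 1000))
g = np.zeros((2, 1000, 1000), dtype=np.int64)
print(greater_all(f, g))                  # [ True  True]
```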
![](https://secure.gravatar.com/avatar/71832763447894e7c7f3f64bfd19c13f.jpg?s=120&d=mm&r=g)
On 05/31/2018 09:53 AM, Sebastian Berg wrote:
Hmm, I hadn't known/considered the limitations of gufunc buffer sizes. I was just thinking of them as a standardized interface which handles the where/out/broadcasting for you. I'll have to read about it. One thing I don't like about the ufunc-method strategy is that it easily pollutes all the ufuncs' namespaces and their implementations, so many ufuncs would have to account for a new "all" method even if inappropriate, for example. Cheers, Allan
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
<snip>
For Quantity at least I found it somewhat helpful to have the method separate from the ufunc, as how one deals with it still has similarities. Though it would probably have been similarly easy if `__array_ufunc__` had been passed `np.add.reduce` directly. Aside: I am not sure we can so easily remove `method` any more, but I guess we can at least start getting to a state where it always is `__call__`. Anyway, this does argue we should be rather careful with signatures. In particular, for `__array_function__` we really need to think carefully about `types` - an `__array_function__`-specific dict at the end may be more useful. All the best, Marten
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Sebastian, This is getting a bit far off-topic (which is whether it is a good idea to allow the ability to set frozen dimensions and broadcasting), but on `all_equal`, I definitely see the point that a method might be better, but that needs work: to expand the normal ufunc mechanism to allow the inner loop to carry state (for which there is an issue [1]) and for it to tell the iterator to stop feeding it buffers. It is not quite clear to me that this is better/easier than expanding gufuncs to be able to deal with buffers... All the best, Marten [1] https://github.com/numpy/numpy/issues/8773
![](https://secure.gravatar.com/avatar/1198e2d145718c841565712312e04227.jpg?s=120&d=mm&r=g)
While "on average" short-circuiting only gives a speedup of 2x, in many situations you can arrange your algorithm so short circuiting will happen early, eg usually in the first 10 elements of a 10^6 element array, giving enormous speedups. Also, I do not imagine these as free-floating ufuncs, I think we can arrange them in a logical way in a gufunc ecosystem. There would be some "core ufuncs", with "associated gufuncs" accessible as attributes. For instance, any_less_than will be accessible as less.any binary "comparison" ufuncs would have attributes less.any less.all less.first # returns first matching index less.count # counts matches without intermediate bool array This adds on to the existing attributes, for instance ufuncs already have: add.reduce add.accumulate add.reduceat add.outer add.at It is unfortunate that all ufuncs currently have these attributes even if they are unimplemented/inappropriate (eg, np.sin.reduce), I would like to remove the inappropriate ones, so each core ufunc will only have the appropriate attribute "associated gufuncs". I’m definitely in favour of all this. It’d be great to have this, and it’d be an excellent ecosystem. I’ll add that composing ufuncs is something I’ve wanted, and that has come up from time to time. Incidentally, once we make reduce/accumuate/... into "associated gufuncs", I propose completely removing the "method" argument of __array_ufunc__, since it is no longer needed and adds a lot of complexity which implementors of an __array_ufunc__ are forced to account for. While removing ‘method’ is okay in my book, there should at least be a way to detect if something is e.g., a reduction, or an element-wise ufunc (this one is obvious, all shapes involved will be ()). We, for example, use this in pydata/sparse. As you can imagine, for sparse arrays, element-wise operations behave a certain way and there turns out to be a general way to do reductions that have certain properties as well. See my paper’s draft[1] for details. I don’t mind the __array_ufunc__ api changing, but I’d like it if there was a way to still access the information that was previously available. [1] https://github.com/scipy-conference/scipy_proceedings/pull/388 Regards, Hameer Abbasi Sent from Astro <https://www.helloastro.com> for Mac
![](https://secure.gravatar.com/avatar/71832763447894e7c7f3f64bfd19c13f.jpg?s=120&d=mm&r=g)
On 05/31/2018 12:10 AM, Nathaniel Smith wrote:
Re: implementation complexity, I just want to bring up multiple-dispatch signatures again, where the new signature syntax would just be to join some signatures together with "|", and try them in order until one works. I'm not convinced it's better myself, I just wanted to make sure we are aware of it. The translation from the current proposed syntax would be:

```
Current syntax                 Multiple-dispatch syntax
(n|1),(n|1)->()          <===> (n),(n)->() | (n),()->() | (),(n)->()
(m?,n),(n,p?)->(m?,p?)   <===> (m,n),(n,p)->(m,p) | (n),(n,p)->(p) |
                               (m,n),(n)->(m) | (n),(n)->()
```

Conceivably, multiple dispatch could reduce code complexity because we don't need all the special flags like UFUNC_CORE_DIM_CAN_BROADCAST, and instead of handling special syntax for ? and | and any future syntax separately, we just need a split("|") and then loop with the old signature-handling code. On the other hand, the m-d signatures are much less concise and the intention is perhaps harder to read. Yet they more explicitly state which combinations are allowed, while with the '?' syntax you might have to puzzle it out. Cheers, Allan
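To give an idea of how light the dispatch loop itself could be, a minimal sketch of the split("|")-and-try idea (only argument counts and core ranks are checked here; real matching would also unify dimension names and sizes):
```
import re
import numpy as np

# Try each "|"-separated alternative in order until the operands' core
# dimensionalities fit; a crude stand-in for real signature matching.
def pick_signature(signature, operands):
    for sub in signature.split('|'):
        inputs = sub.split('->')[0]
        core_ndims = [arg.count(',') + 1 if arg else 0
                      for arg in re.findall(r'\(([^)]*)\)', inputs)]
        if (len(core_ndims) == len(operands) and
                all(op.ndim >= n for op, n in zip(operands, core_ndims))):
            return sub.strip()
    raise TypeError('no matching signature')

print(pick_signature('(n),(n)->() | (n),()->() | (),(n)->()',
                     (np.arange(3), np.float64(0.0))))    # (n),()->()
```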
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Allan, Seeing it written out like that, I quite like the multiple dispatch signature: perhaps verbose, but clear. It does mean a different way of changing the ufunc structure, but I actually think it has the advantage of being possible without extending the structure (though that might still be needed for frozen dimensions...): currently, the relevant parts of _tagPyUFuncObject are:
```
/* 0 for scalar ufunc; 1 for generalized ufunc */
int core_enabled;
/* number of distinct dimension names in signature */
int core_num_dim_ix;
/* numbers of core dimensions of each argument */
int *core_num_dims;
/* dimension indices in a flatted form */
int *core_dim_ixs;
/* positions of 1st core dimensions of each argument in core_dim_ixs */
int *core_offsets;
```
I think this could be changed without making any serious change if we slightly repurposed `core_enabled` as meaning `number of signatures`. Then, the pointers above could become
```
int core_xxx[num_sigs][max_core_num_dim_ixs];
```
Less easy is `core_num_dim_ix`, but that number is not really needed anyway (nor used much in the code) as it is implicitly encoded in the indices. It could quite logically become the `max_core_num_dim_ixs` above. All the best, Marten
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
p.s. While my test case of `cube_equal` in the PR is perhaps not super-realistic, I don't know if one really wants to do multiple dispatch on something like "(o|1,n|1,m|1),(o|1,n|1,m|1)->()"... -- Marten
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
p.p.s. Your multiple dispatch signature for broadcasted dimensions is actually not quite right: it should be
```
(n|1),(n|1)->()  ===>  (n),(n)->() | (n),(1)->() | (1),(n)->() | (n),()->() | (),(n)->()
```
This is becoming quite verbose... (and perhaps becoming somewhat slow). Though arguably one could always allow missing dimensions to be 1 for this case, so that one could drop the last two.
![](https://secure.gravatar.com/avatar/72f994ca072df3a3d2c3db8a137790fd.jpg?s=120&d=mm&r=g)
On 31/05/18 08:23, Allan Haldane wrote:
I am -1 on multiple signatures. We may revisit this in time, but for now I find the minimal intrusiveness of the current changes appealing, especially as it requires few to no changes whatsoever to the inner loop function. Multiple dispatch could easily break that model by allowing very different signatures to be aggregated into a single ufunc, leading to unhandled edge cases and strange segfaults. It also seems to me that looping over all signatures might slow down ufunc calling, leading to endless variations of strategies of optimizing signature ordering. Matti
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
I had actually started trying Allan's suggestion [1], and at least parsing is not difficult. But I will stop now, as I think your point about the inner loop really needing a fixed set of sizes and strides is deadly for the whole idea. (Just goes to show I should think before writing code!) As is, all the proposed changes do is fiddle with size 1 axes (well, and defining a fixed size rather than letting the operand do it), which of course doesn't matter for the inner loop. -- Marten [1] https://github.com/numpy/numpy/compare/master...mhvk:gufunc-multiple-dispatc...
participants (10)
- Allan Haldane
- Charles R Harris
- Eric Wieser
- Hameer Abbasi
- Marten van Kerkwijk
- Matthew Harrigan
- Matti Picus
- Nathaniel Smith
- Sebastian Berg
- Stephan Hoyer