[Numpy-discussion] Fwd: Allowing broadcasting of code dimensions in generalized ufuncs

Tue Jul 3 10:10:54 EDT 2018

Hi Nathaniel,

Thanks for the detailed thoughts.

On Tue, Jul 3, 2018 at 4:27 AM, Nathaniel Smith <njs at pobox.com> wrote:

> On Sat, Jun 30, 2018 at 6:51 AM, Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
> > Hi All,
> >
> > In case it was missed because people have tuned out of the thread: Matti
> and
> > I proposed last Tuesday to accept NEP 20 (on coming Tuesday, as per NEP
> 0),
> > which introduces notation for generalized ufuncs allowing fixed, flexible
> > and broadcastable core dimensions. For one thing, this will allow Matti
> to
> > finish his work on making matmul a gufunc.
> >
> > See http://www.numpy.org/neps/nep-0020-gufunc-signature-enhancement.html
>
> So I still have some of the same concerns as before...
>
> For the possibly missing dimensions: matmul is really important, and
> making it a gufunc solves the problem of making it overridable by duck
> arrays (via __array_ufunc__). Also, it will help later when we rework
> dtypes: new dtypes will be able to implement matmul by the normal
> ufunc loop registration mechanism, which is much nicer than the
> current system where every dtype has a special-case method just for
> handling matmul.

Indeed, I only became recently aware of the ->dot member of the dtype
struct. Pretty ugly!

> The ? proposal isn't the most elegant idea ever, but
> we've been tossing around ideas for solving these problems for a
> while, and so far this seems to be the least-bad one, so... sure,
> let's do it.
>

Yes, I think this ugliness is the price we pay for matmul not just meaning
matrix multiplication but also no-broadcast vector-matrix, matrix-vector,
and vector-vector. (Of course, my after-the-fact annoyance with that design
does make me more sensitive to the argument that one should not try to
shoehorn gufuncs into cases they are not meant for.)

>
> For the fixed-size dimensions: this makes me nervous. It's aimed at a
> real use case, which is a major point in it's favor. But a few things
> make me wary. For input dimensions, it's sugar – the gufunc loop can
> already raise an error if it doesn't like the size it gets.

Can it? I would never have thought that the inner loop would need to do any
checking; it is certainly not obvious from the code or from the
documentation. Does the iterator itself check for errors every iteration?
If so, that might be one place for a quick speed-up for ufunc.at...

> For output
> dimensions, it does solve a real problem. But... only part of it. It's
> awkward that right now you only have a few limited ways to choose
> output dimensions, but this just extends the list of special cases,
> rather than solving the underlying problem. For example,
> 'np.linalg.qr' needs a much more generic mechanism to choose output
> shape, and parametrized dtypes will need a much more generic mechanism
> to choose output dtype, so we're definitely going to end up with some
> phase where arbitrary code gets to describe the output array.

I think this is a much rarer case, which will indeed need some type of hook
no matter what.
(I have been thinking about solving that by moving the call to the type
resolver earlier. That one gets passed `op`, so it can create an output
array if it doesn't exist; it doesn't need any of the sizes.)

> Are we
> going to look back on fixed-size dimensions as a quirky, redundant
> thing?
>

For this particular case, I find the signature so much clearer that that in
itself is a reason to do it (readability counts and all that).

>
> Also, as currently proposed, it seems to rule out the possibility of
> using name-based axis specification in the future, right? (See
> https://github.com/numpy/numpy/pull/8819#issuecomment-366329325) Are
> we sure we want to do that?
>

I think it only excludes having the choice of keying the dict with the
default axis number *or* with its dimension name, which I think would in
fact be a good idea: if one uses a dict to describe axes entries, the keys
should be the names of the axes, where name can be one of the fixed numbers.
(Actually, strictly, since the default axis number is always negative, one
can even have both. But that would be awful.)

Should add that I don't think the scheme will work all that well anyway -
there are quite a few cases where one duplicates a name (say, a square
matrix (n,n)), for which keying by "n" would be less than useful.

> If everyone else is comfortable with all these things then I won't
> block it though.
>
> For broadcasting: I'm sorry, but I think I'm -1 on this. I feel like
> it falls into a classic anti-pattern in numpy, where someone sees a
> cool thing they could do and then goes looking for problems to justify
> it. (A red flag for me is that "it's easy to implement" keeps being
> mentioned as justification for doing it.) The all_equal and
> weighted_mean examples both feel pretty artificial -- traditionally
> we've always implemented these kinds of functions as regular functions
> that use (g)ufuncs internally, and it's worked fine (cf. np.allclose,
> ndarray.mean). In fact in some sense the whole point of numpy is to
> help people implement functions like this, without having to write
> their own gufuncs. Is there some reason these need to be gufuncs? And
> if there is, are these the only things that need to be gufuncs, or is
> there a broader class we're missing? The design just doesn't feel
> well-justified to me.
>

Those are all fair points. For all_equal one cannot really write a separate
function easily given the absence of short-circuiting. But possibly that
just argues for the need to short-circuit...

> And in the past, when we've implemented things like this, where the
> use cases are thin but hey why not it's easy to do, it's ended up
> causing two problems: first people start trying to force it into cases
> where it doesn't quite work, which makes everyone unhappy... and then
> when we eventually do try to solve the problem properly, we end up
> having to do elaborate workarounds to keep the old not-quite-working
> use cases from breaking.
>

Given my sentiments about the multiple meanings of `@`, I'm sensitive to
this argument.

But as Matti pointed out, we do not have the accept the whole NEP. Indeed,
the broadcasting is in a separate PR.

>
> I'm pretty sure we're going to end up rewriting most of the ufunc code
> over the next few years as we ramp up duck array and user dtype
> support, and it's already going to be very difficult, both to design
> in the first place and then to implement while carefully keeping shims
> to keep all the old stuff working. Adding features has a very real
> cost, because it adds extra constraints that all this future work will
> have to work around. I don't think this meets the bar.
>

I think by now it is clear that moving incrementally is the way to go; the
ufunc code is in fact being rewritten, if slowly.

> By the way, I also think we're getting well past the point where we
> should be switching from a string-based DSL to a more structured
> representation. (This is another trap that numpy tends to fall into...
> the dtype "language" is also a major offender.) This isn't really a
> commentary on any part of this in particular, but just something that
> I've been noticing and wanted to mention :-).
>

Yet, dtype('(3,4)f8') is really clear, unlike all the other forms... It is
similar to the string formatting mini-language. Anyway, that is a bit off
topic.

Overall, would one way to move forward be to merge the first PR (flexible
and frozen) and defer the broadcastable dimensions?

All the best,

Marten

p.s.  I'm amused that the broadcastable dimensions were in fact the default
originally. At some point, I should try to find out why that default was
changed.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180703/bdaed6a0/attachment.html>