[Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

Thu May 31 18:02:52 EDT 2018

On Thu, May 31, 2018 at 4:20 AM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:
> Hi Nathaniel,
>
> I think the case for frozen dimensions is much more solid than just
> `cross1d` - there are many operations that work on size-3 vectors.
> Indeed, as I noted in the PR, I have just been wrapping a
> Standards-of-Astronomy library in gufuncs, and many of its functions
> require size-3 vectors or 3x3 matrices [1]. Of course, I can put
> checks on the sizes, and I've now done that in a custom type resolver
> (which I needed anyway since, as you say, user dtypes are currently not
> easy), but there is a real problem for functions that take scalars and
> produce vectors: with a signature like `(),()->(n)`, I am forced to
> pass in an output with size 3, which is very inconvenient (especially
> if I then also want to override with `__array_ufunc__` - now my
> Quantity implementation also has to start changing an output already
> put in). So, having frozen dimensions is definitely helpful for
> developers of new gufuncs.
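
To make the scalars-in, vector-out case concrete, here is a rough
Python-level sketch using `np.vectorize` with the signature
`(),()->(n)`. The function `s2c` is a hypothetical stand-in written for
illustration, not a function from the library mentioned above.
`np.vectorize` can discover `n` by calling the Python function once; as
I understand the complaint, a compiled gufunc loop has no such step,
which is why the caller ends up supplying a size-3 output.

```python
import numpy as np

def s2c(theta, phi):
    # Hypothetical helper: spherical angles (radians) -> unit 3-vector.
    return np.array([np.cos(theta) * np.cos(phi),
                     np.sin(theta) * np.cos(phi),
                     np.sin(phi)])

# Two scalar core inputs, one vector core output whose length `n`
# appears in no input; np.vectorize infers it from the first call.
s2c_vec = np.vectorize(s2c, signature="(),()->(n)")

xyz = s2c_vec(np.zeros(4), np.zeros(4))
print(xyz.shape)  # loop dimension 4, core dimension 3
```

Nothing in the signature tells a reader that the trailing dimension is
always 3; that only shows up once the function has been run.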

Ah, this does sound like I'm missing something. I suspect this is a
situation where we have two problems:

- For some people the use cases are everyday and obvious; for others
they're things we've never heard of (what's a "standard of
astronomy"?)
- The discussion is scattered around mailing list posts, old comments
on random github issues, etc.

This makes it hard for everyone to be on the same page. But this is
exactly the situation where NEPs are useful. Maybe you could write up
a short NEP for frozen dimensions? It doesn't need to be fancy or take
long, but I think it'd be useful to have a single piece of text we can
all look at that describes the use cases and how frozen dimensions
help.

BTW, regarding output shape: as you hint, there's a similar problem
with parametrized dtypes in general. Consider defining a loop for
np.add that lets it concatenate strings. If the inputs are S4 and S5,
then the output should be S9 – but how does the ufunc machinery know
that? This suggests that when we do the big refactor to ufuncs to
support user-defined and parametrized dtypes in general, one of the
things we'll need is a way for an individual loop to select the output
dtype. One natural way to do this would be to have two callbacks per
loop: one that receives the input dtypes, and returns the output
dtypes, and then the other that's like the current loop callback that
actually performs the operation. Output shape feels very similar to
output dtype to me, so maybe the general way to handle this would be
to make the first callback take the input shapes+dtypes and return the
desired output shapes+dtypes? Maybe frozen dimensions are a good idea
regardless, but just wanted to put that out there since it might be a
more general solution.
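
A minimal sketch of the two-callback idea in plain Python, for
illustration only: `resolve_concat`, `loop_concat` and `call` are
hypothetical names, and nothing like them exists in the ufunc machinery
today. The first callback sees only input dtypes and shapes and returns
the output dtype and shape; the second fills an output that the caller
has already allocated.

```python
import numpy as np

def resolve_concat(in_dtypes, in_shapes):
    """Hypothetical first callback: inputs' dtypes+shapes -> output's."""
    a, b = in_dtypes
    # S4 + S5 -> S9: the output itemsize is the sum of the inputs'.
    out_dtype = np.dtype(f"S{a.itemsize + b.itemsize}")
    # Ordinary broadcasting of the loop dimensions (NumPy >= 1.20).
    out_shape = np.broadcast_shapes(*in_shapes)
    return out_dtype, out_shape

def loop_concat(x, y, out):
    """Hypothetical second callback: do the work into `out`."""
    out[...] = np.char.add(x, y)

def call(resolve, loop, x, y):
    """Stand-in for the ufunc machinery: resolve, allocate, then loop."""
    x, y = np.asarray(x), np.asarray(y)
    out_dtype, out_shape = resolve((x.dtype, y.dtype), (x.shape, y.shape))
    out = np.empty(out_shape, dtype=out_dtype)
    loop(x, y, out)
    return out

result = call(resolve_concat, loop_concat,
              np.array([b"abcd", b"ab"], dtype="S4"),
              np.array(b"12345", dtype="S5"))
print(result.dtype, result.shape)
```

A resolver for the `(),()->(n)` case would look the same, except that
it would append a fixed `(3,)` to the broadcast shape instead of
changing the dtype.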

> Furthermore, with frozen dimensions, the signature is not just
> immediately clear, `(),()->(3)` for the example above, it is also
> better in telling users about what a function does.
>
> Indeed, I think this addition has much more justification than the `?`
> which is much more complex than the fixed size, yet neither
> particularly clear nor useful beyond the single purpose of matmul. (It
> is just that that single purpose has fairly high weight...)

Yeah, that's why I'm not 100% happy with '?' either (even though I
proposed it in the first place :-)). But matmul is like, arguably the
single most important operation in numpy, so it can justify a lot
more...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org