[Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

Sun Jun 10 20:52:50 EDT 2018

On Sun, Jun 10, 2018 at 4:31 PM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> Thanks for the writeup Marten,
>
Indeed, thank you Marten!

> This hits on an interesting alternative to frozen dimensions - np.cross
> could just become a regular ufunc with signature np.dtype((float64, 3)),
> np.dtype((float64, 3)) → np.dtype((float64, 3))
>
Another alternative to mention is returning multiple arrays, e.g., two
arrays for a fixed dimension of size 2.

That said, I still think frozen dimensions are a better proposal than either
of these.
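
To make the quoted dtype-based alternative concrete, here is a minimal sketch of what np.dtype((float64, 3)) looks like in practice. It only shows how NumPy treats such a subarray dtype today; it does not implement np.cross as a ufunc, and the field name "v" is an arbitrary choice for illustration.

```python
import numpy as np

# A subarray dtype: three float64 values treated as one element.
vec3 = np.dtype((np.float64, 3))
print(vec3.shape)  # (3,)
print(vec3.base)   # float64

# Used directly for array creation, the subarray shape is absorbed
# into the array shape, so the "3" is no longer part of the dtype.
a = np.zeros(4, dtype=vec3)
print(a.shape, a.dtype)  # (4, 3) float64

# Inside a structured dtype the fixed-size dimension is preserved
# (the field name "v" is made up for this example).
s = np.zeros(4, dtype=[("v", np.float64, (3,))])
print(s.shape, s["v"].shape)  # (4,) (4, 3)
```

One consequence visible here is that a plain subarray dtype does not survive array creation, which is part of why the fixed size would have to live either in a structured dtype or in the gufunc signature.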

>    - I’m -1 on optional dimensions: they seem to legitimize creating many
>    overloads of gufuncs. I’m already not a fan of how matmul has special cases
>    for lower dimensions that don’t generalize well. To me, the best way to
>    handle matmul would be to use the proposed __array_function__ to
>    handle the shape-based special-case dispatching, either by:
>       - Inserting dimensions, and calling the true gufunc
>       np.linalg.matmul_2d (which is a function I’d like direct access to
>       anyway).
>       - Dispatching to one of four ufuncs
>
I don't understand your alternative here. If we overload np.matmul using
__array_function__, then it would not use *either* of these options for
writing the operation in terms of other gufuncs. It would simply look for
an __array_function__ attribute, and call that method instead.
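
For reference, the "inserting dimensions" option from the quoted list can be sketched as follows. This is an illustration only: matmul_core is a hypothetical stand-in for a strictly 2-D-or-higher gufunc (np.linalg.matmul_2d from the quoted text is not an existing function), and it simply delegates to np.matmul after rejecting 1-D inputs.

```python
import numpy as np

def matmul_core(a, b):
    """Hypothetical stand-in for a 'true' (m,n),(n,p)->(m,p) gufunc."""
    if a.ndim < 2 or b.ndim < 2:
        raise ValueError("core matmul needs at least 2 dimensions per operand")
    return np.matmul(a, b)

def matmul_by_inserting_dims(a, b):
    """Handle 1-D operands by adding a dummy axis and removing it again."""
    a, b = np.asarray(a), np.asarray(b)
    a_was_1d, b_was_1d = a.ndim == 1, b.ndim == 1
    if a_was_1d:
        a = a[np.newaxis, :]   # (n,) -> (1, n): treat as a row vector
    if b_was_1d:
        b = b[:, np.newaxis]   # (n,) -> (n, 1): treat as a column vector
    out = matmul_core(a, b)    # shape (..., m, p)
    if a_was_1d:
        out = out[..., 0, :]   # drop the dummy m axis
    if b_was_1d:
        out = out[..., 0]      # drop the dummy p axis
    return out

M = np.arange(6.0).reshape(2, 3)
v = np.arange(3.0)
w = np.arange(2.0)

# The four shape combinations that np.matmul special-cases.
for x, y in [(M, M.T), (M, v), (w, M), (v, v)]:
    expected = np.matmul(x, y)
    got = matmul_by_inserting_dims(x, y)
    print(np.shape(x), np.shape(y), "->", np.shape(got),
          np.array_equal(got, expected))
```

The wrapper has to inspect input shapes before it can call the core operation, which is the kind of shape-based logic under discussion here.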

My concern with either inserting dimensions or dispatching to one of four
ufuncs is that some objects (e.g., xarray.DataArray) define matrix
multiplication, but in a way that is incompatible with NumPy (e.g., xarray sums
over axes with the same name, instead of last / second-to-last axes). NumPy
really ought to provide a way to overload either operation, without either
inserting/removing dummy dimensions or inspecting input shapes to dispatch
to other gufuncs.
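
A small sketch of the incompatibility, assuming a toy labeled-array class. The Labeled class below is hypothetical and is not xarray's implementation; it only mimics the "sum over axes with the same name" rule using np.tensordot.

```python
import numpy as np

class Labeled:
    """Hypothetical labeled array; NOT xarray's actual implementation."""

    def __init__(self, data, dims):
        self.data = np.asarray(data)
        self.dims = tuple(dims)
        assert self.data.ndim == len(self.dims)

    def __matmul__(self, other):
        if not isinstance(other, Labeled):
            return NotImplemented
        # Contract over every axis name the two operands share,
        # regardless of where those axes sit positionally.
        shared = [d for d in self.dims if d in other.dims]
        axes_self = [self.dims.index(d) for d in shared]
        axes_other = [other.dims.index(d) for d in shared]
        out = np.tensordot(self.data, other.data, axes=(axes_self, axes_other))
        out_dims = ([d for d in self.dims if d not in shared]
                    + [d for d in other.dims if d not in shared])
        return Labeled(out, out_dims)

a = Labeled(np.arange(6.0).reshape(2, 3), ("x", "y"))
b = Labeled(np.arange(6.0).reshape(2, 3), ("x", "z"))

result = a @ b                      # sums over the shared axis "x"
print(result.dims, result.data.shape)

# The positional rule cannot express this: (2, 3) @ (2, 3) does not line up.
try:
    np.matmul(a.data, b.data)
except ValueError as exc:
    print("np.matmul on the raw data fails:", type(exc).__name__)
```

Neither inserting dummy dimensions nor dispatching on input shapes recovers the name-based result, since the contraction is chosen by label rather than by position.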

That said, if you don't want to make np.matmul a gufunc, then I would much
rather use Python's standard overloading rules with __matmul__/__rmatmul__
than use __array_function__, for two reasons:
1. You *already* need to use __matmul__/__rmatmul__ if you want to support
matrix multiplication with @ on your class, so __array_function__ would be
additional and redundant. __array_function__ is really intended as a
fall-back, for cases where there is no other alternative.
2. With the current __array_function__ proposal, this would imply that
calling other unimplemented NumPy functions on your object would raise
TypeError rather than doing coercion. This sort of additional coupled
behavior is probably not what an implementor of operator.matmul/@ is
looking for.
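
To illustrate the first point, here is a minimal sketch of a class that supports @ purely through Python's operator protocol. The Tagged class is hypothetical. Setting __array_ufunc__ = None is the documented opt-out that makes ndarray's binary operators return NotImplemented, so that Python falls back to the reflected method; the sketch assumes a reasonably recent NumPy.

```python
import numpy as np

class Tagged:
    """Hypothetical wrapper that only implements the @ operator."""

    # Opt out of NumPy's ufunc machinery: ndarray binary operators
    # return NotImplemented for this type, so Python tries __rmatmul__.
    __array_ufunc__ = None

    def __init__(self, data):
        self.data = np.asarray(data)

    def __matmul__(self, other):
        # Called for: Tagged @ something
        other = other.data if isinstance(other, Tagged) else other
        return Tagged(self.data @ other)

    def __rmatmul__(self, other):
        # Called for: something @ Tagged, after the left operand defers
        return Tagged(other @ self.data)

t = Tagged(np.eye(2))
ones = np.ones((2, 2))

left = t @ ones     # uses Tagged.__matmul__
right = ones @ t    # ndarray defers, then Tagged.__rmatmul__ runs
print(type(left).__name__, type(right).__name__)
```

No __array_function__ method is needed for either expression, which is the redundancy described in the first point.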

In summary, I would either support:
1. (This proposal) Adding additional optional dimensions to gufuncs for
np.matmul/operator.matmul, or
2. Making operator.matmul a special case for mathematical operators that
always checks overloads with __matmul__/__rmatmul__ even if __array_ufunc__
is defined.

Either way, matrix-multiplication becomes somewhat of a special case. It's
just a matter of whether it's a special case for gufuncs (using optional
dimensions) or a special case for arithmetic overloads in NumPy (not using
__array_ufunc__). Given that I think optional dimensions have other
conceivable uses in gufuncs (for row/column vectors), I think that's the
better option.

I would not support either expanding dimensions or dispatching to multiple
gufuncs in NumPy's implementation of operator.matmul (i.e.,
ndarray.__matmul__). We could potentially only do this for numpy.matmul
rather than operator.matmul/@, but that opens the door to potential
inconsistency between the NumPy version of an operator and Python's version
of an operator, which is something we tried very hard to avoid with
__array_ufunc__.