Hi all,
Looking at the ufunc dispatching rules with an `out` argument, I was a
bit surprised to realize this little gem is how things work:
```
import numpy as np
arr = np.arange(10, dtype=np.uint16) + 2**15
print(repr(arr))
# array([32768, 32769, 32770, 32771, 32772, 32773, 32774, 32775, 32776, 32777], dtype=uint16)
out = np.zeros(10)
np.add(arr, arr, out=out)
print(repr(out))
# array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18.])
```
This is strictly speaking correct/consistent. What the ufunc tries to
ensure is that whatever the loop produces fits into `out`.
However, I still find it unexpected that it does not pick the full
precision loop.
There is currently only one way to achieve that, and that is by using
`dtype=out.dtype` (or similar incarnations), which specifies the exact
dtype [0].
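To make the difference concrete, here is a minimal sketch comparing the default call with the explicit `dtype=out.dtype` spelling. The names `wrapped` and `full` are just illustrative, and the wrapped result assumes `out` does not take part in loop selection, as described above:

```python
import numpy as np

# uint16 values near the top of the range, so doubling them overflows
arr = np.arange(10, dtype=np.uint16) + 2**15
out = np.zeros(10)  # float64 output array

# Default: the uint16 loop runs, wraps around, and the result is cast into `out`
np.add(arr, arr, out=out)
wrapped = out.copy()

# Explicit dtype: the inputs are cast to float64 and the float64 loop runs
np.add(arr, arr, out=out, dtype=out.dtype)
full = out.copy()

print(wrapped[:3].tolist())  # [0.0, 2.0, 4.0]
print(full[:3].tolist())     # [65536.0, 65538.0, 65540.0]
```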
Of course this is also because I would like to simplify things for a
new dispatching system, but I would like to propose to disable the
above behaviour. This would mean:
```
# make the call:
np.add(arr, arr, out=out)
# Equivalent to the current [1]:
np.add(arr, arr, out=out, signature=(None, None, out.dtype))
# Getting the old behaviour requires (assuming inputs have same dtype):
np.add(arr, arr, out=out, dtype=arr.dtype)
```
and thus force the high precision loop. In very rare cases, this could
lead to no loop being found.
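As an illustration of the "no loop" case, here is a small sketch that uses today's explicit `dtype=` spelling to mimic the proposed behaviour. I picked `np.bitwise_and` because it has no floating point loops; the `no_loop` flag is only there for the example:

```python
import numpy as np

a = np.arange(4, dtype=np.uint16)
out = np.zeros(4)  # float64 output array

# Works today: the uint16 loop runs and the result is cast into `out`
np.bitwise_and(a, a, out=out)

# Forcing the loop to match `out` fails, since bitwise_and has no float64 loop
try:
    np.bitwise_and(a, a, out=out, dtype=out.dtype)
    no_loop = False
except TypeError:
    no_loop = True

print(no_loop)  # True
```

Under the proposal, the first call would behave like the second one and raise, unless the caller passes `dtype=a.dtype` explicitly.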
The main incompatibility is if someone actually makes use of the above
(integer over/underflow) behaviour, but wants to store it in a higher
precision array.
I personally think we should change it, but am curious whether we
think we may be able to get away with an accelerated process and
not a year-long FutureWarning.
Cheers,
Sebastian
[0] You can also use `casting="no"` but in all relevant cases that
should find no loop, since we typically only have homogeneous loop
definitions, and
[1] Which is normally the same as the shorter spelling
`dtype=out.dtype` of course.