Hi all,

Looking at the ufunc dispatching rules with an `out` argument, I was a bit surprised to realize this little gem is how things work:

```
arr = np.arange(10, dtype=np.uint16) + 2**15
print(repr(arr + arr))
# array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18], dtype=uint16)

out = np.zeros(10)
np.add(arr, arr, out=out)
print(repr(out))
# array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])
```

That is, even with a float64 `out`, the addition runs in the low-precision uint16 loop (overflowing), and only the result is cast into `out`.

Strictly speaking this is correct/consistent: what the ufunc tries to ensure is that whatever the loop produces fits into `out`. However, I still find it unexpected that it does not pick the full-precision loop. There is currently only one way to achieve that, and that is by using `dtype=out.dtype` (or similar incarnations), which specifies the exact dtype [0].

Of course this is also because I would like to simplify things for a new dispatching system, but I would like to propose disabling the above behaviour. This would mean:

```
# Make the call:
np.add(arr, arr, out=out)

# equivalent to the current [1]:
np.add(arr, arr, out=out, dtype=(None, None, out.dtype))

# Getting the old behaviour would require (assuming inputs have the same dtype):
np.add(arr, arr, out=out, dtype=arr.dtype)
```

and thus force the high-precision loop. In very rare cases, this could lead to no loop being found. The main incompatibility is if someone actually makes use of the above (integer over/underflow) behaviour but wants to store the result in a higher-precision array.

I personally currently think we should change it, but I am curious whether we may be able to get away with an accelerated process rather than a year-long FutureWarning.

Cheers,

Sebastian

[0] You can also use `casting="no"`, but in all relevant cases that should find no loop, since we typically only have homogeneous loop definitions.

[1] Which is normally the same as the shorter spelling `dtype=out.dtype`, of course.
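As a runnable illustration of the behaviour and the `dtype=` workaround discussed above (a minimal sketch; exact printed formatting may vary between NumPy versions):

```python
import numpy as np

arr = np.arange(10, dtype=np.uint16) + 2**15  # values 32768..32777

# Default dispatching: the uint16 loop runs and overflows, and only the
# overflowed result is cast into the float64 output array.
out = np.zeros(10)
np.add(arr, arr, out=out)
print(out[:3])  # overflowed values: [0. 2. 4.]

# Passing dtype= explicitly selects the float64 loop instead, so the
# inputs are upcast first and no overflow occurs.
out2 = np.zeros(10)
np.add(arr, arr, out=out2, dtype=out2.dtype)
print(out2[:3])  # full precision: [65536. 65538. 65540.]
```

Under the proposal, the first call would behave like the second one by default.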
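To illustrate footnote [0] (a hedged sketch, not part of the original mail): with `casting="no"` and a mismatched output dtype, no loop can be used without a cast, so the call fails with a `TypeError` (concretely a `UFuncTypeError` subclass):

```python
import numpy as np

arr = np.arange(10, dtype=np.uint16) + 2**15
out = np.zeros(10)

# The uint16 loop would need a uint16 -> float64 cast for the output,
# and the float64 loop would need casts of the inputs; with casting="no"
# neither cast is permitted, so no loop is found.
try:
    np.add(arr, arr, out=out, casting="no")
except TypeError as exc:
    print("no loop found:", type(exc).__name__)
```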