On Tue, Jul 11, 2023 at 10:11 AM Matti Picus <matti.picus@gmail.com> wrote:

On 10/7/23 16:13, Jens Glaser via NumPy-Discussion wrote:
> Hi Matti,
>
> The documentation for numpy.dot currently states
>
> """
> out
> ndarray, optional
> Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
> """
>
> I think this means that if dot(a,b) returned FP32 for FP16 inputs, it would be consistent with this API to supply a full precision output array. All that would be needed in an actual implementation is a mixed_precision flag (or output_dtype option) for this op to override the usual type promotion rules. Do you agree?
>
> Jens

`np.dot` is strange. Could you use `np.matmul` instead, which is a real
ufunc and (I think) already does this?

Sort of. As currently implemented, no, it won't do what Jens wants because there is no `ee->f` loop implemented (`e` is the typecode for float16, `f` for float32). Passing float16 operands and requesting a float32 output (either with `dtype=np.float32` or by providing such an `out=`) will fall through to the `ff->f` loop and cause upcasting of the operands, which is not what they want. But notionally one could add an `ee->f` loop between the existing `ee->e` and `ff->f` loops that would catch this case when `dtype=np.float32` is requested.
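
A rough way to see the current behavior for yourself (a sketch only, assuming a reasonably recent NumPy; the shapes and variable names are made up for illustration, and the list of registered loops may differ between versions):

```python
import numpy as np

# Registered inner loops of the matmul ufunc, as typecode signatures.
loops = np.matmul.types
print("ee->e" in loops, "ff->f" in loops, "ee->f" in loops)

# Hypothetical float16 operands, just for illustration.
rng = np.random.default_rng(0)
a16 = rng.standard_normal((4, 3)).astype(np.float16)
b16 = rng.standard_normal((3, 5)).astype(np.float16)

# Requesting a float32 result via dtype= makes NumPy pick a float32 loop,
# so the float16 operands are cast up before the multiplication.
out32 = np.matmul(a16, b16, dtype=np.float32)

# Reference: upcast explicitly, then multiply in float32.
ref32 = np.matmul(a16.astype(np.float32), b16.astype(np.float32))

print(out32.dtype)
print(np.allclose(out32, ref32))
```

If the two results agree, that is consistent with the operands being upcast rather than a dedicated mixed-precision loop being used.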

Robert Kern