The documentation for `numpy.dot` currently states:

"""
out : ndarray, optional
    Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
"""
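A small sketch of what that strictness means in practice, using float16 operands. `out_accepted` is just an illustrative helper, not part of NumPy; it assumes the rejection surfaces as a `ValueError`, which is what current NumPy raises for an unacceptable `out`.

```python
import numpy as np

a = np.ones((2, 3), dtype=np.float16)
b = np.ones((3, 4), dtype=np.float16)

def out_accepted(out):
    """Return True if np.dot(a, b, out=out) succeeds, False if NumPy rejects `out`."""
    try:
        np.dot(a, b, out=out)
        return True
    except ValueError:
        return False

# Exactly what dot(a, b) would return: float16, shape (2, 4), C-contiguous.
out_ok = np.empty((2, 4), dtype=np.float16)
# Wrong dtype: float32 is not what dot(a, b) returns for float16 inputs.
out_f32 = np.empty((2, 4), dtype=np.float32)
# Right dtype and shape, but Fortran-ordered, so not C-contiguous.
out_fortran = np.empty((2, 4), dtype=np.float16, order="F")

print(out_accepted(out_ok))       # True (out_ok now holds 3.0 everywhere)
print(out_accepted(out_f32))      # False
print(out_accepted(out_fortran))  # False
```

So today a float32 `out` for float16 inputs is refused outright rather than being filled with a higher-precision result.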

I think this means that if `dot(a, b)` returned FP32 for FP16 inputs, it would be consistent with this API to supply a higher-precision (FP32) output array. All an actual implementation would need is a `mixed_precision` flag (or an `output_dtype` option) for this op to override the usual result type.
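To make the motivation concrete, here is a sketch of the numbers such an option would produce. `dot_mixed` is a hypothetical helper (not a NumPy function, and not the proposed `mixed_precision`/`output_dtype` option itself); it only emulates "FP16 in, FP32 out" by upcasting the operands, so it pays for two temporary float32 copies that a native kernel would avoid.

```python
import numpy as np

def dot_mixed(a, b, out_dtype=np.float32):
    # Hypothetical helper: emulate a mixed-precision dot by casting the
    # float16 operands up front. A dedicated kernel would skip these copies.
    return np.dot(np.asarray(a).astype(out_dtype), np.asarray(b).astype(out_dtype))

a = np.full((1, 4), 200, dtype=np.float16)
b = np.full((4, 1), 200, dtype=np.float16)

with np.errstate(over="ignore"):
    # float16 result: 4 * 200 * 200 = 160000 exceeds the float16 max (65504)
    r16 = np.dot(a, b)
r32 = dot_mixed(a, b)  # float32 result holds 160000.0 exactly

print(r16)  # [[inf]]
print(r32)  # [[160000.]]
```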

Do you agree?

`np.dot` is strange. Could you use `np.matmul` instead, which is a real ufunc and (I think) already does this?
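A quick way to check this: because `np.matmul` is a ufunc, it accepts `dtype=` and exposes its registered inner loops through `.types` (type-code strings such as `'ee->e'`). The sketch below only inspects that list and compares the default and requested result dtypes; it makes no claim about which loop NumPy picks internally.

```python
import numpy as np

# matmul is a true ufunc, so it has a list of typed inner loops.
print(isinstance(np.matmul, np.ufunc))  # True
loops = np.matmul.types
print("ee->e" in loops, "ff->f" in loops)  # True True

a = np.ones((2, 3), dtype=np.float16)
b = np.ones((3, 4), dtype=np.float16)

c16 = np.matmul(a, b)                    # default: float16 result
c32 = np.matmul(a, b, dtype=np.float32)  # requested float32 result

print(c16.dtype, c32.dtype)  # float16 float32
```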

Sort of. As currently implemented, no, it won't do what Jens wants because there is no `ee->f` loop implemented (`e` is the typecode for float16, `f` for float32). Passing float16 operands and requesting a float32 output (either with `dtype=np.float32` or by providing such an `out=`) will fall through to the `ff->f` loop and upcast the operands, which is not what they want. But notionally one could add an `ee->f` loop between those two that would catch this case when `dtype=np.float32` is requested. -- Robert Kern
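The effect of slotting an `ee->f` loop between `ee->e` and `ff->f` can be illustrated with a toy model of loop selection. This is a simplification, assuming loops are tried in order and the first one wins if its output matches the requested type and the operands can be safely cast to its input types; NumPy's real type resolution is more involved. `select_loop` and both loop lists are hypothetical.

```python
import numpy as np

def select_loop(loops, in_codes, out_code=None):
    """Toy resolver: first loop whose output matches `out_code` (if given)
    and whose inputs the operand types can be safely cast to."""
    for loop in loops:
        ins, out = loop.split("->")
        if out_code is not None and out != out_code:
            continue
        if all(np.can_cast(np.dtype(src), np.dtype(dst), casting="safe")
               for src, dst in zip(in_codes, ins)):
            return loop
    return None

current = ["ee->e", "ff->f", "dd->d"]
proposed = ["ee->e", "ee->f", "ff->f", "dd->d"]

print(select_loop(current, "ee", "f"))   # ff->f (operands get upcast)
print(select_loop(proposed, "ee", "f"))  # ee->f (float16 in, float32 out)
print(select_loop(proposed, "ee"))       # ee->e (default unchanged)
```

Because the new loop sits after `ee->e`, callers who do not ask for float32 would keep getting the plain float16 loop.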