
Neal Becker wrote:
I've been browsing the numpy source. I'm wondering about mixed-mode arithmetic on arrays. I believe the way numpy handles this is that it never does mixed arithmetic, but instead converts arrays to a common type. Arguably, that might be efficient for a mix of say, double and float. Maybe not. But for a mix of complex and a scalar type (say, CDouble * Double), it's clearly suboptimal in efficiency. So, do I understand this correctly? If so, is that something we should improve?
Reviving this old thread - I note that numpy.dot supports in-place computation for performance reasons, like this:

    c = np.empty_like(a, order='C')
    np.dot(a, b, out=c)

However, the data type of the pre-allocated c array must match the result data type of a times b.

Now, with some accelerator hardware (e.g. tensor cores or matrix multiplication engines in GPUs), mixed-precision arithmetic with relaxed floating-point precision (i.e. not necessarily IEEE 754 conformant) but faster performance is possible, and could be supported in downstream libraries such as cupy. Case in point: a mixed-precision calculation may take half-precision inputs, but accumulate in and return full-precision outputs. Due to the above-mentioned type consistency, the outputs would be unnecessarily demoted (truncated) to half precision again.

The current API of numpy does not expose mixed-precision concepts. Therefore, it would be nice if it were possible to build in support for hardware-accelerated linear algebra, even if that may not be available on the standard (CPU) platforms numpy is typically compiled for.

I'd be happy to flesh out some API concepts, but would be curious to first get an opinion from others. It may be necessary to weigh the complexity of adding such support explicitly against providing minimal hooks for add-on libraries in the style of JMP (for jax.numpy), or AMP (for torch).

Jens
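To make the type-consistency constraint concrete, here is a minimal sketch of the demotion issue described above (the shapes and dtypes are only for illustration, and the exact ValueError wording depends on the NumPy version):

    import numpy as np

    a = np.ones((4, 4), dtype=np.float16)
    b = np.ones((4, 4), dtype=np.float16)

    # The result dtype of float16 inputs is float16, so np.dot rejects a
    # float32 output buffer; a backend that accumulated in float32 would
    # have to truncate back to half precision to satisfy this check.
    c32 = np.empty((4, 4), dtype=np.float32)
    try:
        np.dot(a, b, out=c32)
    except ValueError as exc:
        print("rejected:", exc)

    # Only a float16 output buffer is accepted, forcing the demotion.
    c16 = np.empty((4, 4), dtype=np.float16)
    np.dot(a, b, out=c16)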