On 8/11/22 19:45, Chris Sidebottom wrote:
Hi Matti,
Thanks for your questions :-)
> This seems like it would improve performance on aarch64. Would the routines also work with Apple silicon?

Yip, I can't see a reason why that wouldn't be the case.
> If these are new routines, it would be better to implement them in terms of the numpy universal intrinsics rather than adding a new submodule.

These would be the same routines as those provided by SVML (integrated here: https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/loops_umath_fp...), which use the universal intrinsics before calling into the SVML library. The actual surface area is minimal, so I'd propose we follow a similar path with our existing routines and then aim to apply universal intrinsics in the future, if that's possible. Does that sound like a good approach?
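For anyone less familiar with that SVML integration, here is a rough model of the pattern in Python. It is a conceptual sketch, not NumPy's actual C code. The wrapper only handles data movement (strided loads with the final partial block zero-filled, then stores of just the valid lanes), and all of the math lives in the external kernel. `LANES`, `external_exp_kernel`, and `simd_style_loop` are hypothetical names, and the kernel is a scalar stand-in for a real vectorized library routine.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical lane count, e.g. 2 x float64 in a 128-bit register.
LANES = 2

Kernel = Callable[[List[float]], List[float]]


def external_exp_kernel(vec: List[float]) -> List[float]:
    """Stand-in for a vector routine from an external library.

    Hypothetical: not a real SVML or Arm API. It always receives
    exactly LANES values, like a fixed-width SIMD function would.
    """
    assert len(vec) == LANES
    return [math.exp(x) for x in vec]


def simd_style_loop(src: Sequence[float], src_stride: int,
                    dst: List[float], dst_stride: int,
                    length: int, kernel: Kernel) -> None:
    """Apply `kernel` to `length` strided elements of `src`, writing into `dst`.

    The wrapper only moves data; the math is entirely inside `kernel`.
    """
    s = 0  # position of the next input element in src
    d = 0  # position of the next output element in dst
    remaining = length
    while remaining > 0:
        n = min(LANES, remaining)
        # Partial load: n valid lanes from src, remaining lanes zero-filled.
        vec = [src[s + i * src_stride] for i in range(n)] + [0.0] * (LANES - n)
        out = kernel(vec)
        # Partial store: write back only the n valid lanes.
        for i in range(n):
            dst[d + i * dst_stride] = out[i]
        s += n * src_stride
        d += n * dst_stride
        remaining -= n
```

With a thin wrapper like this, pointing it at a different backend library (say, one targeting aarch64) would mostly mean changing the kernel call, which is one way to read "the actual surface area is minimal".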
Cheers, Chris
Yes, if the routines already exist, then an additional submodule of code would seem to be the best path forward, as long as the license is compatible.

Matti