Hello,

Here at Arm, we've been investigating how we can improve performance on AArch64. One way to do this is to integrate existing optimized routines (https://github.com/ARM-software/optimized-routines), similar to how the SVML routines for AVX512 are currently included as a git submodule. We intend to add the optimized-routines repository as an additional submodule, which we can then use to provide routines on AArch64 for ASIMD, SVE and beyond.

We're currently targeting an accuracy of 4 ULP, as this aligns with libmvec (https://sourceware.org/glibc/wiki/libmvec) and the SVML integration (https://github.com/numpy/numpy/pull/19478). Alongside this, we'll add sufficient error handling to pass the NumPy test suite, meeting the test requirements highlighted in the SVML integration discussion (https://github.com/numpy/numpy/pull/19478#issuecomment-893001722).

We've already started curating the necessary functions; let us know if you have any feedback.

Cheers,
Chris
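To make the 4 ULP target concrete, here is a minimal sketch of how a float32 result could be measured in units in the last place. It assumes NumPy is available and that a float64 computation is an adequate reference for float32 inputs. The helper `ulp_error` is hypothetical and is not part of NumPy or optimized-routines; `np.log` simply stands in for whichever vectorized routine is under test.

```python
import numpy as np


def ulp_error(approx, reference):
    """Hypothetical helper: error of a float32 result in float32 ULPs.

    Assumes `reference` is a float64 value that is much more accurate
    than the float32 result being checked.
    """
    approx = np.asarray(approx, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float64)
    # Gap between adjacent float32 values at the reference, rounded to float32.
    ulp = np.spacing(reference.astype(np.float32)).astype(np.float64)
    return np.abs(approx.astype(np.float64) - reference) / np.abs(ulp)


# Usage: compare a float32 log against a float64 reference.
x = np.linspace(0.5, 100.0, 10_000, dtype=np.float32)
approx = np.log(x)                        # float32 result under test
reference = np.log(x.astype(np.float64))  # higher-precision reference
worst = ulp_error(approx, reference).max()
print(f"max ULP error: {worst:.2f}")
print("within 4 ULP:", worst <= 4.0)
```

Measuring against a float64 reference, rather than a float32 one, avoids counting the reference's own rounding as error. For comparing two arrays of the same dtype, NumPy also provides `numpy.testing.assert_array_max_ulp`.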