Using numpy universal intrinsics in astropy

Hello all,

I am investigating how to speed up specific C functions within astropy, ideally without significantly increasing the maintenance burden (see the discussion in https://github.com/astropy/astropy/issues/16902 about annotating functions with the `target_clones` function attribute).

The numpy approach to SIMD instructions, with its support for runtime dispatch, seems appealing. What would be the recommended path for implementing vectorised functions within astropy using the numpy universal intrinsics? I am hopeful that there is a not-too-difficult path, given the "Reuse by other projects" subsection of NEP 38 on SIMD optimizations.

I see the WIP to add a C++ wrapper for the universal intrinsics (https://github.com/numpy/numpy/pull/21057), and the example application code added there to square an array (https://github.com/numpy/numpy/pull/21057/files#diff-cee58cafc4ff85b8fd3d174...) seems reasonably readable. Is that PR meant to be the starting point for external packages that want to write code with universal intrinsics?

Thanks in advance!
Manodeep
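
P.S. For context, here is a minimal sketch of the `target_clones` approach from the linked astropy issue, so that the comparison with universal intrinsics is concrete. It assumes GCC on x86-64 Linux with glibc (function multiversioning relies on ifunc support) and falls back to a plain function elsewhere. The names `square_array` and `MULTIVERSION` are hypothetical and chosen only for illustration; this is not code from astropy or numpy, and the list of targets is just an example.

```c
#include <stdlib.h> /* size_t; on glibc systems this also defines __GLIBC__ */

/*
 * Assumption: only enable function multiversioning where it is known to work
 * (GCC, x86-64, glibc). Everywhere else the macro expands to nothing and a
 * single baseline version of the function is compiled.
 */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION
#endif

/*
 * Square each element of src into dst (hypothetical example function).
 * With target_clones, the compiler emits one clone per listed target and a
 * resolver that picks the best clone for the running CPU at load time.
 * The loop itself stays plain C; vectorisation is left to the compiler.
 */
MULTIVERSION
void square_array(const double *src, double *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i] = src[i] * src[i];
    }
}
```

The appeal of this route is that the source stays scalar C and only the annotation changes; the drawback is that it depends on compiler auto-vectorisation and on a specific compiler/platform combination, which is why the explicit universal intrinsics look attractive.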