[RFC] numpy/SVML appears to be poorly optimized
The numpy SVML library (https://github.com/numpy/SVML) appears to be poorly optimized. Since it's just the raw assembly dump, it is also quite difficult to improve (with either a better compiler or by hand).

Some of the glaring issues are:
1. Register allocation / spilling.
2. rodata layouts / constant propagation of the values.
3. Very odd use of internal functions that really ought to be inlined.

Are these functions meant to be heavily optimized? If so, are people open to patches that optimize them (either with new C implementations or in the current assembly implementations)?
On Fri, Nov 5, 2021 at 1:38 PM Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> wrote:
They are meant to be optimized. Any contribution to improve them further is more than welcome.
Fantastic. I don't see any tests for any of the functions in there. Does anyone know where I can find them?
Use the main NumPy test suite by updating the svml submodule to the commit with your changes, then run the test suite the regular way (e.g. `python runtests.py`).
Cheers, Ralf
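As a quick sanity check alongside the full test suite, one could also measure how far a float32 ufunc result drifts from a float64 reference, in units in the last place (ULP). The following is only a minimal sketch: the helper name `max_ulp_error`, the input range, and the choice of `np.tanh` are illustrative assumptions, not part of NumPy's test suite. Whether the call actually reaches an SVML kernel depends on how NumPy was built and on the CPU it runs on.

```python
import numpy as np


def max_ulp_error(func, x):
    """Return the largest float32 error of func(x), in ULPs.

    Hypothetical helper for illustration; the float64 evaluation of the
    same ufunc is assumed to be accurate enough to serve as a reference.
    """
    x32 = np.asarray(x, dtype=np.float32)
    got = func(x32).astype(np.float64)        # float32 result, upcast
    ref = func(x32.astype(np.float64))        # float64 reference
    # Size of one float32 ULP at the magnitude of the reference value.
    ulp = np.spacing(np.abs(ref).astype(np.float32)).astype(np.float64)
    return float(np.max(np.abs(got - ref) / ulp))


# Example input range (an arbitrary choice for this sketch).
inputs = np.linspace(-10.0, 10.0, 100001)
print("max ULP error of float32 tanh:", max_ulp_error(np.tanh, inputs))
```

A check like this does not replace the test suite, but it gives a fast signal after rebuilding with a modified submodule.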
Hopefully, we will be able to convert them to universal intrinsics (NEP 38) one day. As one of the team, I will try to make more time for it.
Thanks, Sayed.

On Nov 6 2021, at 5:54 pm, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Use the main NumPy test suite by updating the svml submodule to the commit with your changes, then run the test suite the regular way (e.g. `python runtests.py`).
Cheers, Ralf
Note the benchmarks on Sayed's PR [0] to move tanh to universal intrinsics. Not only does it supply the routines for all platforms supported by universal intrinsics, it even slightly increased performance on AVX512 (the usual disclaimers about the dangers of comparing benchmarks apply).
Matti

[0] https://github.com/numpy/numpy/pull/20363
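For a rough local comparison between two builds, a small timing sketch along these lines can help. It is not the benchmark used in the PR; the helper name `time_ufunc`, the array size, and the repeat counts are assumptions chosen for illustration, and the same caveats about comparing benchmarks apply.

```python
import timeit

import numpy as np


def time_ufunc(func, dtype, size=100_000, repeat=5, number=20):
    """Return the best observed time in seconds for one call of func.

    Hypothetical helper for illustration only.
    """
    x = np.linspace(-10.0, 10.0, size).astype(dtype)
    out = np.empty_like(x)
    # Write into a preallocated output so allocation is not timed.
    timings = timeit.repeat(lambda: func(x, out=out),
                            repeat=repeat, number=number)
    # The minimum is the least noisy estimate of the achievable time.
    return min(timings) / number


for dtype in (np.float32, np.float64):
    seconds = time_ufunc(np.tanh, dtype)
    print(f"tanh {np.dtype(dtype).name}: {seconds * 1e6:.1f} us per call")
```

Run it once against each build on the same machine and compare the numbers. Small differences are within noise.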
