On Wed, Jul 21, 2021 at 9:38 PM Nicholai Tukanov <nicholaitukanov@gmail.com> wrote:
I would like to understand how to go about extending the SIMD framework to add support for POWER10. Specifically, I would like to add the `lxvp` and `stxvp` instructions, which load/store 256 bits into/from a pair of vector registers. I believe this could give a decent performance boost on POWER machines, since it can halve the number of loads/stores issued.

Thanks for proposing this Nicholai. Hopefully someone more knowledgeable than me can point out how to go about this.


Additionally, matrix engines (2-D SIMD instructions) are becoming quite popular because of the performance improvements they bring to deep learning and scientific computing. Would it be beneficial to add these new advanced SIMD instructions to the framework, or should they be left to libraries such as OpenBLAS and MKL?

This is indeed best left to OpenBLAS, MKL et al.

Cheers,
Ralf