I would like to understand how to go about extending the SIMD framework in
order to add support for POWER10. Specifically, I would like to add the
following instructions: `lxvp` and `stxvp`, which load/store 256 bits
into/from a pair of vector registers. I believe this could give a decent
performance boost on POWER machines, since it can halve the number of
loads/stores issued.
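
To make the idea concrete, here is a rough, portable C sketch of what a paired load/store could look like. It does not use the real POWER10 instructions: the type `f32x4x2_sketch` and the functions `load_pair_sketch`, `store_pair_sketch` and `copy_with_pairs` are hypothetical names made up for illustration, and the `memcpy` calls only stand in for where `lxvp`/`stxvp` would be emitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type: two 128-bit halves, each holding four floats. */
typedef struct {
    float lo[4];
    float hi[4];
} f32x4x2_sketch;

/* Portable stand-in for a paired load: one call fetches 256 bits.
 * On POWER10 the two copies would ideally become a single lxvp. */
static f32x4x2_sketch load_pair_sketch(const float *p)
{
    f32x4x2_sketch v;
    memcpy(v.lo, p, sizeof v.lo);
    memcpy(v.hi, p + 4, sizeof v.hi);
    return v;
}

/* Portable stand-in for a paired store (stxvp on POWER10). */
static void store_pair_sketch(float *p, f32x4x2_sketch v)
{
    memcpy(p, v.lo, sizeof v.lo);
    memcpy(p + 4, v.hi, sizeof v.hi);
}

/* Copies n floats in 8-element (256-bit) steps and returns how many
 * paired loads were issued. Any tail of fewer than 8 elements is left
 * untouched in this sketch. */
static size_t copy_with_pairs(float *dst, const float *src, size_t n)
{
    size_t paired_loads = 0;
    for (size_t i = 0; i + 8 <= n; i += 8) {
        store_pair_sketch(dst + i, load_pair_sketch(src + i));
        paired_loads++;
    }
    return paired_loads;
}
```

For 16 floats this loop issues 2 paired loads, where a loop over 128-bit vectors would issue 4, which is the halving I have in mind.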
Additionally, matrix engines (2-D SIMD instructions) are becoming quite
popular due to their performance improvements for deep learning and
scientific computing. Would it be beneficial to add these new advanced SIMD
instructions to the framework, or should they be left to
libraries such as OpenBLAS and MKL?