10 Jun 2014
On 10/06/14 14:57, Matthew Brett wrote:
> Would you consider doing a PR for that?
Here is a patch you can try before I post a PR.
It can also be built independently of NumPy, so you don't need to rebuild NumPy just to test it. (The change to NumPy is in a different folder.)
I decided against using cblas_sgemm. Instead, the patch just enforces alignment to 32-byte boundaries. cblas_sgemm would require a copy anyway if the vector is strided, so it made no difference.
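
For readers who want a concrete picture of what "enforcing alignment" can mean, here is a rough sketch in C. It is not the patch itself: the helper names is_aligned32 and aligned_view_or_copy are hypothetical, and it assumes a POSIX system where posix_memalign is available. The idea is simply to pass the vector through untouched when it is already contiguous and on a 32-byte boundary, and otherwise to copy it into a freshly allocated aligned buffer.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 32  /* bytes; the boundary discussed above */

/* Return nonzero if p sits on a 32-byte boundary. */
static int is_aligned32(const void *p)
{
    return ((uintptr_t)p % ALIGNMENT) == 0;
}

/*
 * Hypothetical helper (not taken from the patch).
 * Returns a pointer to n contiguous floats on a 32-byte boundary that
 * hold the strided vector src (stride counted in elements).
 * If src is already contiguous and aligned it is returned unchanged and
 * *owned is set to 0. Otherwise the data is copied into a new aligned
 * buffer, *owned is set to 1, and the caller must free() the result.
 * Returns NULL if the allocation fails.
 */
static float *aligned_view_or_copy(float *src, size_t n, ptrdiff_t stride,
                                   int *owned)
{
    void *buf = NULL;
    size_t i;

    assert(owned != NULL);
    *owned = 0;

    if (stride == 1 && is_aligned32(src)) {
        return src;  /* nothing to do */
    }
    /* Allocate at least one element so a zero-length vector is still valid. */
    if (posix_memalign(&buf, ALIGNMENT, (n ? n : 1) * sizeof(float)) != 0) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        ((float *)buf)[i] = src[(ptrdiff_t)i * stride];
    }
    *owned = 1;
    return (float *)buf;
}
```

The copy is only paid for misaligned or strided input, which is the same case where a cblas_sgemm-based approach would have needed a copy as well.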
I have tested with Accelerate, OpenBLAS and MKL, clang and icc. From what I can tell it works correctly and does not segfault on misalignment.