April 11, 2014
5:05 p.m.
Sturla Molden <sturla.molden@gmail.com> wrote:
Making a totally new BLAS might seem like a crazy idea, but it might be the best solution in the long run.
To see if this can be done, I'll try to re-implement cblas_dgemm and then benchmark against MKL, Accelerate and OpenBLAS. If I can get the performance better than 75% of their speed, without any assembly or dark magic, just plain C99 compiled with Intel icc, that would be sufficient for binary wheels on Windows I think. Sturla