April 28, 2014
10:25 a.m.
On 11 Apr 2014, at 19:05, Sturla Molden <sturla.molden@gmail.com> wrote:
Sturla Molden <sturla.molden@gmail.com> wrote:
Making a totally new BLAS might seem like a crazy idea, but it might be the best solution in the long run.
To see if this can be done, I'll try to re-implement cblas_dgemm and then benchmark against MKL, Accelerate and OpenBLAS. If I can get the performance better than 75% of their speed, without any assembly or dark
So what percentage of their performance have you achieved so far?
Cheers, Michael