On Fri, Apr 11, 2014 at 6:05 PM, Sturla Molden wrote:
> Making a totally new BLAS might seem like a crazy idea, but it might be the best solution in the long run.
>
> To see if this can be done, I'll try to re-implement cblas_dgemm and then benchmark against MKL, Accelerate and OpenBLAS. If I can get the performance better than 75% of their speed, without any assembly or dark magic, just plain C99 compiled with Intel icc, that would be sufficient for binary wheels on Windows I think.
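For readers who want a concrete picture of the "plain C99, no assembly" idea, here is a minimal sketch of a cache-blocked matrix multiply. It is not Sturla's code and not the cblas_dgemm interface: the name `sketch_dgemm`, the row-major-only, no-transpose signature, and the tile size of 64 are all assumptions made for illustration, and nothing here has been tuned or benchmarked.

```c
#include <stddef.h>

/* Tile edge length: an untuned guess, not a measured optimum. */
enum { SKETCH_BLOCK = 64 };

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/*
 * Hypothetical helper (not part of any BLAS): computes
 *     C = alpha * A * B + beta * C
 * for row-major matrices without transposition.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 */
void sketch_dgemm(size_t m, size_t n, size_t k,
                  double alpha, const double *restrict a, size_t lda,
                  const double *restrict b, size_t ldb,
                  double beta, double *restrict c, size_t ldc)
{
    /* Scale C once up front. Following BLAS convention, when beta is
       zero the old contents of C are overwritten rather than read. */
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            c[i * ldc + j] = (beta == 0.0) ? 0.0 : beta * c[i * ldc + j];

    /* Walk the problem in tiles so the working set stays cache-sized. */
    for (size_t ii = 0; ii < m; ii += SKETCH_BLOCK) {
        size_t i_end = min_sz(ii + SKETCH_BLOCK, m);
        for (size_t pp = 0; pp < k; pp += SKETCH_BLOCK) {
            size_t p_end = min_sz(pp + SKETCH_BLOCK, k);
            for (size_t jj = 0; jj < n; jj += SKETCH_BLOCK) {
                size_t j_end = min_sz(jj + SKETCH_BLOCK, n);
                /* i-p-j order: the inner loop streams through
                   contiguous rows of B and C. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t p = pp; p < p_end; ++p) {
                        double aip = alpha * a[i * lda + p];
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * ldc + j] += aip * b[p * ldb + j];
                    }
                }
            }
        }
    }
}
```

A benchmark along the lines proposed above would time a routine like this against cblas_dgemm from MKL, Accelerate and OpenBLAS on the same inputs. A competitive version would probably also need operand packing and a register-blocked inner kernel, which this sketch omits.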

Sounds like a worthwhile experiment! My suspicion is that we'll be better off starting with something that is almost good enough (OpenBLAS) and then incrementally improving it to meet our needs, rather than starting from scratch -- there's a *long* way to get from dgemm to a fully supported BLAS project -- but no matter what, it'll generate useful data, and possibly some useful code that could either be the basis of something new or be integrated into whatever we do end up doing.

Also, while Windows is maybe in the worst shape, all platforms would seriously benefit from the existence of a reliable, speed-competitive, binary-distribution-compatible BLAS that doesn't break fork().

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org