[Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)
jtaylor.debian at googlemail.com
Fri Apr 11 14:38:12 EDT 2014
On 11.04.2014 19:05, Sturla Molden wrote:
> Sturla Molden <sturla.molden at gmail.com> wrote:
>> Making a totally new BLAS might seem like a crazy idea, but it might be the
>> best solution in the long run.
> To see if this can be done, I'll try to re-implement cblas_dgemm and then
> benchmark against MKL, Accelerate and OpenBLAS. If I can get the
> performance better than 75% of their speed, without any assembly or dark
> magic, just plain C99 compiled with Intel icc, that would be sufficient for
> binary wheels on Windows I think.
If you can, also give gcc with graphite a try. Its loop transformations
should give you results similar to manual blocking, provided the compiler
is able to understand the loop, see
+ a couple options to tune the parameters
You may need gcc 4.8 for it to work properly on loops whose iteration
counts are not fixed at compile time.
As far as I know, clang/LLVM also has a comparable polyhedral loop
optimizer (Polly).
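
To make the comparison concrete, here is a minimal C99 sketch of the two variants being discussed: a plain triple loop, which is the kind of loop nest a polyhedral optimizer would have to recognize and tile, and a manually blocked version. The function names, the row-major layout and the TILE value of 64 are assumptions chosen for illustration; this is not the cblas_dgemm interface and not tuned for any particular cache.

```c
#include <stddef.h>

/* Hypothetical tile size for illustration; a real value must be tuned. */
#define TILE 64

/* Naive row-major C = A * B, with A (m x k), B (k x n), C (m x n).
   This is the simple loop nest a loop optimizer would try to tile. */
void dgemm_naive(size_t m, size_t n, size_t k,
                 const double *restrict a, const double *restrict b,
                 double *restrict c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Manually blocked version: same result, but the work is done on
   TILE x TILE sub-blocks so that each block is reused while in cache. */
void dgemm_blocked(size_t m, size_t n, size_t k,
                   const double *restrict a, const double *restrict b,
                   double *restrict c)
{
    for (size_t i = 0; i < m * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < m; ii += TILE) {
        size_t i_end = min_sz(ii + TILE, m);
        for (size_t pp = 0; pp < k; pp += TILE) {
            size_t p_end = min_sz(pp + TILE, k);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t p = pp; p < p_end; p++) {
                        double a_ip = a[i * k + p];
                        /* Innermost loop walks contiguous rows of B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a_ip * b[p * n + j];
                    }
                }
            }
        }
    }
}

/* Largest element-wise absolute difference, for comparing both results. */
double max_abs_diff(const double *x, const double *y, size_t len)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; i++) {
        double d = x[i] - y[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Assuming a gcc build with graphite support enabled, the naive version could be compiled with something like `gcc -std=c99 -O3 -floop-interchange -floop-strip-mine -floop-block --param loop-block-tile-size=64 -c dgemm.c` and then benchmarked against the blocked one. Whether the compiler actually tiles the loop depends on the gcc version and on how the loop is written, so the result should be measured rather than assumed.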