[Numpy-discussion] performance matrix multiplication vs. matlab

Tue Jun 9 22:33:26 EDT 2009

Hi Benoit,

Benoit Jacob wrote:
> No, because _we_ are serious about compilation times, unlike other c++
> template libraries. But granted, compilation times are not as short as
> a plain C library either.
>   

I concede it is not as bad as the heavily templated libraries in Boost.
But C++ is just horribly slow to compile, at least with g++ - in scipy,
half of the compilation time is spent on a couple of C++ files which
use simple templates. And the compiler takes a lot of memory during
compilation (~ 300 MB per file - that's a problem because I rely a lot
on VMs to build numpy/scipy binaries).

> Eigen doesn't _require_ any SIMD instruction set although it can use
> SSE / AltiVec if enabled.
>   

If SSE is not enabled, my (very limited) tests show that Eigen does not
perform as well as a stock Debian ATLAS on the benchmarks provided by
Eigen. For example:

g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG && ./a.out 300
cblas: 0.034222 (0.788 GFlops/s)
eigen : 0.0863581 (0.312 GFlops/s)
eigen : 0.121259 (0.222 GFlops/s)

g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG -msse2 && ./a.out 300
cblas: 0.035438 (0.761 GFlops/s)
eigen : 0.0182271 (1.481 GFlops/s)
eigen : 0.0860961 (0.313 GFlops/s)

(on a Pentium 4, which may not be very representative of current architectures)
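
To put those timings side by side, here is a small sketch that only divides
one wall-clock time by another. The helper name relative_time and the
constant names are mine, and I only use the first "eigen" line of each run:

```cpp
#include <cassert>

// How many times longer `seconds` is than `baseline_seconds`.
// Above 1 means slower than the baseline, below 1 means faster.
// (Hypothetical helper, not part of the Eigen benchmark.)
double relative_time(double seconds, double baseline_seconds) {
    assert(baseline_seconds > 0.0);
    return seconds / baseline_seconds;
}

// Timings in seconds, copied from the two runs above
// (first "eigen" line of each run).
const double cblas_no_sse = 0.034222;
const double eigen_no_sse = 0.0863581;
const double cblas_sse2   = 0.035438;
const double eigen_sse2   = 0.0182271;
```

relative_time(eigen_no_sse, cblas_no_sse) comes out at about 2.5, i.e. Eigen
takes roughly two and a half times as long as cblas without SSE, while
relative_time(eigen_sse2, cblas_sse2) is about 0.51, i.e. roughly half the
cblas time once -msse2 is on.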

> It is true that with Eigen this is set up at build time, but this is
> only because it is squarely _not_ Eigen's job to do runtime platform
> checks. Eigen isn't a binary library. If you want a runtime platform
> switch, just compile your critical Eigen code twice, one with SSE one
> without, and do the platform check in your own app. The good thing
> here is that Eigen makes sure that the ABI is independent of whether
> vectorization is enabled.
>   

I understand that this is not a goal of Eigen, and that it should be the
application's job. It is just that MKL does it automatically, and doing
it in a cross-platform way in the context of Python extensions is quite
hard because of the different linking strategies on each OS.

cheers,

David