[Numpy-discussion] performance matrix multiplication vs. matlab

Benoit Jacob jacob.benoit.1 at gmail.com
Wed Jun 10 11:24:42 EDT 2009

Hi David,

2009/6/9 David Cournapeau <david at ar.media.kyoto-u.ac.jp>:
> Hi Benoit,
> Benoit Jacob wrote:
>> No, because _we_ are serious about compilation times, unlike other c++
>> template libraries. But granted, compilation times are not as short as
>> a plain C library either.
> I concede it is not as bad as the heavily templated libraries in boost.
> But C++ is just horribly slow to compile, at least with g++ - in scipy,
> half of the compilation time is spent for a couple of C++ files which
> uses simple templates. And the compiler takes a lot of memory during
> compilation (~ 300 Mb per file - that's a problem because I rely a lot
> on VM to build numpy/scipy binaries).

Well, I can't comment on other libraries that I don't know. It is true
that compilation time and memory usage in C++ templated code will
never be as low as in C compilation, and can easily go awry if the C++
programmer isn't careful. Templates are really a scripting language
for the compiler and, as in any (Turing-complete) language, you can
always write a program that takes a long time to "execute".

>> Eigen doesn't _require_ any SIMD instruction set although it can use
>> SSE / AltiVec if enabled.
> If SSE is not enabled, my (very limited) tests show that eigen does not
> perform as well as a stock debian ATLAS on the benchmarks given by
> eigen. For example:

Of course! The whole point is that ATLAS is a binary library with its
own SSE code, so it is still able to use SSE even if your program was
compiled without SSE enabled: ATLAS will run its own platform check at
runtime.

So it's not a surprise that ATLAS with SSE is faster than Eigen without SSE.

By the way this was shown in our benchmark already:
Scroll down to the matrix-matrix product. The gray curve "eigen2_novec" is
Eigen without SSE.

>  g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG && ./a.out 300
> cblas: 0.034222 (0.788 GFlops/s)
> eigen : 0.0863581 (0.312 GFlops/s)
> eigen : 0.121259 (0.222 GFlops/s)

And just out of curiosity, what are the two eigen lines?

> g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG -msse2 && ./a.out 300
> cblas: 0.035438 (0.761 GFlops/s)
> eigen : 0.0182271 (1.481 GFlops/s)
> eigen : 0.0860961 (0.313 GFlops/s)
> (on a PIV, which may not be very representative of current architectures)
>> It is true that with Eigen this is set up at build time, but this is
>> only because it is squarely _not_ Eigen's job to do runtime platform
>> checks. Eigen isn't a binary library. If you want a runtime platform
>> switch, just compile your critical Eigen code twice, one with SSE one
>> without, and do the platform check in your own app. The good thing
>> here is that Eigen makes sure that the ABI is independent of whether
>> vectorization is enabled.
> I understand that it is not a goal of eigen, and that should be the
> application's job. It is just that MKL does it automatically, and doing
> it in a cross platform way in the context of python extensions is quite
> hard because of various linking strategies on different OS.

Yes, I understand that. MKL is not only a math library, it comes with
an embedded threading library and hardware detection routines.
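
To make the "compile twice and check in your own app" suggestion quoted
above a bit more concrete, here is a rough sketch of what such a runtime
switch could look like. Everything in it is an assumption for
illustration: the names dot_novec, dot_sse2, cpu_has_sse2 and select_dot
are hypothetical, the kernel bodies are plain loops standing in for real
Eigen code, and the CPU check relies on the GCC/Clang builtin
__builtin_cpu_supports, which only exists on x86 with those compilers.

```cpp
#include <cstddef>

// Hypothetical kernels. In a real build each one would live in its own
// translation unit containing the same Eigen-based code, with one file
// compiled with -msse2 and the other without. Here both bodies are the
// same scalar loop, used purely as a stand-in.
double dot_novec(const double* a, const double* b, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

double dot_sse2(const double* a, const double* b, std::size_t n) {
    double s = 0.0;  // stand-in for the SSE2-compiled variant
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

using dot_fn = double (*)(const double*, const double*, std::size_t);

// Platform check done by the application, not by the math library.
// Assumption: GCC or Clang on x86; elsewhere we conservatively report
// "no SSE2" so the non-vectorized path is chosen.
bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Pick the kernel matching the CPU the program is actually running on.
dot_fn select_dot() {
    return cpu_has_sse2() ? dot_sse2 : dot_novec;
}
```

An application would call select_dot() once at startup, keep the returned
pointer, and route all calls through it. Both kernels share one plain
function signature here, so the caller does not need to know which build
variant it ended up with.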

