[Numpy-discussion] performance matrix multiplication vs. matlab
david at ar.media.kyoto-u.ac.jp
Tue Jun 9 22:33:26 EDT 2009
Benoit Jacob wrote:
> No, because _we_ are serious about compilation times, unlike other c++
> template libraries. But granted, compilation times are not as short as
> a plain C library either.
I concede it is not as bad as the heavily templated libraries in boost.
But C++ is just horribly slow to compile, at least with g++ - in scipy,
half of the compilation time is spent on a couple of C++ files which
use simple templates. And the compiler takes a lot of memory during
compilation (~300 MB per file - that's a problem because I rely a lot
on VMs to build numpy/scipy binaries).
> Eigen doesn't _require_ any SIMD instruction set although it can use
> SSE / AltiVec if enabled.
If SSE is not enabled, my (very limited) tests show that eigen does not
perform as well as a stock debian ATLAS on the benchmarks given by
eigen. For example:
g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG && ./a.out 300
cblas: 0.034222 (0.788 GFlops/s)
eigen : 0.0863581 (0.312 GFlops/s)
eigen : 0.121259 (0.222 GFlops/s)
g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG -msse2 && ./a.out 300
cblas: 0.035438 (0.761 GFlops/s)
eigen : 0.0182271 (1.481 GFlops/s)
eigen : 0.0860961 (0.313 GFlops/s)
(on a Pentium 4, which may not be very representative of current architectures)
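To make the figures above easier to compare, here is a minimal sketch of the arithmetic. It assumes the benchmark counts n^3 operations for an n x n product; that is inferred from the printed numbers (27e6 / 0.034222 s is about 0.789e9), not taken from the source of benchBlasGemm.cpp. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rate in GFlops for an n x n matrix product, ASSUMING n*n*n counted
// operations. This operation count is an assumption inferred from the
// printed figures, not read from the benchmark source.
double gflops_n3(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double ops = static_cast<double>(n) * static_cast<double>(n) *
                       static_cast<double>(n);
    return ops / seconds * 1e-9;
}

// Ratio of two wall-clock timings: how many times faster the second run is.
double speedup(double baseline_seconds, double new_seconds) {
    assert(new_seconds > 0.0);
    return baseline_seconds / new_seconds;
}
```

With the timings quoted above, gflops_n3(300, 0.034222) comes out near the printed cblas figure, and speedup(0.0863581, 0.0182271) gives roughly 4.7 for the first eigen line with and without -msse2.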
> It is true that with Eigen this is set up at build time, but this is
> only because it is squarely _not_ Eigen's job to do runtime platform
> checks. Eigen isn't a binary library. If you want a runtime platform
> switch, just compile your critical Eigen code twice, one with SSE one
> without, and do the platform check in your own app. The good thing
> here is that Eigen makes sure that the ABI is independent of whether
> vectorization is enabled.
I understand that it is not a goal of eigen, and that it should be the
application's job. It is just that MKL does it automatically, and doing
it in a cross-platform way in the context of python extensions is quite
hard because of the different linking strategies on different OSes.
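For concreteness, a rough sketch of what "compile twice and do the platform check in your own app" could look like. All names here (GemmFn, gemm_generic, gemm_sse2, select_gemm) are hypothetical, and the two kernels are stand-ins: in a real build they would be the same Eigen-based source compiled in two separate files, one with -msse2 and one without. The check uses the GCC/Clang builtin __builtin_cpu_supports, so it is specific to those compilers on x86 and says nothing about how MKL does it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel signature: C = A * B for n x n row-major matrices.
using GemmFn = void (*)(const double* a, const double* b, double* c,
                        std::size_t n);

// Stand-in for the kernel built WITHOUT -msse2 (plain triple loop here).
void gemm_generic(const double* a, const double* b, double* c, std::size_t n) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Stand-in for the same source built WITH -msse2 in its own file.
// In this single-file sketch it simply forwards to the generic version.
void gemm_sse2(const double* a, const double* b, double* c, std::size_t n) {
    gemm_generic(a, b, c, n);
}

// Runtime CPU check; falls back to "no SSE2" on compilers or targets
// where the builtin is not available.
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Pick the kernel once; callers keep the returned pointer and reuse it.
GemmFn select_gemm() {
    return cpu_has_sse2() ? gemm_sse2 : gemm_generic;
}
```

The application would call select_gemm() once at start-up and route every product through the returned pointer. The hard part mentioned above, getting both object files linked into a python extension on every OS, is not addressed by this sketch.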