[Numpy-discussion] Numpy and MKL, update
Michael Abshoff
michael.abshoff at googlemail.com
Thu Nov 13 23:37:05 EST 2008
David Cournapeau wrote:
> On Fri, Nov 14, 2008 at 11:07 AM, Michael Abshoff
> <michael.abshoff at googlemail.com> wrote:
>> David Cournapeau wrote:
>>> On Fri, Nov 14, 2008 at 5:23 AM, frank wang <f.yw at hotmail.com> wrote:
>>>> Hi,
>> Hi,
>>
>>>> Can you provide a working example of building Numpy with MKL on Windows
>>>> and Linux?
>>>> The reason I am thinking of building the system is that I need to make
>>>> the speed match Matlab's.
>>> The MKL will only help you for linear algebra, and more specifically
>>> for big matrices. If you build your own ATLAS, you can easily match
>>> Matlab's speed in that area, I think.
That is pretty much true in my experience for anything but Core2 Intel
CPUs, where GotoBLAS and the latest MKL have about a 25% advantage for
large problems.
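For anyone wanting to reproduce this kind of comparison, a minimal sketch is to time a large matrix product, which goes straight to the underlying BLAS dgemm. The matrix size and the GFLOPS arithmetic here are illustrative, not from the benchmarks discussed above:

```python
import time
import numpy as np

# Rough benchmark of the BLAS dgemm behind np.dot; running the same
# script against ATLAS, GotoBLAS, and MKL builds of NumPy shows the
# backend differences discussed above for large problems.
n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = np.dot(a, b)
elapsed = time.perf_counter() - start

# A dense n x n matrix product costs roughly 2*n^3 floating point ops.
gflops = 2.0 * n**3 / elapsed / 1e9
print("n=%d: %.3f s, %.2f GFLOPS" % (n, elapsed, gflops))
```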
>
> Note that I never said that ATLAS was faster than MKL/GotoBLAS :)
:)
> I said you could match Matlab's performance (Matlab itself, up to 6.* at
> least, used ATLAS; you could increase Matlab's performance by using
> your own ATLAS build, BTW).
Yes, back in the day I got a threefold speedup for a certain workload
in Matlab by replacing its BLAS and UMFPACK libraries.
> I don't think 25% matters that much, because if
> it does, then you should not be using Python anyway in many cases (it
> depends on the kind of problem, of course, but I don't think most
> scientific problems reduce to just matrix products/inversions).
Sure, I agree here. A 25% gain for dgemm is significant for some
workloads, but if you spend the vast majority of your time in Python
code it won't matter. And sometimes the difference is far larger than
that - see my remarks below.
>> The advantage of the MKL is that one library works more or less optimally
>> on all platforms, i.e. with and without SSE2 for example, since the
>> "right" routines are selected at run time.
>
> Agreed. As a numpy/scipy developer, I would actually be much more
> interested in work into that direction for ATLAS than trying to get a
> few % of peak speed.
Note that selecting a non-SSE2 version of ATLAS can cause a significant
slowdown. One day not too long ago, Ondrej Certik and I were sitting
in IRC in #sage-devel benchmarking some things. His Debian install was a
factor of 12 slower than the same software he had built with Sage, and
in the end it boiled down to non-SSE2 ATLAS vs. SSE2 ATLAS. That is
a freak case, but I am sure more than enough people will get bitten by
that issue because they installed "ATLAS" in Debian but did not know
about the SSE2 ATLAS package.
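One way to check which BLAS/LAPACK libraries a given NumPy build actually links against is NumPy's own build-info helper; on a Debian-style install this can reveal whether the plain or the SSE2-tuned ATLAS package is in use:

```python
import numpy as np

# Print the BLAS/LAPACK configuration this NumPy build was compiled
# against (library names and search paths). Comparing this output on
# two machines is a quick way to spot a non-SSE2 ATLAS install.
np.show_config()
```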
And a while back, someone compared various closed and open source
numerical projects in an article for some renowned Linux magazine, among
them Sage. They ran a bunch of numerical benchmarks, namely FFT and
SVD, and Sage (via numpy) blew Matlab away by a factor of three for the SVD.
(The FFT did not look as good because Sage still uses GSL for FFT, but
we will change that.) Obviously that was not because numpy was clever
about the SVD used (I know there are several versions in LAPACK, but the
performance difference is usually small), but because Matlab used some
generic build of BLAS (it was unclear from the article whether it was MKL
or ATLAS) while Sage used a custom-built SSE2 version. The reviewer
expressed admiration for numpy and its clever SVD implementation - sigh.
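The SVD benchmark in question can be approximated in a few lines; like dgemm, `np.linalg.svd` spends essentially all of its time inside the underlying LAPACK/BLAS, so a tuned SSE2 build dominates the result. The matrix size below is illustrative, not the article's:

```python
import time
import numpy as np

# Time a dense SVD; the work happens in LAPACK (dgesdd/dgesvd) backed
# by BLAS, so this measures the BLAS build, not NumPy itself.
n = 500
a = np.random.rand(n, n)

start = time.perf_counter()
u, s, vt = np.linalg.svd(a)
elapsed = time.perf_counter() - start

print("svd of %dx%d matrix: %.3f s" % (n, n, elapsed))
```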
> Deployment of ATLAS is really difficult ATM, and
> it means that practically, we lose a lot of performances because for
> distribution, you can't tune for every CPU out there, so we just use
> safe defaults. Same for linux distributions. It is a shame that Apple
> did not open source their Accelerate framework (based on ATLAS, at
> least for the BLAS/LAPACK part), because that's exactly what they did.
Yes, Clint has been in contact with Apple, but never got anything out of
them. Too bad. The new ATLAS release should fix some build problems
regarding the dreaded timing tolerance issue, and it will also work much
better with threads, since Clint rewrote the threading module so that
memory allocation is no longer the bottleneck. He also added native
threading support for Windows, but that is not being tested yet, so
hopefully it will work in a future version. The main issue here is that
for assembly support Clint relies on gcc, which is hardcoded into the
Makefiles; we discussed various options for how that could be avoided, but
so far no progress can be reported.
> David
Cheers,
Michael