[Numpy-discussion] Use OpenBLAS for the binary releases?
cournape at gmail.com
Tue Nov 20 09:38:27 EST 2012
On Mon, Nov 19, 2012 at 5:42 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 11/19/2012 06:12 PM, Sturla Molden wrote:
>> I think NumPy and SciPy should consider using OpenBLAS (a fork of
>> GotoBLAS2) instead of ATLAS or f2c'd Netlib BLAS for the binary releases.
>> Here are its virtues:
>> * Very easy to build: Just a makefile, no configuration script or
>> special build tools.
>> * Building ATLAS can be a PITA. So why bother?
>> * Faster than ATLAS, sometimes faster than MKL.
>> * Multithreaded BLAS kernels: OpenMP on Unix, Windows threads on Windows.
>> * The quality of its ancestor GotoBLAS is undisputed. It was the BLAS
>> implementation of choice for major HPC projects around the world.
>> * Free as in BSD licensed.
>> * Funded and developed for use in major Chinese HPC projects. Actively
>> maintained. (GotoBLAS2 is abandonware.)
>> * Open source. The C sources are a pleasure to read, and very easy to
>> * No OpenMP on Windows means no dependency on pthreads-win32 (an LGPL
>> library) when building with MinGW.
>> * Builds on Windows with MinGW and MSYS, and perhaps even without MSYS.
>> * Cygwin is not needed on Windows (this is just BS from the GotoBLAS
>> documentation). Thus, 64-bit builds are possible (I've built it using
>> TDM-GCC for Win64 and 32-bit MSYS).
> Even on CPUs that are not directly supported, this is at least better
> than reference BLAS.
> (On our AMD CPUs, which are too new to have a separate OpenBLAS
> implementation, the implementations for older AMD CPUs still outperform
> at least Intel MKL, because MKL does so poorly on these; ACML beats
> them both by a factor of 2, though. And of course on supported CPUs
> (everything Intel and older AMD) OpenBLAS is wonderful.)
In principle I support this for our binary releases as well. One issue
is that we don't have the infrastructure on the Mac to build an installer
with multi-arch support, and we can't assume that every Mac out there has
SSE3 or SSE4 available.
We would need more testing first, as this is not a change to make lightly.
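To illustrate the multi-arch concern, here is a minimal sketch of the runtime selection such an installer would have to perform. It assumes a hypothetical set of prebuilt BLAS variants keyed on CPU feature flags; the variant names and the helpers `parse_flags` and `choose_variant` are illustrative only, not part of NumPy, OpenBLAS, or any existing installer. The flag string is assumed to be whitespace-separated, roughly like the output of `sysctl -n machdep.cpu.features` on Intel Macs.

```python
# Hypothetical build variants, ordered from most to least demanding.
# Names and required flag sets are illustrative assumptions only.
VARIANTS = [
    ("sse4.1", {"sse2", "sse3", "ssse3", "sse4.1"}),
    ("sse3", {"sse2", "sse3"}),
    ("sse2", {"sse2"}),
]


def parse_flags(text):
    """Split a whitespace-separated CPU feature string into lowercase flags."""
    return {tok.lower() for tok in text.split()}


def choose_variant(flags):
    """Return the first (most demanding) variant whose required flags are
    all present; otherwise fall back to a generic build."""
    for name, required in VARIANTS:
        if required <= flags:  # subset test: every required flag is available
            return name
    return "generic"


if __name__ == "__main__":
    sample = "FPU VME SSE SSE2 SSE3 SSSE3"
    print(choose_variant(parse_flags(sample)))  # -> sse3
```

The point of ordering the variants from most to least demanding is that a machine lacking SSE4 still gets the best build it can run, instead of an installer that assumes SSE3 or SSE4 everywhere and crashes on older Macs.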