On Thu, Mar 23, 2023 at 1:55 AM Clemens Brunner <clemens.brunner@gmail.com> wrote:
Hello!

I recently got a new MacBook Pro with an M2 Pro CPU (ARM64). When I ran some numerical computations (ICA, to be precise), I was surprised by how slow they were, much slower than on my almost 10-year-old Intel Mac. It turns out that the default OpenBLAS, which is what you get when installing the binary wheel with pip (i.e. "pip install numpy"), is the reason the computations are so slow.

When installing NumPy from source (by using "pip install --no-binary :all: --no-use-pep517 numpy"), it uses the Apple-provided Accelerate framework, which includes an optimized BLAS library. The difference is mind-boggling, I'd even say that NumPy is pretty much unusable with the default OpenBLAS backend (at least for the applications I tested).
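To check which BLAS a given NumPy installation actually uses, one option is to query its build configuration. The sketch below assumes NumPy >= 1.26, where `numpy.show_config(mode="dicts")` returns the configuration as a dictionary. Older releases (including those current when this thread was written) only print it, so the helper returns `None` there. The helper name `numpy_blas_name` is hypothetical.

```python
import importlib.util


def numpy_blas_name():
    """Return the BLAS library name from NumPy's build config, or None.

    Assumes NumPy >= 1.26, where show_config(mode="dicts") returns a dict.
    Older releases only print the configuration, so None is returned there.
    """
    if importlib.util.find_spec("numpy") is None:
        return None  # NumPy is not installed in this environment
    import numpy as np

    try:
        config = np.show_config(mode="dicts")
    except TypeError:
        return None  # older NumPy: show_config() accepts no arguments
    blas = config.get("Build Dependencies", {}).get("blas", {})
    return blas.get("name")


if __name__ == "__main__":
    # The exact string depends on the NumPy version and how it was built,
    # e.g. an OpenBLAS variant for the default wheel.
    print(numpy_blas_name())
```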

In my test with four different ICA algorithms, I got these runtimes with the default OpenBLAS:

- FastICA: 6.3s
- Picard: 26.3s
- Infomax: 0.8s
- Extended Infomax: 1.4s

The second algorithm (Picard) in particular is much slower than on my 10-year-old Intel Mac, which also uses OpenBLAS.

Here are the times with Accelerate:

- FastICA: 0.4s
- Picard: 0.6s
- Infomax: 1.0s
- Extended Infomax: 1.3s
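The two lists can be turned into per-algorithm speedups with a few lines of arithmetic. The sketch below only reuses the runtimes reported above. Since these are single measurements, the ratios are rough indicators rather than precise benchmarks.

```python
# Runtimes in seconds, copied from the OpenBLAS and Accelerate lists above.
OPENBLAS = {"FastICA": 6.3, "Picard": 26.3, "Infomax": 0.8, "Extended Infomax": 1.4}
ACCELERATE = {"FastICA": 0.4, "Picard": 0.6, "Infomax": 1.0, "Extended Infomax": 1.3}


def speedups(baseline, candidate):
    """Ratio baseline / candidate per algorithm; > 1 means the candidate is faster."""
    return {name: baseline[name] / candidate[name] for name in baseline}


if __name__ == "__main__":
    for name, ratio in speedups(OPENBLAS, ACCELERATE).items():
        print(f"{name}: {ratio:.2f}x")
```

By these numbers, Accelerate is roughly 16x faster for FastICA and roughly 44x faster for Picard. Infomax is actually slightly slower with Accelerate (a ratio of 0.8), and Extended Infomax is roughly unchanged (about 1.08).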

Given this huge performance difference, my question is whether you would consider distributing a binary wheel for ARM64-based Macs that links to Accelerate. Or are there caveats that speak against doing so? I know that NumPy moved away from Accelerate years ago on Intel Macs, but maybe now is the time to reconsider that decision.

Hi Clemens,

Thanks for the suggestion and the benchmarks. We actually discussed this in the last community meeting. Accelerate is currently supported when building from source, and such a build uses the LP64 BLAS/LAPACK interface (32-bit integers). Since NumPy 1.22 we have shipped our wheels with the ILP64 interface (64-bit integers), which Accelerate doesn't provide. That's about to change, though, in macOS 13.3: https://developer.apple.com/documentation/macos-release-notes/macos-13_3-release-notes#Accelerate. That release will also upgrade Accelerate to LAPACK 3.9.1, which means we can re-enable it for SciPy too.
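The practical difference between LP64 and ILP64 is the width of the integer arguments passed to BLAS/LAPACK routines: dimensions, leading dimensions, increments and workspace sizes. The sketch below is plain arithmetic and makes no BLAS calls. It checks whether a set of such arguments fits each interface, using a hypothetical vector length chosen only for illustration. The helper `fits_interface` is likewise hypothetical.

```python
# Largest value a BLAS/LAPACK integer argument can hold under each interface.
LP64_INT_MAX = 2**31 - 1   # 32-bit signed integer
ILP64_INT_MAX = 2**63 - 1  # 64-bit signed integer


def fits_interface(sizes, int_max):
    """True if every integer argument of a BLAS/LAPACK call fits in int_max."""
    return all(0 <= s <= int_max for s in sizes)


if __name__ == "__main__":
    # Hypothetical dot product of two vectors with 3 billion elements:
    # the length argument n overflows a 32-bit integer but not a 64-bit one.
    n = 3_000_000_000
    print("LP64 ok:", fits_interface([n], LP64_INT_MAX))
    print("ILP64 ok:", fits_interface([n], ILP64_INT_MAX))
```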

If things go well, we will most likely ship wheels for macOS 14 that are linked against the new ILP64 Accelerate build. Due to a packaging limitation (the `packaging` library ignores minor versions of macOS), we can't ship wheels for >=13.3, only for >=14.
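Here is a simplified model of that limitation. For macOS 11 and later, `packaging.tags.mac_platforms` drops the minor version: it lists `<major>_0` tags from the running major version down to 11. A wheel tagged for 13.3 therefore never matches. The sketch below reimplements only that part. It omits the universal2 and macOS 10.x tags the real function also yields. The helper names `compatible_mac_tags` and `can_install` are hypothetical.

```python
def compatible_mac_tags(major, minor, arch="arm64"):
    """Simplified model of the macOS platform tags accepted on macOS >= 11.

    `minor` is accepted but ignored, mirroring the limitation described
    above: every major release from `major` down to 11 becomes "<n>_0".
    """
    if major < 11:
        raise ValueError("this sketch only models macOS 11 and later")
    return [f"macosx_{m}_0_{arch}" for m in range(major, 10, -1)]


def can_install(wheel_platform_tag, major, minor, arch="arm64"):
    """True if a wheel with this platform tag matches in the simplified model."""
    return wheel_platform_tag in compatible_mac_tags(major, minor, arch)


if __name__ == "__main__":
    # In this model, a 13.3-tagged wheel is not selected even on macOS 13.3,
    print(can_install("macosx_13_3_arm64", 13, 3))
    # whereas a 14.0-tagged wheel is selected on macOS 14 and later.
    print(can_install("macosx_14_0_arm64", 14, 0))
    print(can_install("macosx_14_0_arm64", 15, 1))
```

This is why a minimum version of 14.0 is the lowest floor that guarantees the ILP64-capable Accelerate is present.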

Cheers,
Ralf