On Wed, Nov 17, 2021 at 7:49 AM Matti Picus <matti.picus@gmail.com> wrote:

On 17/11/21 7:57 am, Ralf Gommers wrote:
> Hi all,
>
> ...
>
> At this point it looks like controlling the number of threads that
> OpenBLAS uses is the way we can work around this problem (or let users
> do so). Ways to control threading:
>
> - Use `threadpoolctl` (see the README at
> https://github.com/joblib/threadpoolctl for how)
> - Set an environment variable to control the behavior, e.g.
> `OPENBLAS_NUM_THREADS`
> - Rebuild the `libopenblas` we bundle in the wheel to have a max
> number of threads of 1, 2, or 4.
> ...
>
> Thoughts on which option seems best? Any other options I missed?
>
> Cheers,
> Ralf
>
There are OpenBLAS-specific utility functions like
`openblas_set_num_threads` [0]. They lack documentation about which
routines they affect, but it might be an avenue to explore. Perhaps
`openblas_get_num_threads`/`openblas_set_num_threads` could be used
around the offending call, like a context manager? Disadvantages:

- This would affect global state.

- It is not clear how to pull these functions into SciPy. We tried
fishing them out in CI via ctypes to check the OpenBLAS version, and
failed on Windows. Perhaps with an `#ifdef OPENBLAS` somewhere in C code?

Good point (and thanks Stefan for making the same point at the same time). I think we can. We could do this only in this one arm64 wheel (put the code in `_distributor_init.py`) and reuse the code from threadpoolctl, something like:
```
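import ctypes
import os

# RTLD_NOLOAD returns a handle only if the library is already loaded (POSIX).
_RTLD_NOLOAD = getattr(os, "RTLD_NOLOAD", ctypes.DEFAULT_MODE)

# `filepath` is the path of the bundled libopenblas; `num_threads` is the cap.
_dynlib = ctypes.CDLL(filepath, mode=_RTLD_NOLOAD)
set_func = getattr(
    _dynlib,
    "openblas_set_num_threads",
    # Symbols differ when built for 64bit integers in Fortran
    getattr(_dynlib, "openblas_set_num_threads64_", lambda num_threads: None),
)
set_func(num_threads)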
```
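If we wanted to limit threads only around the offending call, as Matti suggested, the same lookup could be wrapped in a context manager. This is only a rough sketch: `openblas_thread_limit` and `_lookup` are names made up for illustration, `lib` is assumed to be a handle like `_dynlib` above, and I'm assuming the getter follows the same `64_` suffix convention as the setter. It still changes global state for the duration of the block, so other threads calling into OpenBLAS at the same time are affected too.

```python
import contextlib


def _lookup(lib, name):
    """Return symbol `name` from `lib`, also trying the 64-bit-integer suffix."""
    for candidate in (name, name + "64_"):
        func = getattr(lib, candidate, None)
        if func is not None:
            return func
    return None


@contextlib.contextmanager
def openblas_thread_limit(lib, num_threads):
    """Temporarily cap the OpenBLAS thread count on an already loaded library."""
    get_func = _lookup(lib, "openblas_get_num_threads")
    set_func = _lookup(lib, "openblas_set_num_threads")
    if get_func is None or set_func is None:
        # Not OpenBLAS, or the symbols are not exported: leave things alone.
        yield
        return
    previous = get_func()
    set_func(num_threads)
    try:
        yield
    finally:
        # Restore the previous (process-wide) setting even if the body raised.
        set_func(previous)
```

Usage would be along the lines of `with openblas_thread_limit(_dynlib, 1): ...` around the problematic routine.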

We can't get our hands on the NumPy-vendored OpenBLAS that way though (there's no guarantee NumPy was even built with OpenBLAS), so it's not as comprehensive a fix as either using threadpoolctl or the user setting an env var.
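For comparison, the env var route from a user's point of view could look roughly like the sketch below. The helper name `run_with_openblas_threads` is hypothetical. As far as I know OpenBLAS reads `OPENBLAS_NUM_THREADS` when the library is initialized, so the variable has to be in the environment before NumPy or SciPy is imported; launching the workload in a child process is one simple way to guarantee that.

```python
import os
import subprocess
import sys


def run_with_openblas_threads(args, num_threads):
    """Run `args` in a child process with OPENBLAS_NUM_THREADS set."""
    env = dict(os.environ)
    # Set before the child starts, so every OpenBLAS copy loaded there sees it.
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return subprocess.run(
        args, env=env, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # The child only echoes the variable; a real script would import scipy here.
    code = "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"
    result = run_with_openblas_threads([sys.executable, "-c", code], 2)
    print(result.stdout.strip())
```

Unlike the ctypes approach, this applies to whatever OpenBLAS copies end up loaded in that process, including one vendored by NumPy.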

Cheers,
Ralf