[Numpy-discussion] Improving performance of the `numpy.any` function.

zoj613 blkzol001 at myuct.ac.za
Wed Apr 14 11:49:51 EDT 2021


Hi All,

I was using numpy's `any` function earlier and realized that it might not be
as performant as I assumed. See the code below:
```
In [1]: import numpy as np

In [2]: a = np.zeros(1_000_000)

In [3]: a[100] = 1

In [4]: b = np.zeros(2_000_000)

In [5]: b[100] = 1

In [6]: %timeit np.any(a)
1.33 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit np.any(b)
2.65 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit any(a)
13.4 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %timeit any(b)
13.3 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
It looks like `np.any` does not short-circuit as soon as it finds a
non-zero element, unlike the built-in `any` function. This prompted me
to go look at the source code for `PyArray_Any`:
https://github.com/numpy/numpy/blob/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/numpy/core/src/multiarray/calculation.c#L790

Taking a peek at `PyArray_GenericReduceFunction`, it looks like it simply
calls the `reduce` function from Python over the whole array, with no
mechanism in place to short-circuit the call once a non-zero element has
been found.
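One caveat worth noting about the timings above: the built-in `any` is only fast here because the non-zero element sits near the front of the array. It iterates element-by-element in pure Python and stops at the first truthy value, so on an all-zero array it would be far slower than `np.any`. A small sketch (the counting wrapper is purely illustrative, not anything from NumPy) makes the short-circuiting visible:

```python
import numpy as np

a = np.zeros(1_000_000)
a[100] = 1

count = 0

def counting_iter(arr):
    # Illustrative wrapper: count how many elements the consumer pulls.
    global count
    for x in arr:
        count += 1
        yield x

# Built-in any() stops at the first truthy element it sees.
assert any(counting_iter(a))
print(count)  # 101 -- only indices 0..100 were visited, not all 1,000,000
```

By contrast, `np.any(a)` reduces over all one million elements, which matches the linear scaling between the 1M and 2M timings above.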

It seems worthwhile to have a dedicated function that allows early stopping
of the reduction. So my questions are: Is there a reason the maintainers have
not implemented such a feature? And does everyone think it's worthwhile to
implement one? I'm guessing there are many unsuspecting users like myself
who call `np.any` on arrays assuming early stopping is supported, so existing
code that calls the function would benefit a lot. Let me know what you think.
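In the meantime, a common workaround is to scan the array in fixed-size chunks with `np.any`, returning as soon as a chunk contains a non-zero element. The helper below is a hypothetical sketch of that idiom, not a NumPy API:

```python
import numpy as np

def any_chunked(arr, chunk_size=4096):
    """Early-stopping any(): reduce fixed-size chunks with np.any and
    return as soon as one chunk contains a truthy element.
    (Hypothetical helper for illustration; chunk_size is a tuning knob.)"""
    arr = np.ravel(arr)
    for start in range(0, arr.size, chunk_size):
        if np.any(arr[start:start + chunk_size]):
            return True
    return False

b = np.zeros(2_000_000)
b[100] = 1
assert any_chunked(b)              # stops within the first chunk
assert not any_chunked(np.zeros(10))
```

This keeps the per-chunk work vectorized in C while bounding the wasted scan past the first hit to one chunk, which is usually a good trade-off when hits tend to occur early.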

Regards



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

