[Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

Julian Taylor jtaylor.debian at googlemail.com
Wed Sep 4 13:38:58 EDT 2013

On 04.09.2013 12:05, Graeme B. Bell wrote:
> In my current GIS raster work I often have a situation where I generate code something like this:
>          np.any([A>4, A==2, B==5, ...]) 
> However, np.any() is quite slow.
> It's possible to use np.logical_or to solve the problem, but then you get nested logical_or's, since logical_or combines only two parameters.
> It's also possible to use integer maths e.g. (A>4)+(A==2)+(B==5)>0.
> The question is: which is best (syntactically, in terms of performance, etc)?
> I've written a little helper function to provide a faster version of any() and all(). It's embarrassingly simple - just a for loop. However, I think there's a syntactic advantage to using a helper function for this situation rather than writing it idiomatically each time; and it reduces the chance of a bug in idiomatic implementation. However, the code does not cover all the use cases currently addressed by np.any() and np.all(). 
> I benchmarked to pick the fastest underlying implementation (logical_or rather than integer maths). 
> The result is 14 to 17x faster than np.any() for this use case.*
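
For readers following along, a helper of the kind described above might look roughly like the sketch below. This is not the poster's actual code: the name `fast_any` is taken from the subject line, the signature is hypothetical, and it assumes the intended result is the elementwise OR of the inputs (what `np.any(..., axis=0)` gives), with all inputs sharing one shape.

```python
import numpy as np

def fast_any(arrays):
    """Elementwise OR over a sequence of equal-shape arrays.

    Hypothetical sketch: a plain loop over np.logical_or, reusing a
    single output buffer instead of stacking the inputs first.
    """
    arrays = list(arrays)
    # Copy the first input as bool so the caller's array is not modified.
    result = np.array(arrays[0], dtype=bool)
    for a in arrays[1:]:
        np.logical_or(result, a, out=result)
    return result

A = np.array([1, 2, 5, 7])
B = np.array([0, 0, 0, 0])
mask = fast_any([A > 4, A == 2, B == 5])
```

Here `mask` should match `np.any([A > 4, A == 2, B == 5], axis=0)`; the loop just avoids building the intermediate stacked array.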

any/all and boolean operations have been significantly sped up by
vectorization in numpy 1.8 [0].
They are now around 10 times faster than before, especially if the
boolean array fits into one of the CPU cache levels.
If it doesn't, I recommend using a blocking utility function, something like:
for i in range(0, len(d), blocksize):
    view = d[i:i+blocksize]  # slicing gives a view, not a copy
    # do stuff on view

With this method and the new vectorizations in numpy, you are almost as
fast as numexpr for floats and probably a lot faster with bools.
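
A slightly fuller sketch of the blocking idea, applied to the condition from the quoted example. The function name `blocked_mask` and the default `blocksize` are assumptions for illustration only; a good block size depends on your cache sizes and is worth benchmarking.

```python
import numpy as np

def blocked_mask(A, B, blocksize=16384):
    """Compute (A > 4) | (A == 2) | (B == 5) one block at a time.

    Hypothetical helper: A and B must have the same shape. Blocks are
    taken along the first axis so the temporaries stay small.
    """
    out = np.empty(A.shape, dtype=bool)
    for i in range(0, A.shape[0], blocksize):
        a = A[i:i + blocksize]  # views, no data copied
        b = B[i:i + blocksize]
        out[i:i + blocksize] = (a > 4) | (a == 2) | (b == 5)
    return out

rng = np.random.RandomState(0)
A = rng.randint(0, 10, size=100000)
B = rng.randint(0, 10, size=100000)
mask = blocked_mask(A, B)
```

The result is the same as evaluating the whole expression at once; only the size of the intermediate boolean arrays changes.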

(The dip before 1.7 was part of the NA branch and was never released; 1.8
adds some of its optimizations back.)
