[Numpy-discussion] fast_any_all, a trivial but fast/useful helper function for numpy

Graeme B. Bell grb at skogoglandskap.no
Wed Sep 4 08:14:51 EDT 2013


Sorry, I should have been more clear.

As shown in the benchmark/example, the method is replacing the behaviour of 

   np.any(inputs, 0)

not the behaviour of

   np.any(inputs)
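
To make the distinction concrete, here is a minimal sketch. The three small boolean layers are made-up stand-ins for `inputs`, not data from the benchmark.

```python
import numpy as np

# Three hypothetical 2x2 boolean layers standing in for `inputs`.
inputs = np.array([
    [[True, False], [False, False]],
    [[False, False], [False, True]],
    [[False, False], [False, False]],
])

# Reduce across the layers only (axis 0): the result keeps the 2x2 layer shape.
per_pixel = np.any(inputs, 0)

# Reduce over every axis: the whole dataset collapses to a single boolean.
overall = np.any(inputs)

print(per_pixel)
print(overall)
```

The first form gives one boolean per pixel location, while the second answers only whether anything anywhere was True.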

Here, where I'm making decisions based on overlaying layers of raster data of the same shape, I don't want to reduce the entire dataset to a single boolean. Rather, I want to preserve the layers' shape and identify whether a condition was matched in any of the overlaid layers, generating a mask.

For example, this type of reasoning: 

def mask():
    for all pixel locations in the images A, B and C:
        if A[location] is 3, 19, or between 21 and 30 AND B[location] is any value AND C[location] is 1-4, 9-13...
            pixel = True

This naturally fits the any/all metaphor.
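
As a rough illustration, the pseudocode above might be written with per-axis any/all roughly as follows. The raster values are invented. The bounds are assumed to be inclusive. Only the ranges actually spelled out are used, and the elided ones are left out.

```python
import numpy as np

# Hypothetical 2x3 rasters; the values are made up for illustration.
A = np.array([[3, 20, 25], [19, 30, 7]])
B = np.array([[0, 1, 2], [3, 4, 5]])  # any value is accepted, so B adds no constraint
C = np.array([[1, 2, 10], [5, 13, 4]])

# A is 3, 19, or between 21 and 30 (bounds assumed inclusive).
a_ok = np.any([A == 3, A == 19, (A >= 21) & (A <= 30)], 0)

# C is 1-4 or 9-13 (only the ranges listed explicitly in the pseudocode).
c_ok = np.any([(C >= 1) & (C <= 4), (C >= 9) & (C <= 13)], 0)

# A pixel is selected only where both per-layer conditions hold.
mask = np.all([a_ok, c_ok], 0)

print(mask)
```

The "any" groups the alternatives within one layer, and the "all" combines the layers, so the mask keeps the shape of the rasters.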

Will update the description on GitHub.

Graeme. 

On Sep 4, 2013, at 12:05 PM, Graeme Bell <grb at skogoglandskap.no> wrote:

> In my current GIS raster work I often have a situation where I generate code something like this:
> 
>         np.any([A>4, A==2, B==5, ...]) 
> 
> However, np.any() is quite slow.
> 
> It's possible to use np.logical_or to solve the problem, but then you get nested logical_or's, since logical_or combines only two parameters.
> It's also possible to use integer maths e.g. (A>4)+(A==2)+(B==5)>0.
> 
> The question is: which is best (syntactically, in terms of performance, etc)?
> 
> I've written a little helper function to provide a faster version of any() and all(). It's embarrassingly simple - just a for loop. However, I think there's a syntactic advantage to using a helper function for this situation rather than writing it idiomatically each time; and it reduces the chance of a bug in idiomatic implementation. However, the code does not cover all the use cases currently addressed by np.any() and np.all(). 
> 
> I benchmarked to pick the fastest underlying implementation (logical_or rather than integer maths). 
> 
> The result is 14 to 17x faster than np.any() for this use case.*
> 
> Code & benchmark here:
> 
>      https://github.com/gbb/numpy-fast-any-all
> 
> Please feel welcome to use it or improve it :-)
> 
> Graeme.
> 
> 
> * (Should this become an execution path in np.any()... ?)
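
For readers without the repository to hand, the three approaches mentioned in the quoted message can be sketched as below. The helper name `any_layers` and the sample arrays are hypothetical. This is only a guess at the "just a for loop" idea, not the code from the linked repository, and it makes no claim about the reported speedup.

```python
import numpy as np

def any_layers(layers):
    """Hypothetical helper: elementwise OR over same-shaped boolean arrays."""
    # Copy the first layer so the caller's array is not modified in place.
    result = np.array(layers[0], dtype=bool)
    for layer in layers[1:]:
        # Accumulate into `result` rather than allocating a new array each time.
        np.logical_or(result, layer, out=result)
    return result

# Made-up sample rasters.
A = np.array([[1, 2], [5, 3]])
B = np.array([[5, 0], [0, 0]])
conditions = [A > 4, A == 2, B == 5]

via_loop = any_layers(conditions)                # for loop over logical_or
via_any = np.any(conditions, 0)                  # np.any along the layer axis
via_sum = ((A > 4) + (A == 2) + (B == 5)) > 0    # integer-maths style

print(via_loop)
```

All three expressions produce the same per-pixel mask on this input; they differ only in how the reduction is carried out.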
