[Numpy-discussion] fast_any_all, a trivial but fast/useful helper function for numpy

Phil Elson pelson.pub at gmail.com
Wed Sep 4 11:00:40 EDT 2013


For the record, about 6 months ago I started a discussion about a
"find_first" type function that avoids running the logic over the whole
array (using lambdas instead). This spilled into a discussion about
implementing a short-circuiting "any" or "all" function, with some
interesting results:
http://numpy-discussion.10968.n7.nabble.com/Implementing-a-find-first-style-function-tp33085.html

Nothing more has been done with those discussions, but you may find it of
interest. (And I'd still be interested in taking it forwards if you have
any comments)
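
To make the "find_first" idea concrete, here is a minimal sketch. It is
not the code from the linked thread: the name find_first, its signature
and the chunk_size default are assumptions made for illustration. It
evaluates the predicate one chunk at a time and stops at the first chunk
that contains a match, so the remaining chunks are never evaluated.

```python
import numpy as np


def find_first(arr, predicate, chunk_size=4096):
    """Return the flat index of the first element where predicate is True.

    Hypothetical helper: `predicate` takes a 1-D array chunk and returns
    a boolean array of the same length. Returns -1 if nothing matches.
    """
    flat = np.asarray(arr).ravel()
    for start in range(0, flat.size, chunk_size):
        # Only this chunk is evaluated; later chunks are skipped on a hit.
        hits = np.flatnonzero(predicate(flat[start:start + chunk_size]))
        if hits.size:
            return start + int(hits[0])
    return -1


data = np.arange(10000)
first = find_first(data, lambda x: x > 5000)
missing = find_first(data, lambda x: x < 0)
```

The chunk size trades Python loop overhead against wasted work after the
first match; the best value will depend on the data.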

Cheers,



On 4 September 2013 13:14, Graeme B. Bell <grb at skogoglandskap.no> wrote:

> Sorry, I should have been more clear.
>
> As shown in the benchmark/example, the method is replacing the behaviour of
>
>    np.any(inputs, 0)
>
> not the behaviour of
>
>    np.any(inputs)
>
> Here, where I'm making decisions based on overlaying layers of raster data
> in the same shape, I don't want to map the entire dataset to a single
> boolean, rather I want to preserve the layers' shape but identify if a
> condition was matched in any of the overlaid layers, generating a mask.
>
> For example, this type of reasoning:
>
> def mask():
>   for all pixel locations in the images A, B and C:
>     if A[location] is 3, 19, or between 21 and 30
>        AND B[location] is any value
>        AND C[location] is 1-4, 9-13...
>       pixel = True
>
> This naturally fits the any/all metaphor.
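
The distinction between np.any(inputs) and np.any(inputs, 0) can be
sketched with small arrays. The arrays A and C, the inclusive range
bounds, and the restriction to only the value ranges listed explicitly
above are assumptions made for illustration. B is left out because any
value is accepted for it.

```python
import numpy as np

# Two hypothetical 2x2 raster layers.
A = np.array([[3, 5], [25, 19]])
C = np.array([[1, 10], [7, 4]])

# Per-layer conditions: OR over the alternatives, reduced along axis 0
# so that the result keeps the shape of the layer.
a_ok = np.any([A == 3, A == 19, (A >= 21) & (A <= 30)], axis=0)
c_ok = np.any([(C >= 1) & (C <= 4), (C >= 9) & (C <= 13)], axis=0)

# Combine the layers: every layer's condition must hold at a pixel.
mask = np.all([a_ok, c_ok], axis=0)

# Without an axis, everything collapses to a single boolean.
collapsed = np.any([a_ok, c_ok])
```

Here mask has the same shape as A and C, while collapsed is a single
True/False value for the whole dataset.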
>
> Will update the description on github.
>
> Graeme.
>
> On Sep 4, 2013, at 12:05 PM, Graeme Bell <grb at skogoglandskap.no> wrote:
>
> > In my current GIS raster work I often have a situation where I generate
> > code something like this:
> >
> >         np.any([A>4, A==2, B==5, ...])
> >
> > However, np.any() is quite slow.
> >
> > It's possible to use np.logical_or to solve the problem, but then you
> > get nested logical_or's, since logical_or combines only two parameters.
> > It's also possible to use integer maths e.g. (A>4)+(A==2)+(B==5)>0.
> >
> > The question is: which is best (syntactically, in terms of performance,
> > etc)?
> >
> > I've written a little helper function to provide a faster version of
> > any() and all(). It's embarrassingly simple - just a for loop. However, I
> > think there's a syntactic advantage to using a helper function for this
> > situation rather than writing it idiomatically each time; and it reduces
> > the chance of a bug in an idiomatic implementation. However, the code
> > does not cover all the use cases currently addressed by np.any() and
> > np.all().
> >
> > I benchmarked to pick the fastest underlying implementation (logical_or
> > rather than integer maths).
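
A minimal sketch of the kind of loop described above, assuming
same-shaped boolean inputs. This is not the code from the github
repository; the name any_of is hypothetical. It folds the conditions
together with np.logical_or, writing into one output array instead of
first stacking the list into a new array.

```python
import numpy as np


def any_of(conditions):
    """Element-wise OR over a non-empty sequence of same-shaped arrays."""
    it = iter(conditions)
    # Copy the first condition so the caller's arrays are not modified.
    result = np.array(next(it), dtype=bool)
    for cond in it:
        # In-place OR: no nesting and no temporary array per step.
        np.logical_or(result, cond, out=result)
    return result


A = np.array([1, 2, 3, 7])
B = np.array([0, 0, 5, 0])
conds = [A > 4, A == 2, B == 5]

fast = any_of(conds)
reference = np.any(conds, axis=0)            # the call being replaced
integer_maths = (A > 4) + (A == 2) + (B == 5) > 0
```

All three expressions give the same mask here. Which one is fastest
depends on array sizes and NumPy version, so it is worth timing on real
data.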
> >
> > The result is 14 to 17x faster than np.any() for this use case.*
> >
> > Code & benchmark here:
> >
> >      https://github.com/gbb/numpy-fast-any-all
> >
> > Please feel welcome to use it or improve it :-)
> >
> > Graeme.
> >
> >
> > * (Should this become an execution path in np.any()... ?)
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>