[Numpy-discussion] Short-circuiting equivalent of np.any or np.all?

Sebastian Berg sebastian at sipsolutions.net
Thu Apr 26 13:26:53 EDT 2018


On Thu, 2018-04-26 at 09:51 -0700, Hameer Abbasi wrote:
> Hi Nathan,
> 
> np.any and np.all call np.logical_or.reduce and np.logical_and.reduce
> respectively, and unfortunately the underlying function (ufunc.reduce)
> has no way of detecting that the value isn’t going to change anymore.
> It’s also used for (for example) np.sum (np.add.reduce), np.prod
> (np.multiply.reduce), np.min (np.minimum.reduce), and
> np.max (np.maximum.reduce).


I would like to point out that this is almost, but not quite, true.
The boolean versions will short-circuit on the innermost level, which
is probably good enough for all practical purposes.

One way to get around it would be to use chunked iteration with
np.nditer in pure Python. I admit it is a bit tricky to get started
with, but it is basically what numexpr also uses (at least in the
simplest mode), and if your arrays are relatively large, there is likely
no real performance hit compared to a non-pure-Python version.
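
A minimal sketch of that chunked idea, in case it helps. The helper name
any_chunked and the chunk_size default are made up for illustration, and
this is only one way to set up the iterator, not necessarily how numexpr
does it. With the "buffered" and "external_loop" flags, np.nditer should
hand back 1-d chunks of at most buffersize elements, so the Python loop
runs once per chunk and can return as soon as a chunk contains a truthy
value.

```python
import numpy as np


def any_chunked(arr, chunk_size=8192):
    """Early-exit variant of np.any over all elements (illustrative sketch).

    `any_chunked` and `chunk_size` are hypothetical names, not NumPy API.
    """
    arr = np.asarray(arr)
    if arr.size == 0:
        # Matches np.any on an empty array.
        return False
    it = np.nditer(
        arr,
        flags=["external_loop", "buffered"],
        op_flags=["readonly"],
        buffersize=chunk_size,
    )
    for chunk in it:
        # Each chunk is a 1-d array; np.any on it is a fast C loop.
        if chunk.any():
            return True
    return False


data = np.zeros(int(1e6))
print(any_chunked(data))   # no truthy element: every chunk is visited

data[10] = 1.0
print(any_chunked(data))   # truthy element in the first chunk: stops early
```

The chunk size is a trade-off: smaller chunks exit sooner but pay more
Python overhead per chunk when nothing is found.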

- Sebastian



> 
> You can find more information about this on the ufunc doc page. I
> don’t think it’s worth it to break this machinery for any and all, as
> it has numerous other advantages (such as being able to override in
> duck arrays, etc)
> 
> Best regards,
> Hameer Abbasi
> 
> > On Apr 26, 2018 at 18:45, Nathan Goldbaum <nathan12343 at gmail.com>
> > wrote:
> > 
> > Hi all,
> > 
> > I was surprised recently to discover that both np.any and np.all()
> > do not have a way to exit early:
> > 
> > In [1]: import numpy as np
> > 
> > In [2]: data = np.arange(1e6)
> > 
> > In [3]: print(data[:10])
> > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
> > 
> > In [4]: %timeit np.any(data)
> > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops
> > each)
> > 
> > In [5]: data = np.zeros(int(1e6))
> > 
> > In [6]: %timeit np.any(data)
> > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops
> > each)
> > 
> > I don't see any discussions about this on the NumPy issue tracker
> > but perhaps I'm missing something.
> > 
> > I'm curious if there's a way to get a fast early-terminating search
> > in NumPy? Perhaps there's another package I can depend on that does
> > this? I guess I could also write a bit of cython code that does
> > this but so far this project is pure python and I don't want to
> > deal with the packaging headache of getting wheels built and conda-
> > forge packages set up on all platforms.
> > 
> > Thanks for your help!
> > 
> > -Nathan
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 