Hi all,
I was surprised recently to discover that neither np.any() nor np.all() has a way to exit early:
In [1]: import numpy as np

In [2]: data = np.arange(1e6)

In [3]: print(data[:10])
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

In [4]: %timeit np.any(data)
724 µs ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: data = np.zeros(int(1e6))

In [6]: %timeit np.any(data)
732 µs ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I don't see any discussions about this on the NumPy issue tracker but perhaps I'm missing something.
I'm curious whether there's a way to get a fast, early-terminating search in NumPy. Perhaps there's another package I could depend on that does this? I guess I could also write a bit of Cython code that does this, but so far this project is pure Python and I don't want to deal with the packaging headache of getting wheels built and conda-forge packages set up on all platforms.
Thanks for your help!
-Nathan
Hi Nathan,
np.any and np.all call np.logical_or.reduce and np.logical_and.reduce respectively, and unfortunately the underlying machinery (ufunc.reduce) has no way of detecting that the value isn’t going to change anymore. The same machinery is also used for (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), np.min (np.minimum.reduce), and np.max (np.maximum.reduce).
You can find more information about this on the ufunc doc page: https://docs.scipy.org/doc/numpy/reference/ufuncs.html. I don’t think it’s worth breaking this machinery for any and all, as it has numerous other advantages (such as being overridable in duck arrays, etc.).
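To make the connection concrete, here is a small sketch showing that np.any/np.all give the same answers as the corresponding ufunc reductions (the explicit .astype(bool) cast is just to keep the reduction in the boolean loop; neither side exits early):

```python
import numpy as np

data = np.arange(1e6)

# np.any/np.all are thin wrappers around ufunc reductions, so these
# pairs agree -- and both sides always traverse the full reduction:
print(np.any(data) == np.logical_or.reduce(data.astype(bool)))   # True
print(np.all(data) == np.logical_and.reduce(data.astype(bool)))  # True
```

(data contains a zero at index 0, so np.all is False, while np.any is True; the assertions above only check that the wrapper and the reduction agree.)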
Best regards,
Hameer Abbasi

Sent from Astro (https://www.helloastro.com) for Mac
On Apr 26, 2018 at 18:45, Nathan Goldbaum nathan12343@gmail.com wrote:
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Apr 26, 2018 at 11:52 AM Hameer Abbasi einstein.edison@gmail.com wrote:
Hi Nathan,
np.any and np.all call np.or.reduce and np.and.reduce respectively, and unfortunately the underlying function (ufunc.reduce) has no way of detecting that the value isn’t going to change anymore. It’s also used for (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), np.min(np.minimum.reduce), np.max(np.maximum.reduce).
You can find more information about this on the ufunc doc page https://docs.scipy.org/doc/numpy/reference/ufuncs.html. I don’t think it’s worth it to break this machinery for any and all, as it has numerous other advantages (such as being able to override in duck arrays, etc)
Sure, I'm not saying that NumPy should change; I'm more trying to see whether there's an alternative way to get what I want, either in NumPy or in some other package.
Ah, in that case, if exotic platforms aren’t important for you, Numba can do the trick quite well.
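A minimal sketch of what a Numba-compiled early exit might look like (the helper name any_nonzero is illustrative, not part of any library; the try/except fallback just lets the sketch run as plain Python if Numba isn't installed):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback: run as plain (slow) Python when Numba is absent.
    def njit(func):
        return func

@njit
def any_nonzero(arr):
    # Scans element by element and returns at the first nonzero value,
    # unlike np.any, which always traverses the whole reduction.
    for x in arr:
        if x != 0:
            return True
    return False

print(any_nonzero(np.arange(1e6)))      # True (exits on the second element)
print(any_nonzero(np.zeros(int(1e6))))  # False
```

With Numba this compiles to a tight native loop, so the all-zeros case pays the full scan but the early-hit case returns almost immediately.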
Would it be useful to have a short-circuited version of the function that is not a ufunc?
- Joe
On Thu, Apr 26, 2018 at 12:03 PM Joseph Fox-Rabinovitz <jfoxrabinovitz@gmail.com> wrote:
Would it be useful to have a short-circuited version of the function that is not a ufunc?
Yes, definitely. I could use Numba as suggested by Hameer, but I'd rather not add a new runtime dependency. I could use Cython or C, but then I'd need to deal with the packaging headaches of shipping compiled code in the package.
I guess I could also create a new project that implements just the functions I need in Cython, deal with the packaging headaches there, and then depend on that package. At least that way others won't need to deal with the pain :)
On Thu, 2018-04-26 at 09:51 -0700, Hameer Abbasi wrote:
I would like to point out that this is almost, but not quite, true: the boolean versions do short-circuit at the innermost loop level, which is probably good enough for most practical purposes.
One way to get around the limitation would be to use chunked iteration with np.nditer in pure Python. I admit it is a bit tricky to get started with, but it is basically what numexpr uses (at least in its simplest mode), and if your arrays are relatively large, there is likely no real performance hit compared to a non-pure-Python version.
- Sebastian
For a lot more discussion, and a possible solution, see https://github.com/numpy/numpy/pull/8528
On Thu, 2018-04-26 at 19:26 +0200, Sebastian Berg wrote:
By chunked iteration with np.nditer, I mean something like this:
def check_any(arr, func=lambda x: x, buffersize=0):
    """
    Check whether func is true for any value in arr, stopping once
    the first hit is found.

    Parameters
    ----------
    arr : ndarray
        Array to test.
    func : function
        Function taking a 1-D array as argument and returning an
        array (on which ``np.any`` will be called).
    buffersize : int
        Size of the chunk/buffer in the iteration; zero uses the
        default NumPy value.

    Notes
    -----
    The stopping does not occur immediately but in buffersize chunks.
    """
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False
I'm not sure how it performs, actually, but you can give it a try, especially if you know you have large arrays or if "func" is fairly expensive. If the input is already boolean, I am sure it will be quite a bit slower, though.
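For a quick demonstration, the function above can be exercised like this (restated here so the snippet runs on its own; with buffersize=0 the chunk size is whatever nditer's default buffer is):

```python
import numpy as np

def check_any(arr, func=lambda x: x, buffersize=0):
    # Chunked scan: nditer hands back buffered 1-D chunks, and we stop
    # as soon as one chunk contains a hit -- so the early exit happens
    # at chunk granularity, not per element.
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False

data = np.zeros(int(1e6))
data[10] = 1.0
print(check_any(data))                        # True, found in the first chunk
print(check_any(np.zeros(1000)))              # False
print(check_any(data, func=lambda x: x > 2))  # False, no value exceeds 2
```

Because the hit sits in the very first chunk, the first call touches only a tiny fraction of the array, while a plain np.any(data) would scan all million elements.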
- Sebastian
On 04/26/2018 12:45 PM, Nathan Goldbaum wrote:
I'm curious if there's a way to get a fast early-terminating search in NumPy? Perhaps there's another package I can depend on that does this? I guess I could also write a bit of cython code that does this but so far this project is pure python and I don't want to deal with the packaging headache of getting wheels built and conda-forge packages set up on all platforms.
A current PR that implements short-circuiting for "all"-like operations is:
https://github.com/numpy/numpy/pull/8528
Actually, I have a little dream that we will be able to implement this kind of short-circuiting more generally in NumPy soon, following that PR's idea of turning functions into gufuncs. We just need to add some finishing touches to the gufunc implementation first.
We are almost there - the one important feature gufuncs are still missing is support for "multiple axis" arguments. See https://github.com/numpy/numpy/issues/8810.
Once that is done I also think there are some other new and useful short-circuiting gufuncs we could add, like "count" and "first". See some comments:
https://github.com/numpy/numpy/pull/8528#issuecomment-365358119
I am imagining we will end up with a "gufunc ecosystem", where there are some core ufuncs like np.add, np.multiply, np.less, and then a bunch of "associated" gufuncs for each of these, like reduce, first, all, accessible as attributes of the core ufunc.
(It has long been vaguely planned to turn "reduce" into a gufunc too, according to comments in the code. I'm excited for when that can happen!)
Allan