Hello,
this is a very basic question, but I cannot find a satisfying answer. Assume a is a 2D array and that I get the index of the maximum value along the second dimension:
i = a.argmax(axis=1)
Is there a better way to get the value of the maximum array entries along the second axis other than:
v = a[np.arange(len(a)), i]
??
Thank you.
Cheers, Daniele
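For reference, here is a minimal runnable sketch of the two lines above. The array values and the name `rows` are made up for illustration and are not from the original post. It checks that indexing with the argmax result recovers the same values as a direct reduction.

```python
import numpy as np

# Small illustrative array; the values are made up for this sketch.
a = np.array([[1.0, 5.0, 2.0],
              [7.0, 3.0, 4.0],
              [0.5, 0.1, 0.9]])

# Column index of the maximum in each row.
i = a.argmax(axis=1)

# Pair each row number with its argmax column (integer "fancy" indexing).
rows = np.arange(len(a))
v = a[rows, i]

# The gathered values match a direct reduction along the same axis.
assert np.array_equal(v, a.max(axis=1))
```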
max(axis=1)?
On Wed, Oct 30, 2019, 7:33 PM Daniele Nicolodi daniele@grinta.net wrote:
[...]
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On 30/10/2019 19:10, Neal Becker wrote:
max(axis=1)?
Hi Neal,
I should have been more precise in stating the problem. Getting the values in the array at the positions of the maxima is only one step in a more complex piece of code, for which I also need the indexes along the second axis of the array. I would like to avoid having to iterate over the array more than once.
Thank you!
Cheers, Dan
I wouldn't be surprised at all if calling max in addition to argmax were as fast as, or faster than, indexing the array using argmax. Regardless, just use that, then profile when you're done with the whole thing and see if there are any gains to be made. Very likely not here.
-elliot
On 30/10/2019 22:42, Elliot Hallmark wrote:
I wouldn't be surprised at all if calling max in addition to argmax were as fast as, or faster than, indexing the array using argmax. Regardless, just use that, then profile when you're done with the whole thing and see if there are any gains to be made. Very likely not here.
Hi Elliot,
how do you arrive at this conclusion? np.argmax() and np.max() are O(N) while indexing is O(1), so I don't see how you can conclude that running both np.argmax() and np.max() on the input array is going to incur only a small penalty compared to running np.argmax() and then indexing.
Cheers, Dan
Depends on how big your array is. NumPy C code is 150x+ faster than Python overhead. Fancy indexing can be expensive in my experience. Without trying I'd guess arr[:, argmax(arr, axis=1)] does what you want, but even if it does, try profiling the two and see. I highly doubt this would be even 1% of your run time, but it depends on what you're doing. Part of using Python with NumPy is not caring too much about big O, because trying to be clever is rarely worth it in my experience.
On 31-10-2019 01:44, Elliot Hallmark wrote:
Depends on how big your array is. NumPy C code is 150x+ faster than Python overhead. Fancy indexing can be expensive in my experience. Without trying I'd guess arr[:, argmax(arr, axis=1)] does what you want,
It does not.
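A small sketch of why that expression differs, using a made-up 3x4 array (`arr`, `ind`, `picked` and `per_row` are names chosen for this illustration). Indexing only the column axis with the argmax result selects those columns for every row, so the result is a square array rather than one value per row. The wanted values end up on its diagonal.

```python
import numpy as np

arr = np.array([[1, 9, 2, 3],
                [8, 4, 5, 6],
                [0, 7, 3, 10]])
ind = np.argmax(arr, axis=1)          # one column index per row

# Selects columns `ind` for *every* row: shape (3, 3), not (3,).
picked = arr[:, ind]

# Pairing row numbers with column indices gives one value per row.
per_row = arr[np.arange(len(arr)), ind]

assert picked.shape == (3, 3)
assert np.array_equal(np.diagonal(picked), per_row)
assert np.array_equal(per_row, arr.max(axis=1))
```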
but even if it does, try profiling the two and see. I highly doubt this would be even 1% of your run time, but it depends on what you're doing. Part of using Python with NumPy is not caring too much about big O, because trying to be clever is rarely worth it in my experience.
Why do you think I am asking for advice on how to do the complicated thing? If a 2x increase in the run time had not mattered, I would not have bothered. Don't you think?
I appreciate the effort spent guiding inexperienced users toward pragmatic solutions and away from overcomplicating their code. However, it is disappointing to have very precise questions dismissed as "that is complicated, thus you don't really want to do it".
Best, Dan
my thought was to try `take` or `take_along_axis`:
ind = np.argmin(a, axis=1)
np.take_along_axis(a, ind[:,None], axis=1)
But those functions tend to simply fall back to fancy indexing, and are pretty slow. On my system plain fancy indexing is fastest:
%timeit a[np.arange(N),ind]
1.58 µs ± 18.1 ns per loop
%timeit np.take_along_axis(a, ind[:,None], axis=1)
6.49 µs ± 57.3 ns per loop
%timeit np.min(a, axis=1)
9.51 µs ± 64.1 ns per loop
Probably `take_along_axis` was designed with uses like yours in mind, but it is not very optimized.
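The snippet above can be written out as a runnable sketch. The array shape and the random seed are assumptions for illustration, and the timings quoted above are not reproduced here. One difference worth noting is that `take_along_axis` keeps the indexed axis, so its result has shape (N, 1), while plain integer indexing returns shape (N,).

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
N = 5
a = rng.random((N, 6))

ind = np.argmin(a, axis=1)

# take_along_axis needs an index array with the same number of
# dimensions as `a`, hence the added trailing axis.
taken = np.take_along_axis(a, ind[:, None], axis=1)   # shape (N, 1)

# Plain integer indexing returns a 1-D result directly.
fancy = a[np.arange(N), ind]                           # shape (N,)

assert taken.shape == (N, 1)
assert np.array_equal(taken[:, 0], fancy)
assert np.array_equal(fancy, np.min(a, axis=1))
```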
(I think numpy is lacking a category of efficient indexing/search/reduction functions, like 'findfirst', 'groupby', short-circuiting any_*/all_*/nonzero, the proposed oindex/vindex, better gufunc broadcasting. There is slow but gradual infrastructure work towards these, potentially).
Cheers, Allan
On my system plain fancy indexing is fastest
Hardly surprising, since take_along_axis is doing that under the hood, after constructing the index for you :)
https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/shape_base.py#L58-L172
I deliberately didn't expose the internal function that constructs the slice, since leaving it private frees us to move those functions to C or, in the distant future, to gufuncs.
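As a rough illustration of what "constructing the index" means in the 2-D case, the sketch below builds the index tuple by hand. The helper name `make_along_axis_index_2d` is hypothetical; it is not NumPy's private function and only mimics the idea for 2-D arrays.

```python
import numpy as np

def make_along_axis_index_2d(shape, indices, axis):
    """Hypothetical helper: build a fancy-index tuple for a 2-D array.

    `indices` must be 2-D. Along `axis` the caller's indices are used;
    along the other axis an arange is shaped so that it broadcasts.
    """
    if axis == 1:
        return (np.arange(shape[0])[:, None], indices)
    if axis == 0:
        return (indices, np.arange(shape[1])[None, :])
    raise ValueError("this sketch only handles axis 0 or 1")

a = np.arange(12).reshape(3, 4) % 5     # made-up data
ind = np.argmax(a, axis=1)[:, None]     # shape (3, 1)

# Indexing with the hand-built tuple matches take_along_axis.
manual = a[make_along_axis_index_2d(a.shape, ind, axis=1)]
assert np.array_equal(manual, np.take_along_axis(a, ind, axis=1))
```

Calling `a[np.arange(N), ind]` directly skips this generic construction step, which is consistent with the lower per-call overhead reported above for small arrays.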
On 01-11-2019 09:51, Allan Haldane wrote:
Probably `take_along_axis` was designed with uses like yours in mind, but it is not very optimized.
Hi Allan,
after scanning the documentation once more I found `take_along_axis` and was hoping that it implemented some smart trick that does not involve generating an index array and indexing with it, but apparently that is what it does.
Given the current numpy primitives, I don't see a way to optimize it further and keep it generic. I think the direct fancy indexing is faster in your case because of overhead in handling the generic case and not because of algorithmic inefficiency (from the run times you report it seems that your test array was fairly small).
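To check how the comparison changes with array size, a sketch along these lines could be used. The sizes, the repeat count, and the helper name `bench` are arbitrary assumptions. No timings are claimed here, since they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def bench(n_rows, n_cols, number=20):
    """Time three ways of getting per-row maxima once argmax is known.

    Returns a dict of seconds per call; all names here are illustrative.
    """
    rng = np.random.default_rng(0)
    a = rng.random((n_rows, n_cols))
    ind = a.argmax(axis=1)
    rows = np.arange(n_rows)

    candidates = {
        "fancy indexing": lambda: a[rows, ind],
        "take_along_axis": lambda: np.take_along_axis(a, ind[:, None], axis=1),
        "second reduction (max)": lambda: a.max(axis=1),
    }
    return {name: timeit.timeit(fn, number=number) / number
            for name, fn in candidates.items()}

if __name__ == "__main__":
    # A small and a larger array, to separate fixed per-call overhead
    # from cost that grows with the number of elements.
    for shape in [(10, 10), (1000, 1000)]:
        print(shape, bench(*shape))
```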
Thank you.
Cheers, Dan
You could move some of the cost to index-creation time by converting the per-row indices into flattened indices:
In [1]: a = np.random.random((5, 6))
In [2]: i = a.argmax(axis=1)
In [3]: a[np.arange(len(a)), i]
Out[3]: array([0.95774465, 0.90940106, 0.98025448, 0.97836906, 0.80483784])
In [4]: f = np.ravel_multi_index((np.arange(len(a)), i), a.shape)
In [5]: a.flat[f]
Out[5]: array([0.95774465, 0.90940106, 0.98025448, 0.97836906, 0.80483784])
I haven't benchmarked, but I suspect this will be faster if you're using the same index multiple times.
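A runnable sketch of the same idea, with the reuse made explicit. The seed, the second array `b`, and the name `f_manual` are assumptions added for illustration. For the default C (row-major) order, the flat index is simply row * n_cols + col, which the sketch checks against `np.ravel_multi_index`.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed for this sketch
a = rng.random((5, 6))
i = a.argmax(axis=1)

# Flat indices computed once...
f = np.ravel_multi_index((np.arange(len(a)), i), a.shape)

# ...are, for the default C (row-major) order, row * n_cols + col.
f_manual = np.arange(a.shape[0]) * a.shape[1] + i
assert np.array_equal(f, f_manual)

# The same flat index can then be reused on any array of that shape.
b = rng.random(a.shape)
assert np.array_equal(a.flat[f], a.max(axis=1))
assert np.array_equal(b.flat[f], b[np.arange(len(b)), i])
```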