argmax() indexes to value
Hello, this is a very basic question, but I cannot find a satisfying answer. Assume `a` is a 2D array and that I get the indices of the maximum values along the second dimension: `i = a.argmax(axis=1)`. Is there a better way to get the values of the maximum entries along the second axis than `v = a[np.arange(len(a)), i]`? Thank you. Cheers, Daniele
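For concreteness, the pattern in question as a runnable sketch (the array shape here is arbitrary, chosen only for illustration):

```python
import numpy as np

a = np.random.random((4, 5))        # any 2D array
i = a.argmax(axis=1)                # index of the maximum in each row
v = a[np.arange(len(a)), i]         # value of the maximum in each row,
                                    # picked out via fancy indexing

assert np.array_equal(v, a.max(axis=1))
```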
On 30/10/2019 19:10, Neal Becker wrote:

> max(axis=1)?
Hi Neal, I should have been more precise in stating the problem. Getting the maximum values is only one step in a more complex piece of code that also needs the indices of the maxima along the second axis of the array. I would like to avoid iterating over the array more than once. Thank you! Cheers, Dan
I wouldn't be surprised at all if calling max in addition to argmax were as fast as, or faster than, indexing the array with the argmax result. Regardless, just use that, then profile when you're done with the whole thing and see if there are any gains to be made. Very likely not here. -elliot On Wed, Oct 30, 2019, 10:32 PM Daniele Nicolodi <daniele@grinta.net> wrote:
On 30/10/2019 22:42, Elliot Hallmark wrote:
Hi Elliot, how do you arrive at this conclusion? np.argmax() and np.max() are O(N) while indexing is O(1), thus I don't see how you can conclude that running both np.argmax() and np.max() on the input array incurs only a small penalty compared to running np.argmax() and then indexing. Cheers, Dan
Depends on how big your array is. NumPy's C loops are 150x+ faster than Python-level overhead. Fancy indexing can be expensive in my experience. Without trying I'd guess `arr[:, argmax(arr, axis=1)]` does what you want, but even if it does, try profiling the two and see. I highly doubt this would be even 1% of your run time, but it depends on what you're doing. Part of using Python with NumPy is not caring too much about big-O, because trying to be clever is rarely worth it in my experience. On Thu, Oct 31, 2019 at 12:35 AM Daniele Nicolodi <daniele@grinta.net> wrote:
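A quick comparison along these lines would settle it — a minimal sketch, with an assumed 1000x1000 float array (the crossover point depends entirely on shape and dtype):

```python
import numpy as np
from timeit import timeit

a = np.random.random((1000, 1000))   # assumed shape, for illustration only

# one full pass for the indices, then fancy indexing to pick one value per row
t_index = timeit(lambda: a[np.arange(len(a)), a.argmax(axis=1)], number=100)

# two full passes over the array: argmax for the indices, max for the values
t_double = timeit(lambda: (a.argmax(axis=1), a.max(axis=1)), number=100)

print(f"argmax + fancy indexing: {t_index:.4f} s")
print(f"argmax + max:            {t_double:.4f} s")
```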
On 31-10-2019 01:44, Elliot Hallmark wrote:

> Without trying I'd guess `arr[:, argmax(arr, axis=1)]` does what you want

It does not.
Why do you think I am asking for advice on how to do the complicated thing? If a 2x increase in run time did not matter, I would not have bothered, don't you think? I appreciate the effort spent guiding inexperienced users toward pragmatic solutions and away from overcomplicating their code. However, it is disappointing to have very precise questions dismissed with "that is complicated, thus you don't really want to do it". Best, Dan
My thought was to try `take` or `take_along_axis`:

```python
ind = a.argmax(axis=1)
np.take_along_axis(a, ind[:, None], axis=1)
```

But those functions tend to simply fall back to fancy indexing, and are pretty slow. On my system plain fancy indexing is fastest.
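A minimal sketch of that comparison (the array shape is an assumption; the actual numbers vary by machine):

```python
import numpy as np
from timeit import timeit

a = np.random.random((1000, 1000))     # assumed shape, for illustration
ind = a.argmax(axis=1)

t_take  = timeit(lambda: np.take_along_axis(a, ind[:, None], axis=1), number=100)
t_fancy = timeit(lambda: a[np.arange(len(a)), ind], number=100)

print(f"take_along_axis: {t_take:.4f} s")
print(f"fancy indexing:  {t_fancy:.4f} s")
```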
Probably `take_along_axis` was designed with uses like yours in mind, but it is not very optimized. (I think numpy is lacking a category of efficient indexing/search/reduction functions, like 'findfirst', 'groupby', short-circuiting any_*/all_*/nonzero, the proposed oindex/vindex, and better gufunc broadcasting. There is slow but gradual infrastructure work towards these, potentially.) Cheers, Allan On 10/30/19 11:31 PM, Daniele Nicolodi wrote:
> On my system plain fancy indexing is fastest

Hardly surprising, since `take_along_axis` is doing exactly that under the hood, after constructing the index for you :) https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/shape_base.py#L58-L172 I deliberately didn't expose the internal function that constructs the slice, since leaving it private frees us to move those functions to C or, in the distant future, to gufuncs. On Fri, Nov 1, 2019, 15:54 Allan Haldane <allanhaldane@gmail.com> wrote:
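Roughly, for the 2D, axis=1 case discussed here, the index construction amounts to something like the following simplified illustration (a hypothetical re-implementation, not the actual private numpy helper):

```python
import numpy as np

def take_along_axis_2d(a, ind):
    """Hypothetical, simplified equivalent of np.take_along_axis for a
    2D array with axis=1: build the row index, then fancy-index with it."""
    rows = np.arange(a.shape[0])[:, None]   # column vector of row numbers
    return a[rows, ind]                     # rows broadcasts against ind

a = np.random.random((5, 6))
ind = a.argmax(axis=1)[:, None]
assert np.array_equal(take_along_axis_2d(a, ind),
                      np.take_along_axis(a, ind, axis=1))
```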
On 01-11-2019 09:51, Allan Haldane wrote:
Hi Allan, after scanning the documentation once more I found `take_along_axis` and was hoping that it implemented some smart trick that does not involve generating an index array and indexing with it, but apparently that is exactly what it does. Given the current numpy primitives, I don't see a way to optimize it further and keep it generic. I think direct fancy indexing is faster in your case because of the overhead of handling the generic case, and not because of algorithmic inefficiency (from the run times you report it seems that your test array was fairly small). Thank you. Cheers, Dan
You could move some of the cost to index-creation time by converting the per-row indices into flattened indices:

```python
In [1]: a = np.random.random((5, 6))

In [2]: i = a.argmax(axis=1)

In [3]: a[np.arange(len(a)), i]
Out[3]: array([0.95774465, 0.90940106, 0.98025448, 0.97836906, 0.80483784])

In [4]: f = np.ravel_multi_index((np.arange(len(a)), i), a.shape)

In [5]: a.flat[f]
Out[5]: array([0.95774465, 0.90940106, 0.98025448, 0.97836906, 0.80483784])
```

I haven't benchmarked, but I suspect this will be faster if you're using the same index multiple times.
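To check that suspicion, the repeated lookups could be timed along these lines (a sketch; the shape is assumed, and the outcome will depend on it):

```python
import numpy as np
from timeit import timeit

a = np.random.random((1000, 1000))             # assumed shape
i = a.argmax(axis=1)
rows = np.arange(len(a))
f = np.ravel_multi_index((rows, i), a.shape)   # pay the conversion cost once

# per-lookup cost when the same index is reused many times
t_flat  = timeit(lambda: a.flat[f], number=1000)
t_fancy = timeit(lambda: a[rows, i], number=1000)
print(f"flat index: {t_flat:.4f} s")
print(f"pair index: {t_fancy:.4f} s")
```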
participants (6)

- Allan Haldane
- CJ Carey
- Daniele Nicolodi
- Elliot Hallmark
- Eric Wieser
- Neal Becker