Feature query: fetch top/bottom k from array

Morning,
My apologies if this deviates from the vision of numpy:
I find myself often requiring the indices and/or values of the top (or bottom) k items in a numpy array.
I am aware of solutions involving partition/argpartition but these are inelegant.
I am thinking of 1-dimensional arrays, but this concept extends to an arbitrary number of dimensions.
Is this a feature that would benefit the numpy package? I am happy to code it up.
Thanks for your time!
Best regards
Joe

Joe,
Could you show an example that you find inelegant and elaborate on how you intend to improve it? It's hard to discuss without more specific information.
- Joe
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: jfoxrabinovitz@gmail.com

Morning!
I find myself often requiring the indices and/or values of the top (or bottom) k items in a numpy array.
I am aware of solutions involving partition/argpartition, but I find these inelegant, and solutions using sort are inefficient.
Is this a feature that would benefit the numpy package, or bloat it? I am happy to code it up.
Here are some examples:
import numpy as np
a = np.array( [ [5,8,1,3,0], [5,6,2,1,3], [1,4,9,1,3], [8,0,4,7,0] ] )
# PROPOSED FEATURE: return (ordered) top 4 values in array:
a.top_k(k=4)
array([9, 8, 8, 7])
# CURRENT METHOD: return (ordered) top 4 values in array:
np.sort( np.partition(a.flatten(), -4)[-4:] )[::-1] # faster method
array([9, 8, 8, 7])
np.sort(a.flatten())[::-1][:4] # slower method
array([9, 8, 8, 7])
# PROPOSED FEATURE: return INDICES of (ordered) top 4 values in array:
a.top_k(k=4, return_indices=True)
array([12,1,15,18])
# CURRENT METHOD: return INDICES of (ordered) top 4 values in array:
(-a.flatten()).argsort()[:4]
array([12,1,15,18])
# PROPOSED FEATURE: multidimensional examples:
a.top_k(k=3, axis=0)
array([[8,5,5], [8,6,4], [9,4,2], [7,3,1], [3,3,0]])
a.top_k(k=3, axis=1)
array([[8,5,3], [6,5,3], [9,4,3], [8,7,4]])
I'd also consider including functionality for bottom k values, and methods for returning indices in the case of tied values (e.g. "first appearance", "random" etc.).
Cheers
Joe
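
The behaviour proposed above can be approximated today with existing NumPy calls. The following is a minimal sketch, not a NumPy API: the function name `top_k` and the `return_indices` keyword are taken from the proposal, while the `largest` keyword and the choice to keep the reduced axis in place are assumptions made here for illustration. It relies only on `np.argpartition`, `np.take`, `np.argsort`, `np.flip` and `np.take_along_axis`.

```python
import numpy as np


def top_k(a, k, axis=None, largest=True, return_indices=False):
    """Hypothetical helper (not part of NumPy): ordered top/bottom k entries.

    With axis=None the input is flattened first, as in the 1-D examples.
    """
    a = np.asarray(a)
    if axis is None:
        a, axis = a.ravel(), 0
    n = a.shape[axis]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the length of the axis")

    if largest:
        # After partitioning, the k largest entries occupy the last k slots.
        idx = np.argpartition(a, n - k, axis=axis)
        idx = np.take(idx, np.arange(n - k, n), axis=axis)
    else:
        # After partitioning, the k smallest entries occupy the first k slots.
        idx = np.argpartition(a, k - 1, axis=axis)
        idx = np.take(idx, np.arange(k), axis=axis)

    # Only the k selected entries are sorted, not the whole array.
    vals = np.take_along_axis(a, idx, axis=axis)
    order = np.argsort(vals, axis=axis, kind="stable")
    if largest:
        order = np.flip(order, axis=axis)  # descending order for "top"
    idx = np.take_along_axis(idx, order, axis=axis)

    if return_indices:
        return idx
    return np.take_along_axis(a, idx, axis=axis)


a = np.array([[5, 8, 1, 3, 0],
              [5, 6, 2, 1, 3],
              [1, 4, 9, 1, 3],
              [8, 0, 4, 7, 0]])

flat_top = top_k(a, 4)                       # values 9, 8, 8, 7
flat_idx = top_k(a, 4, return_indices=True)  # flat indices; tie order not guaranteed
per_row = top_k(a, 3, axis=1)                # shape (4, 3)
per_col = top_k(a, 3, axis=0)                # shape (3, 5)
bottom = top_k(a, 3, largest=False)          # three smallest values
```

In this sketch the reduced axis keeps its position, so `axis=0` on a 4x5 input yields a 3x5 result. The order of indices among tied values is whatever `np.argpartition` happens to produce; if a "first appearance" rule is needed, a stable full sort such as `np.argsort(-a.ravel(), kind="stable")[:k]` gives it for signed numeric data, at the cost of sorting the whole array.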

On Tue, 22 Feb 2022 at 14:25, Joseph Bolton <joseph.jazz.bolton@gmail.com> wrote:
I find myself often requiring the indices and/or values of the top (or bottom) k items in a numpy array.
There has been discussion about this last year:
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/F4P...
Mentioned in that thread is the following pull request, which has some more discussion:
https://github.com/numpy/numpy/pull/19117
Friedrich

pandas.Series has nlargest/nsmallest methods that might be upstream-able.
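
For comparison, here is a short illustration of the pandas methods mentioned above, applied to the flattened example array from earlier in the thread. It assumes pandas is installed; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.array([[5, 8, 1, 3, 0],
              [5, 6, 2, 1, 3],
              [1, 4, 9, 1, 3],
              [8, 0, 4, 7, 0]])

# A Series over the flattened array; its default integer index doubles
# as the flat position of each value.
s = pd.Series(a.ravel())

top = s.nlargest(4)      # keep="first" is the default tie rule
bottom = s.nsmallest(3)

top_values = top.to_numpy()
top_positions = top.index.to_numpy()
bottom_positions = bottom.index.to_numpy()
```

The `keep` parameter (`"first"`, `"last"` or `"all"`) is how pandas exposes the handling of tied values, which is close to the "first appearance" option raised in the proposal.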
participants (4)
- Brock Mendel
- Friedrich Romstedt
- Joseph Bolton
- Joseph Fox-Rabinovitz