Morning! 

I often need the indices and/or values of the top (or bottom) k items in a numpy array. I am aware of solutions involving partition/argpartition, but I find these inelegant, and the alternatives based on a full sort are inefficient.

Is this a feature that would benefit the numpy package, or bloat it? I am happy to code it up.

Here are some examples:

>> import numpy as np

>> a = np.array( [ [5,8,1,3,0], [5,6,2,1,3], [1,4,9,1,3], [8,0,4,7,0] ] )

 

>> # PROPOSED FEATURE: return (ordered) top 4 values in array:

>> a.top_k(k=4)

array([9, 8, 8, 7])

 

>> # CURRENT METHOD: return (ordered) top 4 values in array:

>> np.sort(np.partition(a.flatten(), -4)[-4:])[::-1]      # faster method

array([9, 8, 8, 7])

>> np.sort(a.flatten())[::-1][:4]                         # slower method

array([9, 8, 8, 7])

 

>> # PROPOSED FEATURE: return INDICES of (ordered) top 4 values in array:

>> a.top_k(k=4, return_indices=True)

array([12,1,15,18])

 

>> # CURRENT METHOD: return  INDICES   of (ordered) top 4 values in array:

>> (-a.flatten()).argsort()[:4]

array([12,1,15,18])
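
For comparison, here is a minimal sketch of how the flattened case could be wrapped today without a full sort. The function name `top_k` and its `return_indices` flag mirror the proposal above but are hypothetical; nothing like this exists in numpy. The sketch assumes a signed numeric dtype (it negates the values) and 1 <= k <= a.size. Note that when values tie at the k-th position, argpartition may pick any of the tied entries.

```python
import numpy as np

def top_k(a, k, return_indices=False):
    """Hypothetical helper: k largest entries of the flattened array, largest first."""
    flat = np.asarray(a).ravel()
    # argpartition moves the indices of the k largest entries into the
    # last k slots, in no particular order
    candidates = np.argpartition(flat, -k)[-k:]
    # put the candidates in index order, then stable-sort them by
    # descending value so equal values keep their order of appearance
    candidates.sort()
    order = np.argsort(-flat[candidates], kind="stable")
    idx = candidates[order]
    return idx if return_indices else flat[idx]

a = np.array([[5, 8, 1, 3, 0], [5, 6, 2, 1, 3], [1, 4, 9, 1, 3], [8, 0, 4, 7, 0]])
print(top_k(a, 4))                       # values
print(top_k(a, 4, return_indices=True))  # flat indices
```

Only the k selected entries are sorted, so the cost stays close to that of the partition itself when k is much smaller than the array.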

 

>> # PROPOSED FEATURE: multidimensional examples:

>> a.top_k(k=3, axis=0)

array([ [8,5,5], [8,6,4], [9,4,2], [7,3,1], [3,3,0] ])

>> a.top_k(k=3, axis=1)

array([ [8,5,3], [6,5,3], [9,4,3], [8,7,4] ])
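
The axis-wise case can be approximated today along the following lines. This is only a sketch: `top_k_axis` is a hypothetical name, and it returns the k values along the requested axis (so axis=0 gives shape (k, ncols)), which is one possible convention rather than a settled design.

```python
import numpy as np

def top_k_axis(a, k, axis=-1):
    """Hypothetical helper: k largest values along `axis`, largest first."""
    a = np.asarray(a)
    n = a.shape[axis]
    # partition so that the k largest values along `axis` occupy the
    # last k positions (unordered among themselves)
    part = np.partition(a, -k, axis=axis)
    top = np.take(part, np.arange(n - k, n), axis=axis)
    # sort only the k survivors, then flip to get descending order
    return np.flip(np.sort(top, axis=axis), axis=axis)

a = np.array([[5, 8, 1, 3, 0], [5, 6, 2, 1, 3], [1, 4, 9, 1, 3], [8, 0, 4, 7, 0]])
print(top_k_axis(a, 3, axis=1))    # one row of 3 values per input row
print(top_k_axis(a, 3, axis=0).T)  # transposed: one row of 3 values per column
```

Transposing the axis=0 result gives one triple per column, which is the layout used in the example above.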




I'd also consider including functionality for bottom k values, and methods for returning indices in the case of tied values (e.g. "first appearance", "random" etc.).
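
To make the tie-handling idea concrete, here is a rough sketch of a "first appearance" rule that also covers the bottom-k case. The name `k_extreme_indices` and the `largest` flag are hypothetical. It uses a full stable sort, so it trades the speed of argpartition for deterministic tie-breaking, and it assumes a signed numeric dtype because it negates the values.

```python
import numpy as np

def k_extreme_indices(a, k, largest=True):
    """Hypothetical helper: flat indices of the k largest (or smallest)
    entries, with ties broken by first appearance."""
    flat = np.asarray(a).ravel()
    # negate so that an ascending sort yields the largest values first
    keys = -flat if largest else flat
    # a stable sort keeps equal keys in their original (index) order
    return np.argsort(keys, kind="stable")[:k]

a = np.array([[5, 8, 1, 3, 0], [5, 6, 2, 1, 3], [1, 4, 9, 1, 3], [8, 0, 4, 7, 0]])
print(k_extreme_indices(a, 4))                 # top 4
print(k_extreme_indices(a, 4, largest=False))  # bottom 4
```

A "random" rule could be layered on top by shuffling within groups of equal values, but that is left out here.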

Cheers
Joe 


On Tue, 22 Feb 2022 at 15:30, Joseph Fox-Rabinovitz <jfoxrabinovitz@gmail.com> wrote:
Joe,

Could you show an example that you find inelegant and elaborate on how you intend to improve it? It's hard to discuss without more specific information.

- Joe

On Tue, Feb 22, 2022, 07:23 Joseph Bolton <joseph.jazz.bolton@gmail.com> wrote:
Morning,

My apologies if this deviates from the vision of numpy:

I find myself often requiring the indices and/or values of the top (or bottom) k items in a numpy array. 

I am aware of solutions involving partition/argpartition but these are inelegant.

I am thinking of 1-dimensional arrays, but this concept extends to an arbitrary number of dimensions. 

Is this a feature that would benefit the numpy package? I am happy to code it up.

Thanks for your time! 

Best regards 
Joe 




_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: jfoxrabinovitz@gmail.com