On Sun, May 30, 2021 at 10:41 AM kangkai@mail.ustc.edu.cn wrote:

On Fri, May 28, 2021 at 4:58 PM <kangkai at mail.ustc.edu.cn <mailto:kangkai at mail.ustc.edu.cn>> wrote:

`Hi all, Finding topk elements is widely used in several fields, but missed in NumPy. I implement this functionality named as numpy.topk using core numpy functions and open a PR: https://github.com/numpy/numpy/pull/19117 <https://github.com/numpy/numpy/pull/19117> Any discussion are welcome.`

Thanks for the proposal Kang. I think this functionality is indeed a fairly obvious gap in what Numpy offers, and would make sense to add. A detailed comparison with other libraries would be very helpful here. TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and MXNet call it `topk`.

Two things to look at in more detail here are:

- complete signatures of the function in each of those libraries, and
what the commonality is there. 2. the argument Eric made on your PR about consistency with sort/argsort, and if we want topk/argtopk? Also, do other libraries have `argtopk`?

Cheers, Ralf

`Best wishes, Kang Kai`

Hi, Thanks for reply, I present some details below:

Thanks for the detailed investigation Kang!

## 1. complete signatures of the function in each of those libraries, and what the commonality is there.

| Library | Name | arg1 | arg2 | arg3 | arg4 | arg5 |

|-------------|--------------------|-------|------|------|-----------|--------| | NumPy [1 ] | numpy.topk | a | k | axis | largest | sorted | | PyTorch [2 ] | torch.topk | input | k | dim | largest | sorted | | R [3 ] | topK | x | K | / | / | / | | MXNet [4 ] | mxnet.npx.topk | data | k | axis | is_ascend | / | | CNTK [5 ] | cntk.ops.top_k | x | k | axis | / | / | | TF [6 ] | tf.math.top_k | input | k | / | / | sorted | | Dask [7 ] | dask.array.topk | a | k | axis | -k | / | | Dask [8 ] | dask.array.argtopk | a | k | axis | -k | / | | MATLAB [9 ] | mink | A | k | dim | / | / | | MATLAB [10 ] | maxk | A | k | dim | / | / |

| Library | Name | Returns | |-------------|--------------------|---------------------| | NumPy [1] | numpy.topk | values, indices | | PyTorch [2] | torch.topk | values, indices | | R [3] | topK | indices | | MXNet [4] | mxnet.npx.topk | controls by ret_typ | | CNTK [5] | cntk.ops.top_k | values, indices | | TF [6] | tf.math.top_k | values, indices | | Dask [7] | dask.array.topk | values | | Dask [8] | dask.array.argtopk | indices | | MATLAB [9] | mink | values, indices | | MATLAB [10] | maxk | values, indices |

- arg1: Input array.
- arg2: Number of top elements to look for along the given axis.
- arg3: Axis along which to find topk.

- R only supports vector, TensorFlow only supports axis=-1.
- arg4: Controls whether to return k largest or smallest elements.

- R, CNTK and TensorFlow only return k largest elements.
In Dask, k can be negative, which means to return k smallest elements. - In MATLAB, use two distinct functions.

- arg5: If true the resulting k elements will be sorted by the values.

- R, MXNet, CNTK, Dask and MATLAB only return sorted elements.
**Summary**:

- Function Name: could be `topk`, `top_k`, `mink`/`maxk`.
- arg1 (a), arg2 (k), arg3 (axis): should be required.
- arg4 (largest), arg4 (sorted): might be discussed.
- Returns: discussed below.
## 2. the argument Eric made on your PR about consistency with sort/argsort, if we want topk/argtopk? Also, do other libraries have `argtopk`

In most libraries, `topk` or `top_k` returns both values and indices, and `argtopk` is not included except for Dask. In addition, there is another inconsistency: `sort` returns ascending values, but `topk` returns descending values.

## Suggestions Finally, IMHO, new function signature might be designed as one of: I) use `topk` / `argtopk` or `top_k` / `argtop_k`

`def topk(a, k, axis=-1, sorted=True) -> topk_values def argtopk(a, k, axis=-1, sorted=True) -> topk_indices`

or

`def top_k(a, k, axis=-1, sorted=True) -> topk_values def argtop_k(a, k, axis=-1, sorted=True) -> topk_indices`

where `k` can be negative which means to return k smallest elements.

I don't think I'm a fan of the `-k` cleverness. Saying you want `-5` values as a stand-in for wanting the 5 smallest values is worse than a keyword imho.

It seems like commenters so far have a preference for `top_k` over `topk`, because of readability. Either way it's going to impact Dask, JAX, etc. - so it would be nice to get some input from maintainers of those libraries.

The two functions vs. returning `(values, indices)` is also a tricky choice - it may depend on usage patterns. If one needs indices a lot, then there's something to say for the tuple return. Otherwise the code is going to look like:

indices = argtop_k(x, ....) values = x[indices]

which is significantly worse than:

values, indices = top_k(x, ...)

II) use `maxk` / `argmaxk` or `max_k` / `argmax_k` (`mink` / `argmink` or `min_k` / `argmin_k`)

I suggest to forget about maxk/max_k. All Python libraries call it topk/top_k. And Matlab choosing something is usually a good reason to run in the other direction.

Cheers, Ralf

```python

def maxk(a, k, axis=-1, sorted=True) -> values def argmaxk(a, k, axis=-1, sorted=True) -> indices

def mink(a, k, axis=-1, sorted=True) -> values def argmink(a, k, axis=-1, sorted=True) -> indices

`or ```python def max_k(a, k, axis=-1, sorted=True) -> values def argmax_k(a, k, axis=-1, sorted=True) -> indices def min_k(a, k, axis=-1, sorted=True) -> values def argmin_k(a, k, axis=-1, sorted=True) -> indices`

where `k` must be positive.

**References**:

- [1] https://github.com/numpy/numpy/pull/19117
- [2] https://pytorch.org/docs/stable/generated/torch.topk.html
- [3]
https://www.rdocumentation.org/packages/tensr/versions/1.0.1/topics/topK

- [4]
https://mxnet.apache.org/versions/master/api/python/docs/api/npx/generated/m...

- [5]
https://docs.microsoft.com/en-us/python/api/cntk/cntk.ops?view=cntk-py-2.7#t...

https://docs.dask.org/en/latest/array-api.html?highlight=topk#dask.array.top...

- [8]
https://docs.dask.org/en/latest/array-api.html?highlight=topk#dask.array.arg...

- [9] https://nl.mathworks.com/help/matlab/ref/maxk.html
- [10] https://nl.mathworks.com/help/matlab/ref/mink.html

NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion