Feature request: Extension of the np.argsort Function - Returning Positional Information for Data
When dealing with lists that contain duplicate data, np.argsort fails to return index values that correspond to the actual sorting positions of the data, as it does when handling arrays without duplicates.

Dear Author:

When I use the np.argsort function on an array without duplicate data, the returned index values correspond to the sorting positions of the respective data.😀

    x = [1, 2, 5, 4]
    rank = np.argsort(x)
    print(rank)
    # [0 1 3 2]

However, when there are duplicate values, the results from np.argsort sometimes do not correspond to the sorting positions of the respective data.

    x = [1, 4, 1, 1, 2, 4, 5]
    rank = np.argsort(x)
    print(rank)
    # [0 2 3 4 1 5 6]

If someone routinely uses np.argsort to obtain sorting positions for data without duplicates, introducing duplicate values may silently produce wrong positions that are difficult to detect. As the dataset grows, identifying such issues becomes even harder.

For users in this situation, the desired results might be achieved using the following function:

    import numpy as np

    def my_sort(x):
        arg_x = np.sort(x)
        rank = [np.where(arg_x == i)[0][0] for i in x]
        return np.array(rank)

    x = [1, 4, 1, 1, 2, 4, 5]
    rank_arg = np.argsort(x)
    rank_position = my_sort(x)
    print("rank_arg", rank_arg)
    print("rank_position", rank_position)
    # rank_arg [0 2 3 4 1 5 6]
    # rank_position [0 4 0 0 3 4 6]

This method produces results consistent with np.argsort when applied to arrays without duplicate values.

    x = [1, 2, 5, 4]
    rank_arg = np.argsort(x)
    rank_position = my_sort(x)
    print("rank_arg", rank_arg)
    print("rank_position", rank_position)
    # rank_arg [0 1 3 2]
    # rank_position [0 1 3 2]

Although there is no issue with the documentation of np.argsort itself, the need described here may be widespread. It might therefore be worth considering the addition of a function, for example np.position(data), to enhance the functionality of numpy.
My device:
system: win10 64x
python version: 3.9.11
numpy version: 1.21.2

Sincerely,
Looking forward to your response! ❤❤❤❤
On Tue, Jan 16, 2024 at 11:05 PM hao chen <unbelieveble.chen@gmail.com> wrote:

> When dealing with lists that contain duplicate data, np.argsort fails to return index values that correspond to the actual sorting positions of the data, as it does when handling arrays without duplicates.
>
> Dear Author:
>
> When I use the np.argsort function on an array without duplicate data, the returned index values correspond to the sorting positions of the respective data.😀
>
>     x = [1, 2, 5, 4]
>     rank = np.argsort(x)
>     print(rank)
>     # [0 1 3 2]

That is not what `argsort` is intended or documented to do. It returns an array of indices _into `x`_ such that if you took the values from `x` in that order, you would get a sorted array. That is, if `x` were sorted into the array `sorted_x`, then `x[rank[i]] == sorted_x[i]` for all `i in range(len(x))`. The indices in `rank` are positions in `x`, not positions in `sorted_x`. They happen to correspond in this case, but that's a coincidence that's somewhat common in these small examples. But consider `[20, 30, 10, 40]`:
    >>> x = np.array([20, 30, 10, 40])
    >>> ix = np.argsort(x)
    >>> def position(x):
    ...     sorted_x = np.array(x)
    ...     sorted_x.sort()
    ...     return np.searchsorted(sorted_x, x)
    ...
    >>> ip = position(x)
    >>> ix
    array([2, 0, 1, 3])
    >>> ip
    array([1, 2, 0, 3])
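
The difference between the two index arrays can be checked directly. The following is a minimal sketch, assuming only NumPy and the same example array: the `argsort` result indexes into `x`, whereas the position result indexes into the sorted array.

```python
import numpy as np

x = np.array([20, 30, 10, 40])
sorted_x = np.sort(x)

ix = np.argsort(x)                 # indices into x
ip = np.searchsorted(sorted_x, x)  # indices into sorted_x

# argsort: taking values from x in the order given by ix yields the sorted array
assert np.array_equal(x[ix], sorted_x)

# position: looking up sorted_x at ip recovers the original x
assert np.array_equal(sorted_x[ip], x)
```

The two arrays are inverse lookups of each other, which is why they only coincide for particular inputs.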
But also notice:
    >>> np.argsort(np.argsort(x))
    array([1, 2, 0, 3])
This double-argsort is what you seem to be looking for, though it depends on what you want from the handling of duplicates (do you return the first index into the sorted array with the same value as in my `position()` implementation, or do you return the index that particular item was actually sorted to). Either way, we probably aren't going to add this as its own function. Both options are straightforward combinations of existing primitives.

--
Robert Kern
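
The two ways of handling duplicates mentioned in the reply can be sketched side by side. This is an illustration, not a proposed NumPy API: the names `rank_first` and `rank_ordinal` are hypothetical, and `kind="stable"` is an assumption added here so that tied values keep their original relative order.

```python
import numpy as np

def rank_first(x):
    """Position of each value in the sorted array; ties share the first index."""
    x = np.asarray(x)
    # searchsorted defaults to side="left", i.e. the first matching index
    return np.searchsorted(np.sort(x), x)

def rank_ordinal(x):
    """Position each item was actually sorted to; ties are ordered by appearance."""
    x = np.asarray(x)
    order = np.argsort(x, kind="stable")
    ranks = np.empty_like(order)
    # Invert the permutation: the item at x[order[k]] ends up at position k
    ranks[order] = np.arange(order.size)
    return ranks

x = [1, 4, 1, 1, 2, 4, 5]
print(rank_first(x))    # [0 4 0 0 3 4 6]
print(rank_ordinal(x))  # [0 4 1 2 3 5 6]
```

`rank_first` reproduces the output of `my_sort` from the original message without the per-element scan. `rank_ordinal` gives the same result as a stable double `argsort`, using one sort and one scatter instead of two sorts.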