Dear list, This is forwarded from issue 6217 https://github.com/numpy/numpy/issues/6217 "What is the way to implement DESC ordering in the sorting routines of numpy?" (I am borrowing DESC/ASC from the SQL notation) For a stable DESC ordering sort, one can not revert the sorted array via argsort()[::-1] . I propose the following API change to argsorts/sort. (haven't thought about lexsort yet) I will use argsort as an example. Currently, argsort supports sorting by keys ('order') and by 'axis'. These two somewhat orthonal interfaces need to be treated differently. 1. by axis. Since there is just one sorting key, a single 'reversed' keyword argument is sufficient: a.argsort(axis=0, kind='merge', reversed=True) Jaime suggested this can be implemented efficiently as a post-processing step. (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is there a reference to the algorithm? Because all of the sorting algorithms for 'atomic' dtypes are using _LT functions, a post processing step seems to be the only viable solution, if possible. 2. by fields, ('order' kwarg) A single 'reversed' keyword argument will not work, because some keys are ASC but others are DESC, for example, sorting my LastName ASC, then Salary DESC. a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'), ('Salary', 'desc'))]) The parsing rule of order is: - if an item is tuple, the first item is the fieldname, the second item is DESC/ASC - if an item is scalar, the fieldname is the item, the ordering is ASC. This part of the code already goes to VOID_compare, which walks a temporary copy of a.dtype to call the comparison functions. If I understood the purpose of c_metadata (numpy 1.7+) correctly, the ASC/DESC flags, offsets, and comparison functions can all be pre-compiled and passed into VOID_compare via c_metadata of the temporary type-descriptor. By just looking this will actually make VOID_compare faster by avoiding calling several Python C-API functions. negating the return value of cmp seems to be a negligable overhead in such a complex function. 3. If both reversed and order is given, the ASC/DESC fields in 'order' are effectively reversed. Any comments? Best, Yu
On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu <rainwoodman@gmail.com> wrote:
Dear list,
This is forwarded from issue 6217 https://github.com/numpy/numpy/issues/6217
"What is the way to implement DESC ordering in the sorting routines of numpy?"
(I am borrowing DESC/ASC from the SQL notation)
For a stable DESC ordering sort, one can not revert the sorted array via argsort()[::-1] .
I propose the following API change to argsorts/sort. (haven't thought about lexsort yet) I will use argsort as an example.
Currently, argsort supports sorting by keys ('order') and by 'axis'. These two somewhat orthonal interfaces need to be treated differently.
1. by axis.
Since there is just one sorting key, a single 'reversed' keyword argument is sufficient:
a.argsort(axis=0, kind='merge', reversed=True)
Jaime suggested this can be implemented efficiently as a post-processing step. (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is there a reference to the algorithm?
My thinking was that, for native types, you can stably reverse a sorted permutation in-place by first reversing item-by-item, then reversing every chunk of repeated entries. Sort of the way you would reverse the words in a sentence in-place: first reverse every letter, then reverse everything bounded by spaces: TURN ME AROUND -> DNUORA EM NRUT -> AROUND EM NRUT -> AROUND ME NRUT -> AROUND ME TURN We could even add a type-specific function to do this for each of the native types in the npy_sort library. As I mentioned in Yu's very nice PR <https://github.com/numpy/numpy/pull/6222>, probably it is best to leave the signature of the function alone, and have something like order='desc' be the trigger for the proposed reversed=True. Jaime
Because all of the sorting algorithms for 'atomic' dtypes are using _LT functions, a post processing step seems to be the only viable solution, if possible.
2. by fields, ('order' kwarg)
A single 'reversed' keyword argument will not work, because some keys are ASC but others are DESC, for example, sorting my LastName ASC, then Salary DESC.
a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'), ('Salary', 'desc'))])
The parsing rule of order is:
- if an item is tuple, the first item is the fieldname, the second item is DESC/ASC - if an item is scalar, the fieldname is the item, the ordering is ASC.
This part of the code already goes to VOID_compare, which walks a temporary copy of a.dtype to call the comparison functions.
If I understood the purpose of c_metadata (numpy 1.7+) correctly, the ASC/DESC flags, offsets, and comparison functions can all be pre-compiled and passed into VOID_compare via c_metadata of the temporary type-descriptor.
By just looking this will actually make VOID_compare faster by avoiding calling several Python C-API functions. negating the return value of cmp seems to be a negligable overhead in such a complex function.
3. If both reversed and order is given, the ASC/DESC fields in 'order' are effectively reversed.
Any comments?
Best,
Yu _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
participants (2)
-
Feng Yu
-
Jaime Fernández del Río