[Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

Feng Yu rainwoodman at gmail.com
Wed Aug 19 16:10:51 EDT 2015


Dear list,

This is forwarded from issue 6217 https://github.com/numpy/numpy/issues/6217

"What is the way to implement DESC ordering in the sorting routines of numpy?"

(I am borrowing DESC/ASC from the SQL notation)

For a stable DESC ordering sort, one cannot simply reverse the sorted
result via argsort()[::-1], because that also reverses the relative
order of equal elements.
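
To make the stability problem concrete, here is a small illustration. The
array contents are made up for the example, and negating the keys is only
one workaround that happens to work for signed numeric data; it is not the
proposed API.

```python
import numpy as np

keys = np.array([2, 1, 2, 1])

# Stable ascending argsort: equal keys keep their original relative order.
asc = np.argsort(keys, kind='mergesort')

# Reversing gives descending keys, but the ties are now in reversed order,
# so the result is no longer stable.
naive_desc = asc[::-1]

# Workaround for signed numeric keys only: sort the negated keys.
stable_desc = np.argsort(-keys, kind='mergesort')

print(naive_desc)   # ties appear as 2 before 0, and 3 before 1
print(stable_desc)  # ties appear as 0 before 2, and 1 before 3
```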

I propose the following API change to argsort/sort (I haven't thought
about lexsort yet). I will use argsort as an example.

Currently, argsort supports sorting by keys ('order') and by 'axis'.
These two somewhat orthogonal interfaces need to be treated differently.

1. by axis.

Since there is just one sorting key, a single 'reversed' keyword
argument is sufficient:

a.argsort(axis=0, kind='merge', reversed=True)

Jaime suggested this can be implemented efficiently as a
post-processing step.
(https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
there a reference to the algorithm?

Because all of the sorting algorithms for 'atomic' dtypes are using
_LT functions, a post processing step seems to be the only viable
solution, if possible.
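
One possible form of such a post-processing step, sketched in Python for
the 1-D case. This is an assumption about how it could work, not
necessarily the algorithm referred to in the issue comment, and
stable_argsort_desc is a hypothetical helper name: reverse the stable
ascending result, then reverse each run of equal keys back so ties regain
their original order.

```python
import numpy as np

def stable_argsort_desc(a):
    """Hypothetical helper: stable descending argsort of a 1-D array."""
    # Stable ascending argsort, then reverse: keys are now descending,
    # but each run of equal keys is in reversed (unstable) order.
    idx = np.argsort(a, kind='mergesort')[::-1]
    n = len(idx)
    if n == 0:
        return idx.copy()

    # Locate the runs of equal keys in the reversed result.
    s = a[idx]
    starts = np.flatnonzero(np.concatenate(([True], s[1:] != s[:-1])))
    ends = np.concatenate((starts[1:], [n]))

    # Reverse every run in place to restore the original order of ties.
    out = idx.copy()
    for lo, hi in zip(starts, ends):
        out[lo:hi] = idx[lo:hi][::-1]
    return out

print(stable_argsort_desc(np.array([2, 1, 2, 1])))
```

The run-fixing pass is linear in the array length, so it does not change
the overall cost of the sort. The Python loop is only for clarity.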


2. by fields, ('order' kwarg)

A single 'reversed' keyword argument will not work, because some keys
are ASC but others are DESC, for example, sorting by LastName ASC,
then Salary DESC.

a.argsort(kind='merge', order=['LastName', ('FirstName', 'asc'),
('Salary', 'desc')])

The parsing rule of order is:

- if an item is a tuple, its first element is the fieldname and its
second element is DESC/ASC
- if an item is a scalar, the item is the fieldname and the ordering is ASC.
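
The parsing rule above could be sketched as follows. parse_order is a
hypothetical helper, not an existing numpy function, and the
case-insensitive 'asc'/'desc' spelling is an assumption.

```python
def parse_order(order):
    """Hypothetical helper: normalize the proposed 'order' argument
    into a list of (fieldname, descending) pairs."""
    if isinstance(order, str):
        order = [order]          # a single field name is also allowed
    parsed = []
    for item in order:
        if isinstance(item, tuple):
            # Tuple item: (fieldname, 'asc' or 'desc').
            name, direction = item
            direction = direction.lower()
            if direction not in ('asc', 'desc'):
                raise ValueError("direction must be 'asc' or 'desc'")
            parsed.append((name, direction == 'desc'))
        else:
            # Scalar item: just the field name, ascending by default.
            parsed.append((item, False))
    return parsed

print(parse_order(['LastName', ('FirstName', 'asc'), ('Salary', 'desc')]))
```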

This part of the code already goes to VOID_compare, which walks a
temporary copy of a.dtype to call the comparison functions.

If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
ASC/DESC flags, offsets, and comparison functions can all be
pre-compiled and passed into VOID_compare via c_metadata of the
temporary type-descriptor.

From a quick look, this would actually make VOID_compare faster by
avoiding calls to several Python C-API functions. Negating the return
value of cmp seems to be a negligible overhead in such a complex
function.
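
As a rough pure-Python model of that idea (not the C code, and without
c_metadata), the per-field plan is built once and the comparison only
negates the result for DESC fields. make_compare, argsort_by_plan and the
sample data are hypothetical and for illustration only.

```python
import numpy as np
from functools import cmp_to_key

def make_compare(plan):
    """plan: list of (fieldname, descending) pairs, prepared once."""
    def compare(x, y):
        for name, descending in plan:
            a, b = x[name], y[name]
            c = int(a > b) - int(a < b)   # three-way compare on one field
            if c != 0:
                # A DESC field just negates the comparison result.
                return -c if descending else c
        return 0
    return compare

def argsort_by_plan(arr, plan):
    compare = make_compare(plan)
    # Python's sorted() is stable, like kind='merge'.
    return sorted(range(len(arr)),
                  key=cmp_to_key(lambda i, j: compare(arr[i], arr[j])))

arr = np.array([('Smith', 50), ('Jones', 70), ('Smith', 90), ('Jones', 70)],
               dtype=[('LastName', 'U10'), ('Salary', 'i8')])

# LastName ASC, then Salary DESC.
print(argsort_by_plan(arr, [('LastName', False), ('Salary', True)]))
```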

3. If both reversed and order are given, the ASC/DESC flags in 'order'
are effectively reversed.
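
In terms of per-field flags, this rule amounts to flipping every flag.
A minimal sketch, assuming the keys are represented as
(fieldname, descending) pairs; apply_reversed is a hypothetical name.

```python
def apply_reversed(plan, reversed_flag):
    """Flip the descending flag of every field when reversed_flag is True."""
    return [(name, descending != reversed_flag) for name, descending in plan]

plan = [('LastName', False), ('Salary', True)]
print(apply_reversed(plan, True))   # LastName becomes DESC, Salary becomes ASC
```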

Any comments?

Best,

Yu


