[Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting

Jaime Fernández del Río jaime.frio at gmail.com
Thu Aug 20 00:43:04 EDT 2015


On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu <rainwoodman at gmail.com> wrote:

> Dear list,
>
> This is forwarded from issue 6217
> https://github.com/numpy/numpy/issues/6217
>
> "What is the way to implement DESC ordering in the sorting routines of
> numpy?"
>
> (I am borrowing DESC/ASC from the SQL notation)
>
> For a stable DESC ordering sort, one can not  revert the sorted array via
> argsort()[::-1] .
>
> I propose the following API change to argsorts/sort. (haven't thought
> about lexsort yet) I will use argsort as an example.
>
> Currently, argsort supports sorting by keys ('order') and by 'axis'.
> These two somewhat orthonal interfaces need to be treated differently.
>
> 1. by axis.
>
> Since there is just one sorting key, a single 'reversed' keyword
> argument is sufficient:
>
> a.argsort(axis=0, kind='merge', reversed=True)
>
> Jaime suggested this can be implemented efficiently as a
> post-processing step.
> (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is
> there a reference to the algorithm?
>

My thinking was that, for native types, you can stably reverse a sorted
permutation in-place by first reversing item-by-item, then reversing every
chunk of repeated entries. Sort of the way you would reverse the words in a
sentence in-place: first reverse every letter, then reverse everything
bounded by spaces:

TURN ME AROUND -> DNUORA EM NRUT -> AROUND EM NRUT -> AROUND ME NRUT ->
AROUND ME TURN

We could even add a type-specific function to do this for each of the
native types in the npy_sort library.

As I mentioned in Yu's very nice PR
<https://github.com/numpy/numpy/pull/6222>, probably it is best to leave
the signature of the function alone, and have something like order='desc'
be the trigger for the proposed reversed=True.

Jaime


>
> Because all of the sorting algorithms for 'atomic' dtypes are using
> _LT functions, a post processing step seems to be the only viable
> solution, if possible.


>
> 2. by fields, ('order' kwarg)
>
> A single 'reversed' keyword argument will not work, because some keys
> are ASC but others are DESC, for example, sorting my LastName ASC,
> then Salary DESC.
>
> a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'),
> ('Salary', 'desc'))])
>
> The parsing rule of order is:
>
> - if an item is tuple, the first item is the fieldname, the second
> item is DESC/ASC
> - if an item is scalar, the fieldname is the item, the ordering is ASC.
>
> This part of the code already goes to VOID_compare, which walks a
> temporary copy of a.dtype to call the comparison functions.
>
> If I understood the purpose of c_metadata (numpy 1.7+) correctly, the
> ASC/DESC flags, offsets, and comparison functions can all be
> pre-compiled and passed into VOID_compare via c_metadata of the
> temporary type-descriptor.
>
> By just looking this will actually make VOID_compare faster by
> avoiding calling several Python C-API functions. negating the return
> value of cmp seems to be a negligable overhead in such a complex
> function.


> 3. If both reversed and order is given, the ASC/DESC fields in 'order'
> are effectively reversed.
>
> Any comments?
>
> Best,
>
> Yu
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150819/6a88ad0b/attachment.html>


More information about the NumPy-Discussion mailing list