Multiple level sorting in python where the order of some levels may or may not be reversed

Hi all, I have a list called count_list which contains tuples like below: [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4),
('london', 2), ('falling', 4), ('my', 1)]
I want to sort it based on the second parameter in descending order and the tuples with the same second parameter must be sorted based on their first parameter, in alphabetically ascending order. So the ideal output is: [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2),
('fair', 1), ('lady', 1), ('my', 1)]
What I ended up doing is: count_list = sorted(count_list,
which works. Now my solution is very specific to structures like [(str, int)] where all strs are lower case and besides ord makes it to be both limited in use and also more difficult to add extra levels of sorting. The main problem is that reverse argument takes only a boolean and applies to the whole list after sorting in finished. I couldn't think of any other way (besides mapping negative to ord values of x[0]) to say reverse on the first level but not reverse on the second level. Something like below would be ideal: count_list = sorted(count_list,
key=lambda x: (x[1], x[0]), reverse=(True, False))
Does such a way similar to above exist? If not, how useful would it be to implement it? *P.S.* It's my first time on a mailing list. I apologize before hand if such a thing has already been discussed or even there exist a way which already achieves that.

On Sun, Oct 16, 2016 at 3:29 PM, Alireza Rafiei <alireza.rafiei94@gmail.com> wrote:
Interesting. Personally, I would invert this; if you're sorting by an integer and a string, negate the integer, and keep the string as-is. If that doesn't work, a custom class might help. # untested class Record: reverse = False, True, True, False, True def __init__(data): self.data = data def __lt__(self, other): for v1, v2, rev in zip(self.data, other.data, self.reverse): if v1 < v2: return rev if v2 > v1: return not rev return False This is broadly similar to how tuple.__lt__ works, allowing you to flip the logic of whichever ones you like. ChrisA

[Alireza Rafiei <alireza.rafiei94@gmail.com>]
I'd like to suggest doing something simple instead, such that data = [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4), ('london', 2), ('falling', 4), ('my', 1)] from operator import itemgetter multisort(data, [# primary key is 2nd element, reversed (itemgetter(1), True), # secondary key is 1st element, natural (itemgetter(0), False)]) import pprint pprint.pprint(data) prints the output you want. It's actually very easy to do this, but the cost is that it requires doing a full sort for _each_ field you want to sort on. Because Python's sort is stable, it's sufficient to merely sort on the least-significant key first, and then sort again on each key in turn through the most-significant. There's another subtlety that makes this work:
Luckily, that's not how `reverse` actually works. It _first_reverses the list, then sorts, and then reverses the list again. The result is that items that compare equal _retain_ their original order, where just reversing the sorted list would invert their order. That's why, in your example above, after first sorting on the string component in natural order we get (in part) [[('down', 4), ('falling', 4), ...] and then reverse-sorting on the integer portion _leaves_ those tuples in the same order. That's essential for this decomposition of the problem to work. With that background, here's the implementation: def multisort(xs, specs): for key, reverse in reversed(specs): xs.sort(key=key, reverse=reverse) That's all it takes! And it accepts any number of items in `specs`. Before you worry that it's "too slow", time it on real test data. `.sort()` is pretty zippy, and this simple approach allows using simple key functions. More importantly, it's much easier on your brain ;-)

On 16 October 2016 at 08:35, Alireza Rafiei <alireza.rafiei94@gmail.com> wrote:
Awesome! Thanks for the thorough explanation.
Thank you for the interesting suggestion that prompted the explanation. I don't know about others, but I know that I often forget ways to use the tools already at our disposal, so threads like this are a useful reminder. (And welcome to the mailing list - hopefully your stay will be pleasant :-)) Paul

On 16.10.2016 09:35, Alireza Rafiei wrote:
Awesome! Thanks for the thorough explanation.
Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also welcome to the list, Alireza.)
@Tim Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)? Regards, Sven

On 17 October 2016 at 21:06, Sven R. Kunze <srkunze@mail.de> wrote:
Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)?
You could submit a doc patch to add an explanation of this technique to the list.sort function. I doubt it's worth a builtin for a 2-line function. Paul

On 17.10.2016 22:31, Paul Moore wrote:
Is the github repo ready? If so, I will do.
I doubt it's worth a builtin for a 2-line function.
Not this 2-line function alone indeed. As my note about speeding it up in the future goes, I thought about an interface which allows people to do easy multisort BUT with the possibility of further optimization by the CPython or other Python implementations. Cheers, Sven

On 17/10/2016 21:31, Paul Moore wrote:
How about changing https://wiki.python.org/moin/HowTo/Sorting ? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

On 17 October 2016 at 22:28, Mark Lawrence via Python-ideas <python-ideas@python.org> wrote:
How about changing https://wiki.python.org/moin/HowTo/Sorting ?
Good point. Better still, https://docs.python.org/3.6/howto/sorting.html Paul

[Sven R. Kunze <srkunze@mail.de>]
Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also welcome to the list, Alireza.)
It follows from what the docs say, although I'd agree it may be helpful if the docs explicitly spelled out this consequence (that reverse=True also preserves the original order of equal elements - as the docs say, it's not that the _list_ "is reversed", is that "list elements are sorted as if each comparison were reversed").
Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)?
Well, the sorting "how to" already explains the basic idea. The `.sort()` docs also explain that stability "is helpful for sorting in multiple passes (for example, sort by department, then by salary grade)". I suspect I know too much about this to be of any use in judging what's missing ;-) Speeding it wouldn't be easy - or usually necessary. The obvious "improvement" would do it all in a single `.sort()` invocation. But calling back into Python code to do fancy, multi-step comparisons is so expensive that I expect it would take a large N for saving some additional worst-case O(N*log(N)) sorting steps to repay the cost. If you'd like to play with that, here's a different `multisort()` implementation. Again `specs` is a list of (key_function, True-for-reverse) tuples, most-significant key first. And, again, no assumptions are made about what key functions return, and the sort continues to guarantee that only "<" comparisons are made: def _sorter(specs): keyfuncs, reversers = zip(*specs) class Wrapper(object): def __init__(self, obj): self.keys = tuple(f(obj) for f in keyfuncs) def __lt__(x, y): for a, b, r in zip(x.keys, y.keys, reversers): if a < b: return not r if b < a: return r return False # all the keys are equal return Wrapper def multisort(xs, specs): xs.sort(key=_sorter(specs))

On Sun, Oct 16, 2016 at 3:29 PM, Alireza Rafiei <alireza.rafiei94@gmail.com> wrote:
Interesting. Personally, I would invert this; if you're sorting by an integer and a string, negate the integer, and keep the string as-is. If that doesn't work, a custom class might help. # untested class Record: reverse = False, True, True, False, True def __init__(data): self.data = data def __lt__(self, other): for v1, v2, rev in zip(self.data, other.data, self.reverse): if v1 < v2: return rev if v2 > v1: return not rev return False This is broadly similar to how tuple.__lt__ works, allowing you to flip the logic of whichever ones you like. ChrisA

[Alireza Rafiei <alireza.rafiei94@gmail.com>]
I'd like to suggest doing something simple instead, such that data = [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4), ('london', 2), ('falling', 4), ('my', 1)] from operator import itemgetter multisort(data, [# primary key is 2nd element, reversed (itemgetter(1), True), # secondary key is 1st element, natural (itemgetter(0), False)]) import pprint pprint.pprint(data) prints the output you want. It's actually very easy to do this, but the cost is that it requires doing a full sort for _each_ field you want to sort on. Because Python's sort is stable, it's sufficient to merely sort on the least-significant key first, and then sort again on each key in turn through the most-significant. There's another subtlety that makes this work:
Luckily, that's not how `reverse` actually works. It _first_reverses the list, then sorts, and then reverses the list again. The result is that items that compare equal _retain_ their original order, where just reversing the sorted list would invert their order. That's why, in your example above, after first sorting on the string component in natural order we get (in part) [[('down', 4), ('falling', 4), ...] and then reverse-sorting on the integer portion _leaves_ those tuples in the same order. That's essential for this decomposition of the problem to work. With that background, here's the implementation: def multisort(xs, specs): for key, reverse in reversed(specs): xs.sort(key=key, reverse=reverse) That's all it takes! And it accepts any number of items in `specs`. Before you worry that it's "too slow", time it on real test data. `.sort()` is pretty zippy, and this simple approach allows using simple key functions. More importantly, it's much easier on your brain ;-)

On 16 October 2016 at 08:35, Alireza Rafiei <alireza.rafiei94@gmail.com> wrote:
Awesome! Thanks for the thorough explanation.
Thank you for the interesting suggestion that prompted the explanation. I don't know about others, but I know that I often forget ways to use the tools already at our disposal, so threads like this are a useful reminder. (And welcome to the mailing list - hopefully your stay will be pleasant :-)) Paul

On 16.10.2016 09:35, Alireza Rafiei wrote:
Awesome! Thanks for the thorough explanation.
Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also welcome to the list, Alireza.)
@Tim Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)? Regards, Sven

On 17 October 2016 at 21:06, Sven R. Kunze <srkunze@mail.de> wrote:
Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)?
You could submit a doc patch to add an explanation of this technique to the list.sort function. I doubt it's worth a builtin for a 2-line function. Paul

On 17.10.2016 22:31, Paul Moore wrote:
Is the github repo ready? If so, I will do.
I doubt it's worth a builtin for a 2-line function.
Not this 2-line function alone indeed. As my note about speeding it up in the future goes, I thought about an interface which allows people to do easy multisort BUT with the possibility of further optimization by the CPython or other Python implementations. Cheers, Sven

On 17/10/2016 21:31, Paul Moore wrote:
How about changing https://wiki.python.org/moin/HowTo/Sorting ? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

On 17 October 2016 at 22:28, Mark Lawrence via Python-ideas <python-ideas@python.org> wrote:
How about changing https://wiki.python.org/moin/HowTo/Sorting ?
Good point. Better still, https://docs.python.org/3.6/howto/sorting.html Paul

[Sven R. Kunze <srkunze@mail.de>]
Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also welcome to the list, Alireza.)
It follows from what the docs say, although I'd agree it may be helpful if the docs explicitly spelled out this consequence (that reverse=True also preserves the original order of equal elements - as the docs say, it's not that the _list_ "is reversed", is that "list elements are sorted as if each comparison were reversed").
Do you think that simple solution could have a chance to be added to stdlib somehow (with the possibility of speeding it up in the future)?
Well, the sorting "how to" already explains the basic idea. The `.sort()` docs also explain that stability "is helpful for sorting in multiple passes (for example, sort by department, then by salary grade)". I suspect I know too much about this to be of any use in judging what's missing ;-) Speeding it wouldn't be easy - or usually necessary. The obvious "improvement" would do it all in a single `.sort()` invocation. But calling back into Python code to do fancy, multi-step comparisons is so expensive that I expect it would take a large N for saving some additional worst-case O(N*log(N)) sorting steps to repay the cost. If you'd like to play with that, here's a different `multisort()` implementation. Again `specs` is a list of (key_function, True-for-reverse) tuples, most-significant key first. And, again, no assumptions are made about what key functions return, and the sort continues to guarantee that only "<" comparisons are made: def _sorter(specs): keyfuncs, reversers = zip(*specs) class Wrapper(object): def __init__(self, obj): self.keys = tuple(f(obj) for f in keyfuncs) def __lt__(x, y): for a, b, r in zip(x.keys, y.keys, reversers): if a < b: return not r if b < a: return r return False # all the keys are equal return Wrapper def multisort(xs, specs): xs.sort(key=_sorter(specs))
participants (6)
-
Alireza Rafiei
-
Chris Angelico
-
Mark Lawrence
-
Paul Moore
-
Sven R. Kunze
-
Tim Peters