sort descending on a[1], ascending on a[0]

Alex Martelli aleax at aleax.it
Fri Oct 11 09:55:05 CEST 2002


Manuel M. Garcia wrote:

> t = """
> 
> jsaul asked a question about a custom sort, which reminded me of a
> problem I often have with sorting.
> 
> What is everyones favorite way of sorting descending on a[1], and
> ascending on a[0]? (or however you wish, even changing this on
> the fly)
> 
> I am not a big fan of passing sort() a comparison function, but I
> don't see how I could use the Decorate - Sort - Undecorate (DSU)
> technique.

You could, but you wouldn't necessarily want to, for such specific
needs of mixing ascending and descending sorts.  For this task,
there are various approaches with different complication and
performance characteristics -- and even if you do desperately
care about performance, the choice depends on what Python version
you want to run your program with.



N = 100 * 1000
biggie = [ (i//4, str(i)) for i in xrange(N) ]

def asc0desc1(a, b):
    return cmp(a[0],b[0]) or cmp(b[1],a[1])

def sortAdByMethod():
    result = biggie[:]
    result.sort(asc0desc1)
    return result

def sortAdByDSUPasses():
    temp = [ (it[1],it[0]) for it in biggie ]
    temp.sort()
    temp.reverse()
    temp = [ (temp[i][1], i, temp[i][0]) for i in range(len(temp)) ]
    temp.sort()
    return [ (a, c) for a, b, c in temp ]

reverser = ''.join([ chr(255-x) for x in range(256) ])
def sortAdByRichDSU():
    temp = [ (it[0], it[1].translate(reverser)) for it in biggie ]
    temp.sort()
    return [ (it[0], it[1].translate(reverser)) for it in temp ]

byMet = sortAdByMethod()
byPas = sortAdByDSUPasses()
byRic = sortAdByRichDSU()

print byMet==byPas, byMet==byRic, byPas==byRic

import time
def timeit(func):
    start = time.clock()
    func()
    stend = time.clock()
    return '%.2f %s' % (stend-start, func.__name__)

for func in sortAdByMethod, sortAdByDSUPasses, sortAdByRichDSU:
    print timeit(func)


[alex at lancelot alex]$ python2.2 -O twoso.py
1 1 1
3.74 sortAdByMethod
4.17 sortAdByDSUPasses
2.74 sortAdByRichDSU
[alex at lancelot alex]$ python2.3 -O twoso.py
True True True
0.97 sortAdByMethod
2.75 sortAdByDSUPasses
1.88 sortAdByRichDSU
[alex at lancelot alex]$


As you see, pure DSU (with the needed insertion of index to make
the second pass stable -- 2.3's sort IS stable, but you still
need the index to stop items being compared again by the field
already used in the first pass!) is slowest here -- by just a
little in 2.2, by quite a bit in 2.3.  The complicated approach
of synthesizing 'decorated strings' whose comparison is the
reverse than the original's, besides lack of generality (quite
serious, btw -- it works here, but it will NOT work when two
strings being compared can have the same prefix but with one 
longer than the other!), is only a little faster than the simple
minded "pass a comparison function" in 2.2, and SLOWER than it
in 2.3, by a substantial margin too.

Thus, in practice, I would not suggest DSU for such a mixed
sort unless you *know* that the fields on which you need to sort
in reverse are numbers -- in THAT case, changing sign is easy,
and the DSU approach is simple and fast once again, e.g.:


N = 100 * 1000
biggie = [ (i%4, i//4) for i in xrange(N) ]

def asc0desc1(a, b):
    return cmp(a[0],b[0]) or cmp(b[1],a[1])

def sortAdByMethod():
    result = biggie[:]
    result.sort(asc0desc1)
    return result

def sortAdDSU():
    temp = [ (it[0],-it[1]) for it in biggie ]
    temp.sort()
    return [ (it[0],-it[1]) for it in temp ]

byMet = sortAdByMethod()
byDsu = sortAdDSU()

print byMet==byDsu

import time
def timeit(func):
    start = time.clock()
    func()
    stend = time.clock()
    return '%.2f %s' % (stend-start, func.__name__)

for func in sortAdByMethod, sortAdDSU:
    print timeit(func)

[alex at lancelot alex]$ python2.2 -O twoso.py
1
5.07 sortAdByMethod
2.90 sortAdDSU
[alex at lancelot alex]$ python2.3 -O twoso.py
True
1.39 sortAdByMethod
1.31 sortAdDSU
[alex at lancelot alex]$


Again 2.3's new sort changes things, making the two
approaches just about equivalent even here.  But at least
in this case you can adopt DSU with a clear conscience
(AFTER some benchmarking on YOUR specific machine and
problem, of course:-).


BTW, I keep being amazed at how much faster 2.3 is than
2.2 -- sort may be an extreme case, but the timing
ratios seem to be always in 2.3's favor in every case
I've measured, and sometimes substantially so.  Let's
hope the release of this speed jewel does not get
delayed by the wish to ALSO tweak a zillion other
things -- the Python community deserves to bask in its
beauteous new speed, and such speed-ups would no doubt
help Python win more converts, too!-).


Alex




More information about the Python-list mailing list