Guido rethinking removal of cmp from sort method

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Mar 28 11:43:03 EDT 2011


On Mon, 28 Mar 2011 14:39:04 +0200, Antoon Pardon wrote:

> I tried to sort lists of 10000 elemens. Each element a tuple two items
> that was to be sorted first according to the first item in ascending
> order, then according to the second item in descending order. This was
> done on python 2.6 so I had to write my own cmp_to_key function.
> 
> I then sorted these lists of 10000 elements a 1000 times with the
> following methods.

Thank you for spending the time to get some hard data, but I can't 
replicate your results since you haven't shown your code. Rather than 
attempt to guess what you did and duplicate it, I instead came up with my 
own timing measurements. Results are shown here, my code follows below:


[steve at sylar ~]$ python2.7 sort_test.py
Comparison function: 12.1256039143
Key function: 3.51603388786
Double sort: 2.33165812492
cmp_to_key: 28.1129128933


By far the best result comes from two calls to sort. Not far off is the 
key function solution. (Admittedly, coming up with a key function for non-
numeric data would be challenging.) The comparison function, and the 
cmp_to_key solution, are clearly *much* slower.

I may do another test with larger lists, and leave it running overnight 
and see what happens.


[code]

from random import random, shuffle
from operator import itemgetter
from timeit import Timer
from functools import cmp_to_key

def make_data(n):
    x = range(n)*2
    y = range(n)*2
    shuffle(x)
    shuffle(y)
    data = zip(x, y)
    assert len(data) == 2*n
    return data

# Sort according to the first item in ascending order, then by second item
# in descending order:

def cmp_func(t1, t2):
    return cmp(t1[0], t2[0]) or cmp(t2[1], t1[1])

def key_func(t):
    return (t[0], -t[1])

def sorter(d):
    d = sorted(d, key=itemgetter(1), reverse=True)
    return sorted(d, key=itemgetter(0))

# Check that all sort methods give the same result.
data = make_data(20)
assert sorted(data, cmp=cmp_func) == sorted(data, key=key_func) == \
    sorter(data) == sorted(data, key=cmp_to_key(cmp_func))


data = make_data(5000)
assert len(data) == 10000
setup = "from __main__ import cmp_to_key, cmp_func, key_func, sorter, 
data"

t1 = Timer("sorted(data, cmp=cmp_func)", setup)
t2 = Timer("sorted(data, key=key_func)", setup)
t3 = Timer("sorter(data)", setup)
t4 = Timer("sorted(data, key=cmp_to_key(cmp_func))", setup)

print "Comparison function:",
print min(t1.repeat(number=100, repeat=7))
print "Key function:",
print min(t2.repeat(number=100, repeat=7))
print "Double sort:",
print min(t3.repeat(number=100, repeat=7))
print "cmp_to_key:",
print min(t4.repeat(number=100, repeat=7))


[end code]



-- 
Steven



More information about the Python-list mailing list