[Tutor] sorting a list of dictionaries

Kent Johnson kent37 at tds.net
Thu Dec 9 21:22:29 CET 2004


Using sort() with a user compare function is not recommended when you care about performance. The 
problem is that the sort function has to call back into Python code for every compare, of which 
there are many. The decorate - sort - undecorate idiom is the preferred way to do this in Python < 
2.4. Python 2.4 adds the key= parameter to sort() and does the DSU internally.

Ironically, the best way to speed up Python code is often to eliminate it - the more work you can do 
in the C runtime, the faster your code will run. Sometimes an approach that seems to do more work - 
like DSU, which builds two extra lists - is actually faster because it does more in C code.

Here is a program to time the three methods. On my machine, the ratio of times of sort1 : sort2 : 
sort3 is pretty consistently about 1 : 2 : 6-7, i.e. sort with key parameter is twice as fast as DSU 
which is three times faster than sort with cmp.

A couple of interesting observations:
- The ratio of sort times is independent of the length of the list. I expected that the performance 
of sort3 would get progressively worse as the list got longer because I expect that the cmp function 
would be called a number of times that is O(n*logn). Apparently this is not the case.

- If the list is large and already sorted, sort3 is significantly faster than sort2 - it takes about 
2/3 the time of sort2. You can try this by taking out the call to random.shuffle().

Kent


import operator, random, timeit

a = range(10000)
random.shuffle(a)
dlist = [ {'name':str(i), 'size':i} for i in a ]


def sort1(ds):
     ''' Sort using Python 2.4 key argument '''
     ds.sort(key=operator.itemgetter('size'))


def sort2(ds):
     ''' Sort using DSU '''
     d2 = [ (d['size'], d) for d in ds ]
     d2.sort()
     ds[:] = [ d for (name, d) in d2 ]

def sort3(ds):
     ''' Sort using cmp argument to sort '''
     ds.sort(lambda m, n: cmp(m['size'], n['size']))


def timeOne(fn):
     setup = "from __main__ import " + fn.__name__ + ',dlist'
     stmt = fn.__name__ + '(dlist[:])'

     t = timeit.Timer(stmt, setup)
     secs = min(t.repeat(3, 10))
     print '%s: %f secs' % (fn.__name__, secs)


# Make sure they all give the same answer
a1=dlist[:]
sort1(a1)

a2=dlist[:]
sort2(a2)

a3=dlist[:]
sort3(a3)


assert a1 == a2
assert a3 == a2

timeOne(sort1)
timeOne(sort2)
timeOne(sort3)


Karl Pflästerer wrote:
> On  9 Dez 2004, ljholish at speakeasy.net wrote:
> 
> 
>>I have a list of dictionaries, each representing info about a file,
>>something like:
>>
>>[{'name':'foo.txt','size':35}, {'name':'bar.txt','size':35}, ...]
>>
>>I want to present a sorted list of all the files' data, sorting on the
>>keys 'name' or 'size'. The file 'name' s should be unique (I'm hoping)
>>across all the dictionaries. Can someone point me towards an efficient
>>solution for accomplishing the sort? (The list has 1000s of files).
> 
> 
> That's easy to achieve, since sort takes a custom sort function as
> optional argument.  Now you need only a function which takes the values
> of the fileds and compares them.
> 
> E.g.
> 
> lst.sort(lambda m, n: cmp(m.get(field), n.get(field)))
> where field is either 'name' or 'size'.
> 
> As a function:
> 
> def sort_it (lst, field):
>     lst.sort(lambda m, n: cmp(m.get(field), n.get(field)))
> 
> 
>    Karl


More information about the Tutor mailing list