Complex sort on big files
__peter__ at web.de
Mon Aug 1 19:00:03 CEST 2011
> Apologies I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large files.
> I've read the recipe at  and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
> What I'm having trouble figuring out is what to do when I want to sort
> by one key ascending then another key descending (a "complex sort").
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.
> I also thought you could apply the complex sort to each chunk before
> writing it to disk, so each chunk was completely sorted, but then the
> heapq.merge wouldn't work properly, because afaik you can only give it
> one key.
You can make that key as complex as needed:
>>> class Key(object):
...     def __init__(self, obj):
...         self.asc = obj[1]
...         self.desc = obj[2]
...     def __cmp__(self, other):
...         return cmp(self.asc, other.asc) or -cmp(self.desc, other.desc)
...
>>> sorted(["abc", "aba", "bbb", "aaa", "aab"], key=Key)
['aab', 'aaa', 'abc', 'bbb', 'aba']
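The same key can be passed to heapq.merge, so each chunk only has to be written and read once. What follows is a rough sketch for Python 3, not a tested recipe: Python 3 has no __cmp__ or cmp(), so the comparison goes through functools.cmp_to_key, and heapq.merge only accepts key= from Python 3.5 on. The names external_sort, write_chunk, read_chunk and chunk_size are made up for illustration, and the key (second character ascending, third character descending) is chosen only to reproduce the output shown above.

```python
import heapq
import os
import tempfile
from functools import cmp_to_key


def cmp(a, b):
    # Python 3 removed the built-in cmp(); this is the usual stand-in.
    return (a > b) - (a < b)


def compare(x, y):
    # Illustrative rule: second character ascending, third descending.
    # Assumes every item has at least three characters.
    return cmp(x[1], y[1]) or -cmp(x[2], y[2])


Key = cmp_to_key(compare)


def write_chunk(chunk, tmpdir, index):
    # Sort one in-memory chunk with the full key and write it to disk.
    path = os.path.join(tmpdir, "chunk%d.txt" % index)
    with open(path, "w") as f:
        f.writelines(item + "\n" for item in sorted(chunk, key=Key))
    return path


def read_chunk(path):
    # Lazily yield the lines of one sorted chunk file.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(items, chunk_size=2):
    # chunk_size is tiny here for demonstration; in practice it would be
    # as many records as comfortably fit in memory.
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = []
        chunk = []
        for item in items:
            chunk.append(item)
            if len(chunk) == chunk_size:
                paths.append(write_chunk(chunk, tmpdir, len(paths)))
                chunk = []
        if chunk:
            paths.append(write_chunk(chunk, tmpdir, len(paths)))
        # The merge must use the same key that sorted the chunks.
        yield from heapq.merge(*(read_chunk(p) for p in paths), key=Key)


if __name__ == "__main__":
    print(list(external_sort(["abc", "aba", "bbb", "aaa", "aab"])))
```

The point is that the chunks and the merge share one ordering, so a single pass to disk and back is enough, instead of one pass per sort key.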