Complex sort on big files
Peter Otten
__peter__ at web.de
Mon Aug 1 13:00:03 EDT 2011
aliman wrote:
> Apologies I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large
> files.
>
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
>
> What I'm having trouble figuring out is what to do when I want to sort
> by one key ascending then another key descending (a "complex sort").
>
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.
>
> I also thought you could apply the complex sort to each chunk before
> writing it to disk, so each chunk was completely sorted, but then the
> heapq.merge wouldn't work properly, because afaik you can only give it
> one key.
You can make that key as complex as needed:
>>> class Key(object):
... def __init__(self, obj):
... self.asc = obj[1]
... self.desc = obj[2]
... def __cmp__(self, other):
... return cmp(self.asc, other.asc) or -cmp(self.desc,
other.desc)
...
>>> sorted(["abc", "aba", "bbb", "aaa", "aab"], key=Key)
['aab', 'aaa', 'abc', 'bbb', 'aba']
See also
http://docs.python.org/library/functools.html#functools.total_ordering
More information about the Python-list
mailing list