Complex sort on big files
nagle at animats.com
Wed Aug 10 00:20:48 CEST 2011
On 8/6/2011 10:53 AM, sturlamolden wrote:
> On Aug 1, 5:33 pm, aliman<aliman... at googlemail.com> wrote:
>> I've read the recipe at  and understand that the way to sort a
>> large file is to break it into chunks, sort each chunk and write
>> sorted chunks to disk, then use heapq.merge to combine the chunks as
>> you read them.
> Or just memory map the file (mmap.mmap) and do an inline .sort() on
> the bytearray (Python 3.2). With Python 2.7, use e.g. numpy.memmap
> instead. If the file is large, use 64-bit Python. You don't have to
> process the file in chunks as the operating system will take care of
> those details.
No, no, no. If the file is too big to fit in memory, trying to
page it will just cause thrashing as the file pages in and out from
disk. The UNIX sort program is probably good enough. There are better
approaches if you have many gigabytes to sort (see Syncsort, which
is a commercial product), but few people need them.