Complex sort on big files

John Nagle nagle at animats.com
Tue Aug 9 18:20:48 EDT 2011


On 8/6/2011 10:53 AM, sturlamolden wrote:
> On Aug 1, 5:33 pm, aliman <aliman... at googlemail.com> wrote:
>
>> I've read the recipe at [1] and understand that the way to sort a
>> large file is to break it into chunks, sort each chunk and write
>> sorted chunks to disk, then use heapq.merge to combine the chunks as
>> you read them.
>
> Or just memory map the file (mmap.mmap) and do an inline .sort() on
> the bytearray (Python 3.2). With Python 2.7, use e.g. numpy.memmap
> instead. If the file is large, use 64-bit Python. You don't have to
> process the file in chunks as the operating system will take care of
> those details.
>
> Sturla

    No, no, no.  If the file is too big to fit in memory, sorting it
through a memory map will just cause thrashing as pages of the file
are read in from disk and written back out again.
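The chunk-and-merge recipe quoted above should avoid that: each chunk is sorted entirely in memory, and the final pass reads the sorted runs sequentially. Below is a minimal sketch, assuming a UTF-8 text file sorted line by line. `external_sort` and its `chunk_lines` parameter are hypothetical names chosen for illustration, not the recipe's own code.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(in_path, out_path, chunk_lines=100_000):
    """Sort the lines of a text file that may not fit in memory.

    Hypothetical helper: chunk_lines is how many lines are held in
    memory at once; tune it to the RAM available.
    """
    run_paths = []
    try:
        with open(in_path, "r", encoding="utf-8") as src:
            while True:
                # Read the next chunk of lines into memory.
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # Give the file's last line a newline so runs merge cleanly.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort()
                # Write this sorted run to its own temporary file.
                fd, path = tempfile.mkstemp(suffix=".run", text=True)
                with os.fdopen(fd, "w", encoding="utf-8") as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # heapq.merge pulls one line at a time from each sorted run,
        # so memory use stays small during the merge.
        runs = [open(p, "r", encoding="utf-8") for p in run_paths]
        try:
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for f in runs:
                f.close()
    finally:
        # Clean up the temporary run files.
        for p in run_paths:
            os.remove(p)
```

Each run is read front to back exactly once during the merge, which is the access pattern disks handle well, unlike the random page faults of sorting a memory-mapped file in place.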

    The UNIX sort program is probably good enough.  There are better
approaches if you have many gigabytes to sort (see Syncsort, which
is a commercial product), but few people need them.
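Handing the job to the system sort from Python can be as simple as the sketch below. It assumes a POSIX system with `sort` on the PATH; `unix_sort` is a hypothetical helper name. GNU sort, for example, spills sorted runs to temporary files on its own, so the input does not need to fit in memory.

```python
import os
import subprocess


def unix_sort(in_path, out_path):
    """Sort a text file line by line using the system sort(1) program.

    Hypothetical helper; assumes `sort` is installed and on the PATH.
    """
    # LC_ALL=C makes sort compare raw bytes instead of applying
    # locale collation rules, giving a predictable byte-wise order.
    env = dict(os.environ, LC_ALL="C")
    # -o names the output file; check=True raises if sort fails.
    subprocess.run(["sort", "-o", out_path, in_path], env=env, check=True)
```

GNU sort also accepts `-S` to set the size of its main-memory buffer and `-T` to choose the directory for its temporary files, which helps when the default temporary directory is small.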

				John Nagle



