Sorting Large File (Code/Performance)
Thu Jan 24 20:41:46 CET 2008
Ira.Kovac at gmail.com writes:
> I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
> to sort based on the first two characters.
> I'd greatly appreciate it if someone could post sample code that can
> help me do this.
Use the Unix sort command:
    sort inputfile -o outputfile
I think there is a Cygwin port.
> Also, any ideas on approximately how long the sort process is going to
> take (XP, Dual Core 2.0GHz w/2GB RAM)?
Eh, Unix sort would probably take a while, somewhere between 15
minutes and an hour. If you only have to do it once, it's not worth
writing special-purpose code. If you have to do it a lot, get some
more RAM for that box, suck the file into memory and do a radix sort.
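
For what it's worth, here is a rough sketch of what the in-memory radix sort could look like in Python. It is only an illustration under some assumptions: the function names are made up, the file is assumed to be UTF-8 (the original post only says "Unicode"), and it assumes the whole file really does fit in memory. It orders lines by their first two characters only, by code point, and keeps lines with the same prefix in their original order.

```python
from collections import defaultdict


def radix_sort_by_prefix(lines, width=2):
    """Return lines ordered by their first `width` characters (stable)."""
    # LSD radix sort: distribute on the last key character first, then the
    # one before it, so earlier characters end up dominating the order.
    for pos in reversed(range(width)):
        buckets = defaultdict(list)
        for line in lines:
            # Slicing yields "" for lines shorter than pos + 1; "" sorts first.
            buckets[line[pos:pos + 1]].append(line)
        # Unicode has too many code points for a fixed bucket array, so only
        # the characters actually seen are ordered (by code point).
        lines = [line for ch in sorted(buckets) for line in buckets[ch]]
    return lines


def sort_file(in_path, out_path, encoding="utf-8"):
    # Hypothetical helper; the encoding is an assumption, and the whole
    # file is read into memory at once.
    with open(in_path, encoding=encoding) as src:
        lines = src.readlines()
    with open(out_path, "w", encoding=encoding) as dst:
        dst.writelines(radix_sort_by_prefix(lines))
```

Something like sort_file("inputfile", "outputfile") would then do the job, though a Python list of that many strings needs far more memory than the raw file size, so treat this as a sketch rather than something tuned for a 2GB input.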
More information about the Python-list