Sorting Large File (Code/Performance)

Ira.Kovac at gmail.com Ira.Kovac at gmail.com
Thu Jan 24 16:26:47 EST 2008


Thanks to all who replied. It's much appreciated.

Yes, I double-checked the line counts, and the number of lines is ~16
million (instead of the 1.6B I stated).

Also:

>What is a "Unicode text file"? How is it encoded: utf8, utf16, utf16le, utf16be, ??? If you don't know, do this:
The file is UTF-8.

> Do the first two characters always belong to the ASCII subset?
Yes, the first two characters always belong to the ASCII subset.

> What are you going to do with it after it's sorted?
I need to isolate all lines that start with a particular pair of
characters (zz, to be specific).
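
For what it's worth, if isolating those lines is the only goal, a single
streaming pass might do it without sorting first. Below is a minimal
sketch under that assumption. The function name and the file paths are
hypothetical, and it relies on the two facts above: the file is UTF-8 and
the two-character prefix is pure ASCII.

```python
def copy_lines_with_prefix(src_path, dst_path, prefix=b"zz"):
    """Copy every line of src_path that starts with prefix to dst_path.

    Hypothetical helper: reads one line at a time, so memory use stays
    small regardless of the file size. Returns the number of lines copied.
    """
    matched = 0
    # Binary mode is enough here: an ASCII prefix has the same bytes in
    # UTF-8, so no decoding is needed to test the start of each line.
    # Assumption: the file has no byte-order mark; a BOM would sit in
    # front of the first line and hide a match there.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if line.startswith(prefix):
                dst.write(line)
                matched += 1
    return matched
```

Calling copy_lines_with_prefix("big.txt", "zz_lines.txt") (both names are
placeholders) would leave the matching lines in their original order; the
much smaller output could then be sorted in memory if that is still
needed.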

> Here's a start: http://docs.python.org/lib/typesseq-mutable.html
> Google "GnuWin32" and see if their sort does what you want.
Will do, thanks for the tip.

> If you really have a 2GB file and only 2GB of RAM, I suggest that you don't hold your breath.
Unfortunately, I am limited in resources.

Cheers,

Ira


