Complex sort on big files
roy at panix.com
Sat Aug 6 04:54:05 CEST 2011
I was going to suggest using the unix command-line sort utility via
popen() or subprocess. My arguments were that it's written in C, has 30
years of optimization behind it, etc., etc., etc. It almost certainly has
to be faster than anything you could do in Python.
Then I tried the experiment. I generated a file of 1 million random
integers in the range 0 to 5000. I wrote a little sorting program:
#!/usr/bin/env python
numbers = [int(line) for line in open('numbers')]
numbers.sort()
for i in numbers:
    print(i)
and ran it on my MacBook Pro (8 GB of RAM, two 2.4 GHz cores) with Python 2.6.1:
$ time ./sort.py > py-sort
and did the same with the unix utility:
$ time sort -n numbers > cli-sort
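To rerun a comparison like this yourself, here is a rough sketch, assuming Python 3.5+ and a POSIX `sort` on the PATH; all function names are hypothetical. Unlike the shell `time` runs above, it times only the work inside one Python process: interpreter startup and writing the sorted output to a file are excluded, and the `sort -n` timing includes spawning the process and parsing its output back into ints. Treat its numbers as indicative, not as a reproduction of the measurements in this post.

```python
import random
import subprocess
import time


def write_numbers(path, count=1000000, high=5000, seed=None):
    """Write `count` random integers in [0, high], one per line (hypothetical helper)."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(count):
            f.write("%d\n" % rng.randint(0, high))


def python_sort(path):
    # Read every line, convert to int, and sort the list in memory.
    with open(path) as f:
        numbers = [int(line) for line in f]
    numbers.sort()
    return numbers


def unix_sort(path):
    # Delegate the sort to the external utility, then parse its stdout.
    out = subprocess.run(["sort", "-n", path], stdout=subprocess.PIPE,
                         check=True, universal_newlines=True).stdout
    return [int(line) for line in out.splitlines()]


def compare(path):
    """Time both approaches on `path`; return (python_seconds, unix_seconds)."""
    t0 = time.perf_counter()
    py = python_sort(path)
    t1 = time.perf_counter()
    cli = unix_sort(path)
    t2 = time.perf_counter()
    # Sanity check: both methods must produce the same ordering.
    if py != cli:
        raise RuntimeError("the two sorts disagree")
    return t1 - t0, t2 - t1
```

Typical use would be `write_numbers("numbers", seed=0)` followed by `print(compare("numbers"))`; the defaults match the 1 million integers in the range 0 to 5000 described above.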
Python took just about half the time. That certainly knocked my socks off.
Hard to believe, actually.