Complex sort on big files
Roy Smith
roy at panix.com
Fri Aug 5 22:54:05 EDT 2011
Wow.
I was going to suggest using the Unix command-line sort utility via
popen() or subprocess. My arguments were that it's written in C, has 30
years of optimization behind it, and so on. It almost certainly has to be
faster than anything you could do in Python.
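For reference, a minimal sketch of what the subprocess route might look like. The function name sort_with_cli and the file paths are hypothetical, and it assumes a sort utility that accepts -n is on the PATH:

```python
import subprocess

def sort_with_cli(in_path, out_path):
    """Numerically sort the lines of in_path with the external sort
    utility and write the result to out_path (hypothetical helper)."""
    with open(out_path, "wb") as out:
        # check_call raises CalledProcessError if sort exits non-zero.
        subprocess.check_call(["sort", "-n", in_path], stdout=out)
```

Calling sort_with_cli('numbers', 'cli-sort') would be roughly equivalent to running sort -n numbers > cli-sort from the shell.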
Then I tried the experiment. I generated a file of 1 million random
integers in the range 0 to 5000. I wrote a little sorting program:
numbers = [int(line) for line in open('numbers')]
numbers.sort()
for i in numbers:
    print i
and ran it on my MacBook Pro (8 Gig, 2 x 2.4 GHz cores), Python 2.6.1.
$ time ./sort.py > py-sort
real 0m2.706s
user 0m2.491s
sys 0m0.057s
and did the same with the Unix utility:
$ time sort -n numbers > cli-sort
real 0m5.123s
user 0m4.745s
sys 0m0.063s
Python took just about half the time. Certainly knocked my socks off.
Hard to believe, actually.
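For anyone who wants to repeat the experiment, something like the following could produce a comparable input file. This is only a sketch: the function name write_random_numbers is made up, the one-integer-per-line layout is inferred from the int(line) and sort -n usage above, and treating 5000 as inclusive is an assumption.

```python
import random

def write_random_numbers(path, count=1000000, low=0, high=5000):
    """Write `count` random integers in [low, high] to `path`,
    one per line (hypothetical helper; inclusive bounds assumed)."""
    with open(path, "w") as f:
        for _ in range(count):
            # randint includes both endpoints.
            f.write("%d\n" % random.randint(low, high))
```

Running write_random_numbers('numbers') once and then timing both sorts against the same file keeps the comparison fair, since both programs see identical input.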