sorting 1172026 entries
Cameron Simpson
cs at zip.com.au
Sun May 6 19:54:16 EDT 2012
On 06May2012 18:36, J. Mwebaze <jmwebaze at gmail.com> wrote:
| > for filename in txtfiles:
| >     temp=[]
| >     f=open(filename)
| >     for line in f.readlines():
| >         line = line.strip()
| >         line=line.split()
| >         temp.append((parser.parse(line[0]), float(line[1])))
Have you timed the different parts of your code instead of the whole
thing?
Specifically, do you know that the sort is the large cost?
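One way to check, as a rough sketch: time the sort on its own, using synthetic data of the same size. The function time_sort and the random float keys are stand-ins of mine, not from the original code; the real keys are parsed dates, which may compare somewhat more slowly.

```python
import random
import time

def time_sort(n, seed=0):
    """Sort n synthetic (key, value) tuples and return the elapsed seconds.

    Assumption: random floats stand in for the parsed dates used in the
    original code, so this only approximates the real sort cost."""
    rng = random.Random(seed)  # fixed seed so repeated runs sort the same data
    pairs = [(rng.random(), rng.random()) for _ in range(n)]
    start = time.time()
    pairs.sort()  # only the sort itself is inside the timed region
    return time.time() - start

if __name__ == "__main__":
    print("sorted 1172026 pairs in %.2f seconds" % time_sort(1172026))
```

If this comes back quickly, the time is going somewhere other than the sort.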
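I would point out that the loop above builds the list by append(), one
item at a time. I had thought that might cost time proportional to the
square of the list length, 1172026 * 1172026, but CPython over-allocates
list storage, so append() runs in amortized constant time and building the
whole list is linear. Indeed, I've just done this: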
[Documents/python]oscar1*> python
Python 2.7.3 (default, May 4 2012, 16:19:02)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> L1 = []
>>> for i in range(1000000): L1.append(0)
...
and it only took a few seconds.
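A rough way to see why that session is fast, assuming CPython (where sys.getsizeof reports a list's allocated capacity rather than just its length), is to count how often the allocation actually grows while appending. The name count_resizes is just for illustration.

```python
import sys

def count_resizes(n):
    """Append n items one at a time and count how often the list's
    allocated size, as reported by sys.getsizeof, changes.

    CPython-specific assumption: getsizeof reflects over-allocated capacity,
    so far fewer than n reallocations means append() is amortized O(1)."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for _ in range(n):
        items.append(0)
        size = sys.getsizeof(items)
        if size != last_size:  # the underlying storage was reallocated
            resizes += 1
            last_size = size
    return resizes

if __name__ == "__main__":
    n = 1000000
    print("%d appends, %d reallocations" % (n, count_resizes(n)))
```

The number of reallocations grows roughly logarithmically with n, which is why a million appends take seconds rather than hours.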
As others have pointed out, readlines() is also somewhat expensive,
possibly comparably so, since it too builds a huge list (one string per
line of the file).
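A minimal sketch of avoiding readlines() by iterating over the file object, which hands back one line at a time. The name read_pairs and its parse_key hook are hypothetical; passing dateutil's parser.parse as parse_key would approximate the original code, and skipping short lines is my own assumption about how malformed input should be handled.

```python
def read_pairs(filename, parse_key=str):
    """Yield (key, value) pairs from a whitespace-separated text file,
    reading lazily instead of calling readlines().

    parse_key is a hypothetical hook; the default keeps the key as a string."""
    with open(filename) as f:
        for line in f:  # the file object yields one line at a time
            fields = line.split()  # split() with no argument also strips
            if len(fields) < 2:
                continue  # assumption: skip blank or malformed lines
            yield parse_key(fields[0]), float(fields[1])
```

Something like temp = sorted(read_pairs(filename)) still builds the full result list for sorting, but it no longer holds a second list of raw lines alongside it.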
Anyway, put some:
print time.time()
at various points. Not inside the inner loops, but around larger
chunks, for example:
from time import time
# "filename" and "parser" come from your original code
# (parser is presumably dateutil's parser module)
temp=[]
print "start", time()
f=open(filename)
print "after open", time()
lines = f.readlines()
print "after readlines", time()
for line in lines:
    line = line.strip()
    line=line.split()
    temp.append((parser.parse(line[0]), float(line[1])))
print "after read loop", time()
and so on. At least then you will have a better feel for which part of
your code is taking so long.
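If the scattered print statements get tedious, one alternative (a swapped-in technique, not what I did above) is a small context manager that reports the elapsed wall-clock time for each labelled chunk. The names timed and results are illustrative only.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Print, and optionally record in the results dict, the wall-clock
    time spent inside the with-block (hypothetical helper)."""
    start = time.time()
    try:
        yield
    finally:
        # runs even if the block raises, so partial timings are still shown
        elapsed = time.time() - start
        if results is not None:
            results[label] = elapsed
        print("%s: %.3f seconds" % (label, elapsed))

if __name__ == "__main__":
    timings = {}
    with timed("build list", timings):
        data = [float(i % 1000) for i in range(100000)]
    with timed("sort", timings):
        data.sort()
```

Wrapping the read loop and the sort in separate with-blocks gives the same per-chunk picture as the print calls, with the labels kept next to the code they measure.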
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
The shortest path between any two truths in the real domain passes through
the complex domain. - J. Hadamard
More information about the Python-list mailing list