Program very slow to finish
Fred
fredNo at nospamco.com
Sat Nov 3 11:35:10 EST 2001
I'm dealing with about 100 GB of data that I first just need to
characterize. Since the slowest part will be simply reading the
data, I'm testing various languages and methods on a 10-line subset, a
32 MB subset and a 750 MB subset.
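One way to check that reading really dominates is to time a loop that only reads lines and does nothing else, then compare that time with the full program's. Below is a minimal sketch in modern Python 3, not the 2.1.1 used in this post. The function name `read_baseline` is hypothetical.

```python
import io
import time

def read_baseline(stream):
    """Iterate over every line of a stream; return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in stream:      # file-like objects yield one line per iteration
        count += 1
    return count, time.perf_counter() - start

# In-memory stand-in for the real data file (three short records)
n, secs = read_baseline(io.BytesIO(b"rec1\nrec2\nrec3\n"))
print(n)   # → 3
```

Against real data, a small script that calls `read_baseline(sys.stdin.buffer)` reads the raw bytes behind `sys.stdin` without text decoding. Running it as `python script.py < 841123.dat` gives a read-only lower bound to compare with the full run.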
With the 32 MB set of data, the following Python program prints its
results in about 30 seconds, but then takes another minute to finish!
At first I thought it was stuck and killed it; however, when I finally
let it run on the smaller data set, all was well.
Is this a garbage collection issue? Is there a better way to count the
individual values than dictionaries? I added the sys.exit() call while
trying to figure out what was happening, but it made no difference.
Python 2.1.1 (#3, Oct 25 2001, 12:54:40) [C] on osf1V4
Thanks
===================================
import sys
# Per-side statistics: record count, total amount, min/max amount
nb=0
ns=0
valb=0L
vals=0L
minb=9999999
maxb=0
mins=9999999
maxs=0
# Dictionaries used as sets: only the keys matter
nacc={}
ntra={}
nstk={}
icount = 0
for line in sys.stdin.xreadlines(): # Might speed things up
    icount += 1
    # Fixed-width fields, sliced by column position
    mthdate=line[0:8]
    b_s=line[45]
    mthamt=int(line[81:93])
    stkno=line[8:14]
    ivacno=line[18:25]
    idno=line[25:35]
    if b_s == 'B':
        nb = nb+1
        valb = valb+mthamt
        minb=min(minb,mthamt)
        maxb=max(maxb,mthamt)
    elif b_s == 'S':
        ns = ns+1
        vals = vals+mthamt
        mins=min(mins,mthamt)
        maxs=max(maxs,mthamt)
    nacc[ivacno]=None
    ntra[idno]=None
    nstk[stkno]=None
print "Total lines read: " + `icount`
print "B: " + `nb` + " Value: " + `valb` + " Max/Min: " + `maxb` + " " + `minb`
print "S: " + `ns` + " Value: " + `vals` + " Max/Min: " + `maxs` + " " + `mins`
print "Number of acc: " + `len(nacc)`
print "Number of tra: " + `len(ntra)`
print "Number of stk: " + `len(nstk)`
sys.exit()
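The same characterization can be written as a single function over any iterable of lines. That makes it easy to test on a few hand-built records before running it on the full files. Below is a minimal Python 3 sketch. It assumes the column offsets from the script above and that 'B'/'S' mean buys/sells, as in the printed output. It uses sets in place of the None-valued dicts. `characterize` and its result keys are hypothetical names.

```python
def characterize(lines):
    """Summarize fixed-width records using the column offsets from the script above."""
    stats = {side: {"n": 0, "total": 0, "min": None, "max": None} for side in "BS"}
    acc, tra, stk = set(), set(), set()   # distinct values per field
    count = 0
    for line in lines:
        count += 1
        side = line[45]                   # 'B' (buy) or 'S' (sell)
        amount = int(line[81:93])         # Python 3 ints are unbounded; no 0L needed
        s = stats.get(side)
        if s is not None:                 # other side codes are counted but not summed
            s["n"] += 1
            s["total"] += amount
            s["min"] = amount if s["min"] is None else min(s["min"], amount)
            s["max"] = amount if s["max"] is None else max(s["max"], amount)
        acc.add(line[18:25])
        tra.add(line[25:35])
        stk.add(line[8:14])
    return {"lines": count, "B": stats["B"], "S": stats["S"],
            "acc": len(acc), "tra": len(tra), "stk": len(stk)}
```

Passing `sys.stdin` or an open file object iterates lazily, one line at a time. That plays the role `xreadlines()` played in 2.1. A side with no records reports `None` for min/max instead of the 9999999/0 sentinels.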
$ time python unitest.py < 841123.dat
Total lines read: 316480
Buys: 158229 Value: 20191822207L Max/Min: 9101400 12
Sells: 158251 Value: 20193690457L Max/Min: 9101400 12
Number of acc: 56568
Number of tra: 64704
Number of stk: 388
real 1m30.48s
user 1m21.13s
sys 0m2.28s
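One hypothesis, which this post does not confirm, is that the extra minute goes to interpreter cleanup after the last print. During cleanup the roughly 120,000 small strings held in the three dictionaries are freed. `sys.exit()` only raises `SystemExit`, so normal cleanup still runs, which would fit the observation that it made no difference. `os._exit()` ends the process without cleanup handlers or flushing stdio buffers, so output has to be flushed first. The Python 3 sketch below can time the freeing of a large dict and exit without cleanup. `build_keys`, `timed_delete` and `fast_exit` are hypothetical helpers.

```python
import os
import sys
import time

def build_keys(n):
    """Build a dict with n distinct string keys, like nacc/ntra/nstk above."""
    return {"%010d" % i: None for i in range(n)}

def timed_delete(d):
    """Time clearing a dict (a rough proxy for freeing it at interpreter exit)."""
    start = time.perf_counter()
    d.clear()                 # drops every key; unreferenced strings are freed here
    return time.perf_counter() - start

def fast_exit(status=0):
    """Flush buffered output, then exit without interpreter cleanup."""
    sys.stdout.flush()        # os._exit does not flush stdio buffers
    sys.stderr.flush()
    os._exit(status)
```

If `timed_delete` on dicts of the real size accounts for most of the missing minute, calling `fast_exit()` in place of the final `sys.exit()` would skip that cost. Anything else that relies on normal shutdown would be skipped too.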
More information about the Python-list mailing list