Program very slow to finish

Fred fredNo at nospamco.com
Sat Nov 3 11:35:10 EST 2001


I'm dealing with about 100 GB of data that I first need to
characterize.  Since the slowest part will simply be reading the
data, I'm testing various languages and methods on a 10-line subset, a
32 MB subset and a 750 MB subset.

The following Python program prints its results after about 30 seconds
on the 32 MB data set, but the process then takes another minute to
exit!  At first I thought it was stuck and killed it; however, I finally
let it run on the smaller data set and all was well.

Is this a garbage collection issue?  Is there a better way to count the
individual values than dictionaries?  I put the sys.exit call in while
trying to figure out what was happening but it didn't make a difference.

Python 2.1.1 (#3, Oct 25 2001, 12:54:40) [C] on osf1V4

Thanks

===================================

import sys

nb=0
ns=0
valb=0l
vals=0l
minb=9999999
maxb=0
mins=9999999
maxs=0
nacc={}
ntra={}
nstk={}

icount = 0
for line in sys.stdin.xreadlines(): # Might speed things up
    icount += 1
    mthdate=line[0:8]
    b_s=line[45]
    mthamt=int(line[81:93])
    stkno=line[8:14]
    ivacno=line[18:25]
    idno=line[25:35]
    if b_s == 'B':
       nb = nb+1
       valb = valb+mthamt
       minb=min(minb,mthamt)
       maxb=max(maxb,mthamt)
    elif b_s == 'S':
       ns = ns+1
       vals = vals+mthamt
       mins=min(mins,mthamt)
       maxs=max(maxs,mthamt)
    nacc[ivacno]=None
    ntra[idno]=None
    nstk[stkno]=None
print "Total lines read: " + `icount`
print "B: " + `nb` + " Value: " + `valb` + " Max/Min: " + `maxb` + " " + `minb`
print "S: " + `ns` + " Value: " + `vals` + " Max/Min: " + `maxs` + " " + `mins`
print "Number of acc: " + `len(nacc)`
print "Number of tra: " + `len(ntra)`
print "Number of stk: " + `len(nstk)`
sys.exit()


$ time python unitest.py < 841123.dat
Total lines read: 316480
Buys: 158229 Value: 20191822207L Max/Min: 9101400 12
Sells: 158251 Value: 20193690457L Max/Min: 9101400 12
Number of acc: 56568
Number of tra: 64704
Number of stk: 388

real    1m30.48s
user    1m21.13s
sys     0m2.28s
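
One way to narrow down where the extra minute goes might be to time
building a dictionary of string keys separately from freeing it.  The
sketch below is only that: it is written for a current Python 3
interpreter rather than the 2.1.1 used above, the function name
time_build_and_free and the key count are made up for illustration, and
it measures dictionary teardown in isolation, so it cannot by itself
show that teardown is the cause of the delay.

```python
import time


def time_build_and_free(n_keys):
    """Return (build_seconds, free_seconds, n_distinct).

    Hypothetical helper: builds a dict of n_keys distinct string keys
    using the same "key -> None" idiom as the script, then frees it.
    """
    start = time.perf_counter()
    seen = {}
    for i in range(n_keys):
        # 10-character zero-padded keys, roughly like the idno field
        seen["%010d" % i] = None
    n_distinct = len(seen)
    built = time.perf_counter()

    # Drop the only reference; CPython releases the dict and its keys here.
    del seen
    freed = time.perf_counter()

    return built - start, freed - built, n_distinct


if __name__ == "__main__":
    build, free, n = time_build_and_free(100000)
    print("keys: %d  build: %.3fs  free: %.3fs" % (n, build, free))
```

If the free time turned out to be large compared with the build time on
a given platform, that would point at deallocation rather than at the
read loop.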




