fast text processing
Ben Sizer
kylotan at gmail.com
Tue Feb 21 04:36:45 EST 2006
Maybe this code will be faster? (That's if it even does the same thing;
it's largely untested.)
with open("data", "r", buffering=1000) as filehandle:
    # delimiter, extract() and table are taken from your original code
    fileIter = iter(filehandle)
    # Read the first line up front so every loop pass compares a pair
    lastLine = next(fileIter)
    lastTokens = lastLine.strip().split(delimiter)
    lastGeno = extract(lastTokens[0])
    for currentLine in fileIter:
        currentTokens = currentLine.strip().split(delimiter)
        currentGeno = extract(currentTokens[0])
        if lastGeno == currentGeno:
            table.markEquivalent(int(lastTokens[1]), int(currentTokens[1]))
        # prepare for next iteration: reuse this line's tokens and key
        # instead of splitting and extracting the line a second time
        lastTokens = currentTokens
        lastGeno = currentGeno
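The idea is to carry the previous line's tokens and key forward, so each line is split and passed to extract() only once. Here's a self-contained sketch of the same consecutive-pair comparison you can run without the real data. It assumes, purely hypothetically, that extract() just returns the first field unchanged, that fields are tab-separated, and that the matching ID pairs are yielded rather than passed to table.markEquivalent:

```python
def extract(field):
    # Hypothetical stand-in for the original extract(): use the field as-is
    return field

def equivalent_pairs(lines, delimiter="\t"):
    """Yield (lastId, currentId) for consecutive lines whose keys match.

    Each line is stripped, split and passed to extract() exactly once;
    the previous line's results are carried over to the next comparison.
    """
    it = iter(lines)
    try:
        lastTokens = next(it).strip().split(delimiter)
    except StopIteration:
        return  # empty input: nothing to compare
    lastGeno = extract(lastTokens[0])
    for currentLine in it:
        currentTokens = currentLine.strip().split(delimiter)
        currentGeno = extract(currentTokens[0])
        if lastGeno == currentGeno:
            yield int(lastTokens[1]), int(currentTokens[1])
        # the current line becomes the "last" line for the next pass
        lastTokens, lastGeno = currentTokens, currentGeno

lines = ["a\t1\n", "a\t2\n", "b\t3\n", "b\t4\n", "b\t5\n", "c\t6\n"]
print(list(equivalent_pairs(lines)))
# → [(1, 2), (3, 4), (4, 5)]
```

Because it accepts any iterable of lines, an open file object can be passed straight in, and each yielded pair can be handed to table.markEquivalent.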
I'd be tempted to try a bigger file buffer than 1000 bytes too, personally.
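The easiest way to find out whether a bigger buffer helps is to time it. Here's a rough sketch, assuming Python 3 (where open()'s buffering argument is a buffer size in bytes for values above 1) and a generated throwaway file rather than the real data. Timings will depend on your platform, disk and data, so treat any numbers it prints as local to your machine:

```python
import os
import tempfile
import time

def count_lines(path, buffering):
    # Read the whole file line by line with the given buffer size (bytes)
    with open(path, "r", buffering=buffering) as fh:
        return sum(1 for _ in fh)

# Build a throwaway test file with made-up "key<TAB>id" lines
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for i in range(200000):
        out.write("key%d\t%d\n" % (i // 3, i))

results = {}
for size in (1000, 8192, 65536, 1 << 20):
    start = time.perf_counter()
    results[size] = count_lines(path, size)
    elapsed = time.perf_counter() - start
    print("buffering=%-8d %d lines in %.4f s" % (size, results[size], elapsed))

os.remove(path)
```

Swap count_lines for your real loop to measure what actually matters; the line count is only there to check that every buffer size reads the same data.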
--
Ben Sizer