Converting a text data file from positional to tab delimited.
Lee Joramo
lee at joramo.com
Tue Mar 13 09:09:52 EST 2001
I am looking for suggestions to speed up the conversion of a large
text data file from a 'positional' (fixed-width) layout to tab-delimited.
The data file is over 200MB in size and contains over 40,000 lines, each
with over 600 fields. I suspect that the 'for' loop that splits each line
into tab-delimited fields could be optimized. Perhaps it could be replaced
with a regex or another technique.
Thanks for any suggestions.
Lee Joramo
==================
#'layout' list elements [field name, start position, end position]
#For brevity, I have only included 10 fields.
#The contents of 'layout' are extracted from a text file that
#describes the layout of the datafile.
layout = [
    ['STUDY', 0, 7],
    ['MDLNO', 8, 12],
    ['DASH', 13, 16],
    ['INCENT', 17, 17],
    ['CODE1', 18, 18],
    ['CODE2', 19, 19],
    ['COVLET', 20, 20],
    ['VERSN', 21, 21],
    ['MAILNO', 22, 23],
    ['MLDYY', 24, 27]]
inFile = open('rawdata.dat', 'r')
outFile = open('delimted.dat', 'w')
while 1:
    #read the file in batches; 1000 is a size hint in bytes
    lines = inFile.readlines(1000)
    if not lines: break
    for line in lines:
        delimitedLine = ""
        delimit = ""
        for field in layout:
            #
            #can this loop be improved??
            #
            #end position is inclusive, so slice one past it
            fieldValue = line[field[1]:field[2]+1]
            delimitedLine = delimitedLine + delimit + fieldValue
            delimit = "\t"
        outFile.write(delimitedLine+"\n")
inFile.close()
outFile.close()
del lines
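
One direction that might be worth timing is sketched below. It is not a
measured result. The repeated string concatenation in the inner loop creates
a new intermediate string for every field, so this sketch precomputes the
slice bounds once and builds each output line with a single join. The names
SLICES, convert_line and convert_file are hypothetical, the layout is cut to
three fields, and end positions are assumed to be zero-based and inclusive.

```python
# Sketch only: SLICES, convert_line and convert_file are hypothetical names.
# The layout is shortened to the first three fields from the post.
layout = [
    ['STUDY', 0, 7],
    ['MDLNO', 8, 12],
    ['DASH', 13, 16],
]

# Precompute one slice object per field, outside the per-line loop.
# Assumption: end positions are inclusive, hence end + 1.
SLICES = [slice(start, end + 1) for _name, start, end in layout]


def convert_line(line):
    """Return one positional record as a tab-delimited string."""
    # A single join avoids building an intermediate string per field.
    return "\t".join([line[s] for s in SLICES])


def convert_file(in_path, out_path):
    """Convert a positional file to tab-delimited, one line at a time."""
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            out_file.write(convert_line(line.rstrip("\n")) + "\n")
```

For example, convert_line("ABCDEFGH12345WXYZ") splits the record into the
three fields ABCDEFGH, 12345 and WXYZ, joined by tabs. Whether this is
actually faster on a 200MB file would need to be measured.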