Working with Huge Text Files

Lorn Davies efoda at hotmail.com
Sat Mar 19 12:30:10 EST 2005


Thank you all very much for your suggestions and input... they've been
very helpful. As a beginner to this, I found the easiest approach was
to work with Chirag's code. Thanks Chirag, I was actually able to read
the code, make some edits to it and then use it... woohooo!

My changes are annotated with ##:

data_file = open(r'G:\pythonRead.txt', 'r')  # raw string keeps the backslash literal
data_file.readline()  ## this was to skip the first line
months = {'JAN':'01', 'FEB':'02', 'MAR':'03', 'APR':'04', 'MAY':'05',
'JUN':'06', 'JUL':'07', 'AUG':'08', 'SEP':'09', 'OCT':'10', 'NOV':'11',
'DEC':'12'}
output_files = {}
for line in data_file:
    fields = line.strip().split(',')
    length = len(fields[3])  ## check how long the field is
    N = 'P','N'
    filename = fields[0]
    if filename not in output_files:
        output_files[filename] = open(filename+'.txt', 'w')
    if (fields[8] == 'N' or 'P') and (fields[6] == '0' or '1'):
        ## This line above doesn't work, can't figure out how to struct?
        fields[1] = fields[1][5:] + months[fields[1][2:5]] + fields[1][:2]
        fields[2] = fields[2].replace(':', '')
        if length == 6:    ## check for 6 if not add a 0
            fields[3] = fields[3].replace('.', '')
        else:
            fields[3] = fields[3].replace('.', '') + '0'
        print >>output_files[filename], ', '.join(fields[1:5])
for filename in output_files:
    output_files[filename].close()
data_file.close()
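
One possible way to handle fields[3] values of any length, rather than
only the 5- and 6-character cases tested above, is to pad the fractional
part instead of checking the total length. This is only a sketch:
normalize_decimal and its decimals parameter are made-up names, and it
assumes the field is a plain decimal string that should end up with two
digits after the point.

```python
def normalize_decimal(value, decimals=2):
    """Drop the decimal point and right-pad the fraction with zeros.

    Hypothetical helper; assumes value looks like '123.45', '123.4' or '123'.
    """
    whole, _, fraction = value.partition('.')
    # ljust only pads; a fraction longer than `decimals` is left unchanged.
    return whole + fraction.ljust(decimals, '0')


for sample in ('123.45', '123.4', '123'):
    print(normalize_decimal(sample))
```

Inside the loop this would replace the if/else on length with a single
line, fields[3] = normalize_decimal(fields[3]).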

The main changes were to add a check on the length of fields[3]; I
wanted to normalize it at 6 digits. The problem I can see with it is
what happens if I come across lengths < 5, but I have some ideas to
fix that. The other change I attempted was a criterion for what to
print based on the values of fields[8] and fields[6]. It didn't work so
well. I'm a little confused about how to structure booleans like that.
I come from a little experience in a Pascal-type scripting language,
where "x and y" requires both to be true before continuing and "x or y"
requires either one to be true. Python, unless I'm misunderstanding
(very possible), doesn't organize it as such. I thought of perhaps
using a set of if, elif, else statements for processing the fields, but
didn't think that would be the most elegant/efficient solution.
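
A likely reason the test on fields[8] and fields[6] misbehaves: Python's
"and" and "or" do combine conditions as described, but
fields[8] == 'N' or 'P' is parsed as (fields[8] == 'N') or 'P', and a
non-empty string such as 'P' always counts as true. Each comparison has
to be written out in full, or replaced by a membership test with "in".
The sketch below illustrates this; keep_record and the sample rows are
invented for the example and do not come from the real data file.

```python
def keep_record(fields):
    """Return True when fields[8] is 'N' or 'P' and fields[6] is '0' or '1'."""
    return fields[8] in ('N', 'P') and fields[6] in ('0', '1')


# Invented sample rows with nine comma-separated fields each.
good = 'ABC,19MAR2005,12:30:10,123.45,100,a,0,b,N'.split(',')
bad = 'ABC,19MAR2005,12:30:10,123.45,100,a,7,b,X'.split(',')

# The original form is truthy even for the bad row, because 'P' and '1'
# are non-empty strings and therefore count as true on their own.
always_true = bool((bad[8] == 'N' or 'P') and (bad[6] == '0' or '1'))

print(keep_record(good))
print(keep_record(bad))
print(always_true)
```

The spelled-out equivalent of the membership test would be
(fields[8] == 'N' or fields[8] == 'P') and
(fields[6] == '0' or fields[6] == '1').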

Anyway, any critiques/ideas are welcome... they'll most definitely help
me understand this language a bit better. Thank you all again for your
great replies and thank you Chirag for getting me up and going.

Lorn



