[Tutor] Help for Python Beginner with extracting and manipulating data from thousands of ASCII files

Cecilia Chavana-Bryant cecilia.chavana at gmail.com
Mon Oct 1 00:07:08 CEST 2012

Hello again, Python Tutor!

With a friend's help, I have written the following code to extract reflectance
data from an ASCII data file, calibrate the data with a little manipulation,
and then write the calibrated data to an output file.

import numpy
# import glob - use if list_of_files is used

dataFile = "1SH0109.001.txt"
# list_of_files = glob.glob('./*.txt')  # to replace dataFile and search for
#                                       # all text files in the ASCII_files folder?
caliFile1 = "Cal_File_P17.txt"  # calibration file for data files created from July to 18 Sep
caliFile2 = "Cal_File_P19.txt"  # calibration file for data files created from 19 Sep onwards
outFile = "Cal_" + dataFile  # this will need to change if list_of_files is used

#extract data from data file
with open(dataFile, "rt") as fdData:
    data = fdData.readlines()

# location of the creation date on the data files; it can only be read
# after the data file has been loaded into data
fileDate = data[6][16:26]

# calibration file to use for this data file; chosen by hand for now
caliFile = caliFile1

#extract data from calibration file
with open(caliFile, "rt") as fdCal:
    calibration = fdCal.readlines()

#create data table
dataNum = numpy.zeros((2151, 2))

#the actual data (the numbers) in the data file begin at line 30
for k, anItem in enumerate(data[30:]):  # k counts the rows
    theNums = anItem.strip().split("\t")
    dataNum[k, 0] = int(theNums[0])
    dataNum[k, 1] = float(theNums[1])

#create the calibration table
calNum = numpy.zeros((2151, 2))
#the calibration values begin at line 5
for k, anItem in enumerate(calibration[5:]):  # k advances with every row
    theNums = anItem.strip().split("\t")
    calNum[k, 0] = int(theNums[0])
    calNum[k, 1] = float(theNums[1])

#calibrate the data
calibratedData = numpy.zeros((2151, 2))
for k, aNum in enumerate(dataNum):
    calibratedData[k, 0] = aNum[0]  # first column is the wavelength
    # second column is the measurement multiplied by the calibration value
    # for the same row
    calibratedData[k, 1] = aNum[1] * calNum[k, 1] * 100.0

#write the calibrated data
with open(outFile, "wt") as fd:
    # TODO: prior to writing the calibrated contents, write the headers for
    # data files and calibration files
    for aNum in calibratedData:
        # Write the data in the file in the following format:
        # "An integer with at least 3 digits (zero-padded)", "tab character",
        # "Floating point number"
        fd.write("%03d\t%f\n" % (aNum[0], aNum[1]))
# leaving the "with" block closes the file
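Since the same steps have to run once for every file, a natural first move is to wrap them in a function that takes the three file names as arguments. The sketch below is one way to do that, assuming the layout the code above relies on: data rows start at index 30, calibration rows at index 5, each row is a tab-separated wavelength/value pair, and there are 2151 rows. The names read_table and calibrate_file are hypothetical. It also replaces the row-by-row calibration loop with a single NumPy array multiplication, which multiplies each measurement by the calibration value in the same row and then by 100.

```python
import numpy


def read_table(lines, start, nrows=2151):
    """Parse 'wavelength<TAB>value' lines from lines[start:] into an (nrows, 2) array."""
    table = numpy.zeros((nrows, 2))
    for k, line in enumerate(lines[start:start + nrows]):
        fields = line.strip().split("\t")
        table[k, 0] = int(fields[0])    # wavelength
        table[k, 1] = float(fields[1])  # measured or calibration value
    return table


def calibrate_file(dataFile, caliFile, outFile, dataStart=30, calStart=5, nrows=2151):
    """Calibrate one data file with one calibration file and write the result.

    Assumes the file layout described above; dataStart, calStart and nrows
    can be changed if the real files differ.
    """
    with open(dataFile, "rt") as fd:
        dataNum = read_table(fd.readlines(), dataStart, nrows)
    with open(caliFile, "rt") as fd:
        calNum = read_table(fd.readlines(), calStart, nrows)

    calibrated = numpy.empty_like(dataNum)
    calibrated[:, 0] = dataNum[:, 0]                         # wavelength
    calibrated[:, 1] = dataNum[:, 1] * calNum[:, 1] * 100.0  # whole column at once

    with open(outFile, "wt") as fd:
        for wavelength, value in calibrated:
            fd.write("%03d\t%f\n" % (wavelength, value))
```

With this in place, the single-file case becomes one call, e.g. calibrate_file("1SH0109.001.txt", "Cal_File_P17.txt", "Cal_1SH0109.001.txt"), and handling many files reduces to deciding which arguments to pass for each one.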

I have successfully calibrated one ASCII file at a time with this code.
However, I have thousands of files that I need to calibrate, so I would like
some help modifying this code so that it can:

1. Use one calibration file (Cal_File_P17.txt) for data files created from
July to 18 Sep, and a different calibration file (Cal_File_P19.txt) for
data files created from 19 Sep onwards.

2. Find all the .txt files in a folder called ASCII_files, which is
subdivided into 12 subfolders, and calibrate all of these files.
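For item 1, the decision only needs the creation date that the code already slices out with data[6][16:26]. The sketch below assumes that this 10-character field is written as month/day/year (e.g. 09/18/2012) and that all the files are from 2012; both are guesses, so the date format and the cut-off are parameters. choose_calibration_file is a hypothetical name. Converting the text to a datetime.date before comparing avoids relying on how date strings happen to sort, which is only safe for some formats.

```python
from datetime import date, datetime

# Assumption: files created on or after 19 Sep 2012 use the P19 calibration file.
CUTOFF = date(2012, 9, 19)


def choose_calibration_file(fileDate, dateFormat="%m/%d/%Y", cutoff=CUTOFF,
                            early="Cal_File_P17.txt", late="Cal_File_P19.txt"):
    """Return the calibration file name for a creation-date string.

    dateFormat is a guess at how the date in data[6][16:26] is written;
    change it to match the real files (see the strptime format codes).
    """
    created = datetime.strptime(fileDate.strip(), dateFormat).date()
    return late if created >= cutoff else early
```

In the single-file code, caliFile can then be set with caliFile = choose_calibration_file(fileDate), once fileDate has been read from data.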

I have googled and tried to think through how to make these changes, but I
have only managed to confuse myself further. So I would like some guidance
on how to think about this process and where to start. To be clear, I am
not asking anyone to do my work or write the code for me; I would just like
some pointers on how to approach it.
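For item 2, Python's os.walk visits a folder and every subfolder beneath it, so the 12 subfolders of ASCII_files do not need to be listed by hand. A sketch, with find_data_files and output_path as hypothetical helper names: it skips any file whose name already starts with Cal_, on the assumption that such files are either the calibration files or output from a previous run (otherwise a second run would try to calibrate its own output). On Python 3.5 and later, glob.glob('ASCII_files/**/*.txt', recursive=True) is an alternative to os.walk.

```python
import os


def find_data_files(root="ASCII_files", prefix="Cal_"):
    """Yield every .txt file under root, including all of its subfolders.

    Files whose names start with prefix are skipped (assumed to be
    calibration files or calibrated output from an earlier run).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a stable, alphabetical order
        for name in sorted(filenames):
            if name.lower().endswith(".txt") and not name.startswith(prefix):
                yield os.path.join(dirpath, name)


def output_path(dataPath, prefix="Cal_"):
    """Place the calibrated file next to the original, named prefix + file name."""
    folder, name = os.path.split(dataPath)
    return os.path.join(folder, prefix + name)
```

Each path it yields can be run through the single-file steps above, with outFile = output_path(dataPath) in place of "Cal_" + dataFile; the latter would put the prefix in front of the folder name rather than the file name once paths include subfolders.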

Many thanks in advance for your help,