[Tutor] Help for Python Beginner with extracting and manipulating data from thousands of ASCII files

Dave Angel d at davea.name
Mon Oct 1 02:16:17 CEST 2012


On 09/30/2012 06:07 PM, Cecilia Chavana-Bryant wrote:
> Hola again Python Tutor!
>
> With a friend's help I have the following code to extract reflectance data
> from an ASCII data file, do a bit of data manipulation to calibrate the
> data and then write the calibrated file into an out file.
>
> import numpy
> # import glob - use if list_of_files is used
>
>
> dataFile = "1SH0109.001.txt"
> #list_of_files = glob.glob('./*.txt') to replace dataFile to search for all text files in ASCII_files folder?

First, an important observation.  This code has no functions defined in
it.  Thus it's not reusable.  So every time you make a change, you'll be
breaking the existing code and then trying to make the new version work.

The decision of one file versus many is usually handled by writing a
function that deals with one file.  Test it with a single file.  Then
write another function that uses glob to build a list of files, and call
the original one in a loop.
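
A rough sketch of that shape, not a finished program. The names calibrate_file and calibrate_all are made up for illustration, and the body of calibrate_file is only a placeholder that works out the output name:

```python
import glob
import os


def calibrate_file(data_path):
    """Process one data file and return the path of its output file.

    Placeholder: the real version would read, calibrate and write the data.
    """
    folder, name = os.path.split(data_path)
    return os.path.join(folder, "Cal_" + name)


def calibrate_all(pattern):
    """Call calibrate_file() once for every file matching a glob pattern."""
    return [calibrate_file(path) for path in sorted(glob.glob(pattern))]
```

Get calibrate_file("1SH0109.001.txt") working first, then try calibrate_all("./*.txt"). Note that the "Cal_" output files also end in .txt, so a second run over the same folder would pick them up unless they are filtered out or written elsewhere.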

As you work on it, you should discover that there are a half dozen other
functions that you need, rather than one big one.
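
For instance, the posted code parses the data file and the calibration file with the same loop written twice, which suggests one helper. Here is a sketch with hypothetical names (parse_table, calibrate), using plain lists instead of numpy to keep it short. I'm assuming the intent is measurement times calibration factor times 100. The posted loop multiplies the measurement by itself, which looks like a typo.

```python
def parse_table(lines, header_lines):
    """Turn 'wavelength<TAB>value' lines into a list of (int, float) tuples."""
    table = []
    for line in lines[header_lines:]:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # assumption: blank or malformed lines are skipped
        table.append((int(fields[0]), float(fields[1])))
    return table


def calibrate(data_table, cal_table):
    """Scale each measurement by the matching calibration factor.

    Assumes both tables list the same wavelengths in the same order.
    """
    return [(wavelength, value * factor * 100.0)
            for (wavelength, value), (_, factor) in zip(data_table, cal_table)]
```

With these, the data table would be parse_table(data, 30) and the calibration table parse_table(calibration, 5), matching the offsets in the posted code.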

> caliFile1 = "Cal_File_P17.txt" # calibration file to be used on data files created from July to 18 Sep
> caliFile2 = "Cal_File_P19.txt" # calibration file to be used on data files created from 19 Sep onwards
> outFile = "Cal_" + dataFile # this will need to change if list_of_files is used
> fileDate = data[6][16:26] # location of the creation date on the data files

Show us the full traceback from the runtime error you get on this line.

>
> #extract data from data file
> fdData = open(dataFile,"rt")
> data = fdData.readlines()
> fdData.close()
>
> #extract data from calibration file
> fdCal = open(caliFile,"rt")

Show us the full traceback from the runtime error here, as well.

> calibration = fdCal.readlines()
> fdCal.close()
>
> #create data table
> k=0 #just a counter
> dataNum = numpy.ndarray((2151,2))
>
> #the actual data (the numbers) in the data file begin at line 30
> for anItem in data[30:]:
>     theNums = anItem.replace("\r\n","").split("\t")
>     dataNum[k,0] = int(theNums[0])
>     dataNum[k,1] = float(theNums[1])
>     k+=1 #advance the counter
>
> #create the calibration table
> k = 0
> calNum = numpy.ndarray((2151,2))
> for anItem in calibration[5:]:
>     theNums = anItem.replace("\r\n","").split("\t")
>     calNum[k,0] = int(theNums[0])
>     calNum[k,1] = float(theNums[1])
>     k+=1
>
> #calibrate the data
> k=0
> calibratedData = numpy.ndarray((2151,2))
> for aNum in dataNum:
>     calibratedData[k,0] = aNum[0] #first column is the wavelength
>     calibratedData[k,1] = (aNum[1] * dataNum[k,1]) * 100.0 #second column is the measurement to be calibrated.
>     k+=1
>
> #write the calibrated data
> fd = open(outFile,"wt")

Is there an error traceback here as well?

> #prior to writing the calibrated contents, write the headers for data files and calibration files
> fd.writelines(data[0:30])
> fd.writelines(calibration[0:5])
> for aNum in calibratedData:
>     #Write the data in the file in the following format:
>     # "An integer with 3 digits", "tab character", "Floating point number"
>     fd.write("%03d\t%f\n" % (aNum[0],aNum[1]))
>
> #close the file
> fd.close()
>

Are the individual files small?  By doing readlines() on them, you're
assuming you can hold all of both the data file and the calibration file
in memory.
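
If they turn out to be large, one alternative is to loop over the file object itself, which hands you one line at a time. A sketch, with iter_rows as a made-up name, assuming the same tab-separated layout as in the posted code:

```python
from itertools import islice


def iter_rows(path, header_lines):
    """Yield (wavelength, value) pairs one at a time, skipping the header."""
    with open(path, "rt") as f:
        # islice skips the first header_lines lines without storing them
        for line in islice(f, header_lines, None):
            fields = line.split("\t")
            if len(fields) >= 2:
                yield int(fields[0]), float(fields[1])
```

You would use it as "for wavelength, value in iter_rows(dataFile, 30):". For a couple of thousand lines per file, readlines() is probably fine, so treat this as optional.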

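> I have successfully calibrated one ASCII file at a time with this code.

Unless I'm missing something, this code does not run as posted: data is used on the fileDate line before it is assigned, and caliFile is never defined (only caliFile1 and caliFile2 are).  I didn't try it, though, just inspected it quickly.
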
> However, I have 1,000s of files that I need to calibrate so I would like
> some help to modify this code so it can:
>
> 1. Use one calibration file (Cal_FileP17.txt) on data files created from
> July to the 18th Sep and a different calibration file (Cal_FileP19.txt) for
> data files created from the 19th of Sep onwards.
>
> 2. Find all the .txt files in a folder called ASCII_files, which is
> subdivided into 12 different folders and calibrate all these files
>
> I have googled and tried thinking about how to make changes and I've
> managed to get myself a bit more confused. Thus, I would like some guidance
> on how to tackle/think about this process and how to get started. Please, I
> am not asking for someone to do my work and write the code for me, I would
> like some guidance on how to approach this and get started.
>
> Many thanks in advance for your help,
> Cecilia
>
>
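
On point 1, choosing the calibration file is a natural candidate for a small function of its own. This is a sketch only: pick_calibration is a hypothetical name, the file names are the ones from your code (your description spells them slightly differently), and I'm assuming all the data files come from the same year, so only month and day are compared. Turning the text in data[6][16:26] into a date is left out because I don't know its format.

```python
import datetime

CAL_BEFORE = "Cal_File_P17.txt"  # July up to and including 18 Sep
CAL_FROM = "Cal_File_P19.txt"    # 19 Sep onwards


def pick_calibration(file_date):
    """Return the calibration file name for a datetime.date.

    Assumes every data file is from the same year, so the year is ignored.
    """
    if (file_date.month, file_date.day) >= (9, 19):
        return CAL_FROM
    return CAL_BEFORE
```

Once the date string format is known, datetime.datetime.strptime() can convert it and this function can be tested without touching any files.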

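On point 2, glob.glob('./*.txt') only looks in one folder. For a folder with subfolders, os.walk() visits every level. A sketch, where find_data_files is a made-up name and skipping names that start with "Cal_" is my assumption, so that calibration files and earlier output files are not treated as data:

```python
import os


def find_data_files(top, extension=".txt"):
    """Return a sorted list of data file paths under top, at any depth."""
    paths = []
    for folder, _subfolders, names in os.walk(top):
        for name in names:
            # assumption: "Cal_" marks calibration or already-calibrated files
            if name.endswith(extension) and not name.startswith("Cal_"):
                paths.append(os.path.join(folder, name))
    return sorted(paths)
```

Start by printing find_data_files("ASCII_files") and checking the list by eye before feeding it to anything that writes files.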

-- 

DaveA


