[Tutor] Help for Python Beginner with extracting and manipulating data from thousands of ASCII files

Cecilia Chavana-Bryant cecilia.chavana at gmail.com
Mon Oct 1 18:05:56 CEST 2012


On Mon, Oct 1, 2012 at 1:16 AM, Dave Angel <d at davea.name> wrote:

> On 09/30/2012 06:07 PM, Cecilia Chavana-Bryant wrote:
> > Hola again Python Tutor!
> >
> > With a friend's help I have the following code to extract reflectance
> > data from an ASCII data file, do a bit of data manipulation to calibrate
> > the data and then write the calibrated file into an out file.
> >
> > import numpy
> > # import glob - use if list_of_files is used
> >
> >
> > dataFile = "1SH0109.001.txt"
> > #list_of_files = glob.glob('./*.txt') to replace dataFile to search for all
> > #text files in ASCII_files folder?
>
> First, an important observation.  This code has no functions defined in
> it.  Thus it's not reusable.  So every time you make a change, you'll be
> breaking the existing code and then trying to make the new version work.
>
> The decision of one file versus many is usually handled by writing a
> function that deals with one file.  Test it with a single file.  Then
> write another function that uses glob to build a list of files, and call
> the original one in a loop.
>
> As you work on it, you should discover that there are a half dozen other
> functions that you need, rather than one big one.
>
Many thanks for this advice; it helps me get started with writing functions
for the different procedures and then thinking about many files.
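Dave's suggestion (one function that handles a single file, plus a second function that builds the file list with glob and calls the first one in a loop) could be sketched roughly like this. The names `process_file` and `process_folder` are hypothetical, and the body of `process_file` is only a placeholder that counts the lines after the 30-line header; the real calibration steps would go there once they work on one file.

```python
import glob


def process_file(data_path, header_lines=30):
    """Handle exactly one data file (test this on a single file first).

    Placeholder body: read the file and return how many lines follow
    the header. The real calibration steps would replace this.
    """
    with open(data_path) as fd:
        lines = fd.readlines()
    return len(lines[header_lines:])


def process_folder(pattern):
    """Build the list of files with glob, then call process_file on each."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        results[path] = process_file(path)
    return results
```

Usage would be something like `process_folder("./*.txt")`. Because `process_file` knows nothing about globbing, it can be tested on one file and then reused unchanged for thousands.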

> > caliFile1 = "Cal_File_P17.txt" # calibration file to be used on data files created from July to 18 Sep
> > caliFile2 = "Cal_File_P19.txt" # calibration file to be used on data files created from 19 Sep onwards
> > outFile = "Cal_" + dataFile # this will need to change if list_of_files is used
> > fileDate = data[6][16:26] # location of the creation date on the data files
>
> Show us the full traceback from the runtime error you get on this line.
>
The option of using two different calibration files is an idea that I
haven't tested yet, as I am a bit lost as to how to do this. I have gotten
as far as adding start and end dates to both calibration files as part of
the header information for each file.
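One way to pick the calibration file is to turn the creation date from the header into a `datetime.date` and compare it with the first day on which Cal_File_P19.txt applies. This is a sketch under assumptions: the date format `%m/%d/%Y` and the year 2012 in the cutoff are guesses (check them against what `data[6][16:26]` actually contains), and the function names are hypothetical.

```python
from datetime import date, datetime

# Assumed cutoff: first day on which Cal_File_P19.txt applies.
# The year is a guess; set it to the real measurement season.
P19_START = date(2012, 9, 19)


def parse_file_date(text, fmt="%m/%d/%Y"):
    """Convert the header's date text into a date object.

    The format string is an assumption; adjust it to match the files.
    """
    return datetime.strptime(text.strip(), fmt).date()


def choose_calibration_file(file_date, cutoff=P19_START):
    """Return the name of the calibration file that applies to file_date."""
    if file_date < cutoff:
        return "Cal_File_P17.txt"  # July up to and including 18 Sep
    return "Cal_File_P19.txt"  # 19 Sep onwards
```

Comparing `date` objects rather than the raw strings matters once data span more than one year: as strings, "01/05/2013" sorts before "09/19/2012".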

> > #extract data from data file
> > fdData = open(dataFile,"rt")
> > data = fdData.readlines()
> > fdData.close()
> >
> > #extract data from calibration file
> > fdCal = open(caliFile,"rt")
>
> Show us the full traceback from the runtime error here, as well.
>
In the original code, which uses only one calibration file, this line and
the rest of the code work without error.
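As quoted, both lines Dave asks about would fail with a `NameError` when the script runs top to bottom: `fileDate = data[6][16:26]` executes before `data` is assigned, and `open(caliFile,"rt")` refers to a name that is never defined (only `caliFile1` and `caliFile2` are). A minimal sketch of the fix is to read the file first and slice the date out afterwards. The helper names are hypothetical; the row and column positions are the ones from the original post.

```python
def read_lines(path):
    """Return all lines of a small text file; the with block closes it."""
    with open(path) as fd:
        return fd.readlines()


def header_date_text(lines, row=6, start=16, stop=26):
    """Return the creation-date text from the header.

    Defaults mirror data[6][16:26] from the original script.
    """
    return lines[row][start:stop]
```

In the script this would read `data = read_lines(dataFile)` followed by `fileDate = header_date_text(data)`, in that order.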


> > calibration = fdCal.readlines()
> > fdCal.close()
> >
> > #create data table
> > k=0 #just a counter
> > dataNum = numpy.ndarray((2151,2))
> >
> > #the actual data (the numbers) in the data file begin at line 30
> > for anItem in data[30:]:
> >     theNums = anItem.replace("\r\n","").split("\t")
> >     dataNum[k,0] = int(theNums[0])
> >     dataNum[k,1] = float(theNums[1])
> >     k+=1 #advance the counter
> >
> > #create the calibration table
> > k = 0
> > calNum = numpy.ndarray((2151,2))
> > for anItem in calibration[5:]:
> >     theNums = anItem.replace("\r\n","").split("\t")
> >     calNum[k,0] = int(theNums[0])
> >     calNum[k,1] = float(theNums[1])
> >     k+=1
> >
> > #calibrate the data
> > k=0
> > calibratedData = numpy.ndarray((2151,2))
> > for aNum in dataNum:
> >     calibratedData[k,0] = aNum[0] #first column is the wavelength
> >     calibratedData[k,1] = (aNum[1] * dataNum[k,1]) * 100.0 #second column is the measurement to be calibrated.
> >     k+=1
> >
> > #write the calibrated data
> > fd = open(outFile,"wt")
> Error traceback?
> > #prior to writing the calibrated contents, write the headers for data files and calibration files
> > fd.writelines(data[0:30])
> > fd.writelines(calibration[0:5])
> > for aNum in calibratedData:
> >     #Write the data in the file in the following format:
> >     # "An integer with 3 digits", "tab character", "Floating point number"
> >     fd.write("%03d\t%f\n" % (aNum[0],aNum[1]))
> >
> > #close the file
> > fd.close()
> >
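The writing step can also be wrapped in a function that uses a `with` block, so the output file is closed even if something fails halfway through. A sketch with a hypothetical function name; it keeps the original layout (data header, then calibration header, then one `%03d<TAB>%f` line per row).

```python
def write_calibrated(out_path, data_header, cal_header, table):
    """Write both headers, then one 'wavelength<TAB>value' line per row.

    data_header and cal_header are lists of lines that already end in
    newlines (e.g. data[0:30] and calibration[0:5]); table is any
    sequence of (wavelength, value) pairs, such as a 2-column array.
    """
    with open(out_path, "w") as fd:
        fd.writelines(data_header)
        fd.writelines(cal_header)
        for wavelength, value in table:
            fd.write("%03d\t%f\n" % (wavelength, value))
```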
>
> Are the individual files small?  By doing readlines() on them, you're
> assuming you can hold all of both the data file and the calibration file
> in memory.
>
Both the calibration and the data files are small. The original Excel
calibration files have been saved as "Tab delimited text files", and the
data files are ASCII files with 2151 rows and 2 columns.
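Since the files are small and purely tab-delimited after the header, `numpy.loadtxt` can replace the manual parsing loops, and the calibration becomes a single array operation. Note that the quoted loop computes `aNum[1] * dataNum[k,1]`, which multiplies each measurement by itself (`aNum` is row `k` of `dataNum`), while `calNum` is built but never used; the sketch below assumes the intent was to multiply by the calibration column. The function names are hypothetical, and the header sizes (30 and 5 lines) come from the original slices `data[30:]` and `calibration[5:]`.

```python
import numpy as np


def load_table(source, header_lines):
    """Read a two-column, tab-delimited table after skipping the header."""
    return np.loadtxt(source, delimiter="\t", skiprows=header_lines, ndmin=2)


def calibrate(data_table, cal_table):
    """Multiply each measurement by its calibration factor, then by 100.

    Refuses to run if the wavelength columns (or row counts) differ.
    """
    if not np.array_equal(data_table[:, 0], cal_table[:, 0]):
        raise ValueError("wavelength columns of data and calibration differ")
    result = data_table.copy()
    result[:, 1] = data_table[:, 1] * cal_table[:, 1] * 100.0
    return result
```

Checking the wavelength columns first is cheap and catches a data file paired with the wrong calibration file, which becomes more likely once two calibration files are in play.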

> > I have successfully calibrated one ASCII file at a time with this code.
> Unless I'm missing something, this code does not run.  I didn't try it,
> though, just inspected it quickly.
> > However, I have 1,000s of files that I need to calibrate so I would like
> > some help to modify this code so it can:
> >
> > 1. Use one calibration file (Cal_File_P17.txt) on data files created from
> > July to the 18th Sep and a different calibration file (Cal_File_P19.txt) for
> > data files created from the 19th of Sep onwards.
> >
> > 2. Find all the .txt files in a folder called ASCII_files, which is
> > subdivided into 12 different folders and calibrate all these files
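For point 2, `glob` can search a folder and all of its subfolders with a `**` pattern and `recursive=True` (Python 3.5 and later; `os.walk` does the same job on older versions). A sketch with hypothetical function names; it assumes the outputs keep the original naming ("Cal_" plus the data file's name, in the same folder) and therefore skips files that already start with "Cal_", so a second run does not calibrate its own output.

```python
import glob
import os


def find_data_files(root="ASCII_files"):
    """Return every .txt file under root, including all subfolders.

    Files whose names start with 'Cal_' are skipped so that outputs
    written into the same tree are not calibrated a second time.
    """
    pattern = os.path.join(root, "**", "*.txt")
    paths = glob.glob(pattern, recursive=True)
    return sorted(p for p in paths if not os.path.basename(p).startswith("Cal_"))


def output_path(data_path):
    """Mirror the original naming: 'Cal_' + data file name, same folder."""
    folder, name = os.path.split(data_path)
    return os.path.join(folder, "Cal_" + name)
```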
> >
> > I have googled and tried thinking about how to make changes, and I've
> > managed to get myself a bit more confused. Thus, I would like some guidance
> > on how to tackle/think about this process and how to get started. Please, I
> > am not asking for someone to do my work and write the code for me; I would
> > like some guidance on how to approach this and get started.
> >
> > Many thanks in advance for your help,
> > Cecilia
> >
> >
>
>
> --
>
> DaveA
>
>