[Tutor] Help for Python Beginner with extracting and manipulating data from thousands of ASCII files

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Oct 1 12:02:16 CEST 2012


On Sep 30, 2012 11:10 PM, "Cecilia Chavana-Bryant"
<cecilia.chavana at gmail.com> wrote:
>
> Hola again Python Tutor!

Hi Cecilia

>
> With a friend's help I have the following code to extract reflectance data from an ASCII data file, do a bit of data manipulation to calibrate the data and then write the calibrated file into an out file.
>
> import numpy
> # import glob - use if list_of_files is used
>
>
> dataFile = "1SH0109.001.txt"
> #list_of_files = glob.glob('./*.txt') to replace dataFile to search for all text files in ASCII_files folder?
> caliFile1 = "Cal_File_P17.txt" # calibration file to be used on data files created from July to 18 Sep
> caliFile2 = "Cal_File_P19.txt" # calibration file to be used on data files created from 19 Sep onwards
> outFile = "Cal_" + dataFile # this will need to change if list_of_files is used
> fileDate = data[6][16:26] # location of the creation date on the data files

The variable data used in the line above is not created until the
lines below run. I think you need to move this line down. What format
does fileDate have? I guess it's a string of text from the file. If
you can convert it to a datetime (or date) object it will be easy to
compare with the dates as required for your calibration file. Can you
show us how it looks e.g.

'12-Nov-2012'
or
'12/11/12'
or something else?
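
Once the format is known, the comparison could look something like the sketch below. Everything specific here is an assumption: the '%d/%m/%y' format string, the year of the cutoff (2012, guessed from the date of this thread), and the helper names parse_file_date and choose_cal_file, which are made up for illustration. A string like '12-Nov-2012' would need '%d-%b-%Y' instead, and month names depend on the locale.

```python
from datetime import datetime, date

# Assumption: calibration switches on 19 Sep 2012 (the year is a guess).
CUTOFF = date(2012, 9, 19)

def parse_file_date(text, fmt="%d/%m/%y"):
    """Convert the date text taken from the file header into a date object."""
    return datetime.strptime(text.strip(), fmt).date()

def choose_cal_file(file_date):
    """Return the calibration file name that applies to file_date."""
    if file_date < CUTOFF:
        return "Cal_File_P17.txt"  # files created up to 18 Sep
    return "Cal_File_P19.txt"      # files created from 19 Sep onwards

print(choose_cal_file(parse_file_date("18/09/12")))
print(choose_cal_file(parse_file_date("19/09/12")))
```

Date objects compare with < and >= directly, so no string handling is needed after the parsing step.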

>
> #extract data from data file
> fdData = open(dataFile,"rt")
> data = fdData.readlines()
> fdData.close()


Python has a slightly better way of writing code like this:

with open(dataFile, 'rt') as fdata:
    data = fdata.readlines()

This way you don't need to remember to close the file. In fact Python
will even remember to close it if there is an error.


>
> #extract data from calibration file
> fdCal = open(caliFile,"rt")
> calibration = fdCal.readlines()
> fdCal.close()

Where is caliFile set? If you're going to load all the data files you
might as well load both calibration files here at the beginning.
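
A rough sketch of what I mean, reading each calibration file once and keeping the lines in a dict keyed by file name. The helper names read_lines and load_calibrations are hypothetical, not anything from your script.

```python
def read_lines(path):
    """Return all lines of a text file; the file is closed automatically."""
    with open(path, "rt") as f:
        return f.readlines()

def load_calibrations(paths):
    """Read each calibration file once and key its lines by file name."""
    return {path: read_lines(path) for path in paths}

# Usage with the file names from the original script:
# calibrations = load_calibrations(["Cal_File_P17.txt", "Cal_File_P19.txt"])
# calibration = calibrations["Cal_File_P17.txt"]
```

With both files in memory, the loop over the data files only has to pick the right entry rather than reopen a calibration file each time.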

>
> #create data table
> k=0 #just a counter
> dataNum = numpy.ndarray((2151,2))

Does dataNum store integers or floating point numbers? Numpy won't let
you do both in the same array. You should always specify the type of
the numpy array that you want to create:

dataNum = numpy.ndarray((2151, 2), float)

or

dataNum = numpy.ndarray((2151, 2), int)

As it happens you are creating an array of floats, because float is
the default type. That means that when you try to store an integer in
the array below it gets converted to a float.
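
A small demonstration of the conversion, as a sketch only. I have used numpy.zeros here instead of numpy.ndarray because numpy.ndarray leaves the contents uninitialised; the array names and values are made up.

```python
import numpy

int_table = numpy.zeros((3, 2), int)
float_table = numpy.zeros((3, 2), float)

# An integer stored in a float array is converted to a float.
float_table[0, 0] = int("350")

# A float stored in an integer array loses its fractional part.
int_table[0, 1] = 0.75

print(float_table.dtype, float_table[0, 0])
print(int_table.dtype, int_table[0, 1])
```

The second case is the one to watch for: the assignment does not raise an error, so reflectance values stored in an int array would be truncated without warning.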

>
> #the actual data (the numbers) in the data file begin at line 30
> for anItem in data[30:]:
>     theNums = anItem.replace("\r\n","").split("\t")
>     dataNum[k,0] = int(theNums[0])
>     dataNum[k,1] = float(theNums[1])
>     k+=1 #advance the counter

You should look into using numpy.fromfile. This function is
specifically designed for this purpose.

For example:

with open(dataFile) as fdata:
    header_lines = [fdata.readline() for _ in range(30)]
    dataNum = numpy.fromfile(fdata, float, sep='\t')
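
Note that numpy.fromfile returns a flat one-dimensional array, so the result would still need reshaping into two columns. An alternative, not what I suggested above but worth knowing, is numpy.loadtxt, which skips header lines and returns a two-column array directly. The sketch below uses an in-memory stand-in for the data file; the 30-line header comes from your script, but the sample values are invented.

```python
import io
import numpy

# Stand-in for a data file: 30 header lines, then tab-separated rows.
# The numbers are made up for illustration.
sample = "header\n" * 30 + "350\t0.123\n351\t0.125\n352\t0.130\n"

# skiprows drops the header; columns are split on whitespace by default.
dataNum = numpy.loadtxt(io.StringIO(sample), skiprows=30)

wavelengths = dataNum[:, 0]  # first column
values = dataNum[:, 1]       # second column
print(dataNum.shape)
```

With a real file you would pass the file name in place of the io.StringIO object.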


Oscar

