[Tutor] More help needed in sorting files and strings

Roeland Rengelink r.b.rigilink@chello.nl
Sat, 27 Oct 2001 16:55:09 +0200


> tonycervone wrote:
> 
> Hi all.
> 
> I have been using this code to read weather reports and to print only
> the first two stations symbols in the file:
>        inp =open("Metar\Infometar.txt")
>        lines = inp.readlines()
>        Depart = lines[1].split()[0]
>        Destin = lines[4].split()[0]
>        Destin1 = lines[5].split()[0]
>        print Depart+" "+Destin+" "+Destin1
> 
> This code  works fine until it finds an empty line and then I get an
> out of range error. Here's why:
> 
> 2001/10/23 21:18
> KPVD 232118Z AUTO 22014KT 3/4SM +TSRA BKN021 OVC033 20/14 A2937
>           RMK AO2
> 
> 2001/10/23 21:18
> KLGA 232118Z AUTO 23008KT 6SM -SHRA BKN012 18/17 A2962 RMK
>          AO2
> 
> 2001/10/23 21:18
> KBDL 232118Z AUTO 01012KT 6SM RA OVC050 14/05 A2965 RMK AO2


Hi Tony,

As far as I can tell, the layout of your data file obeys these rules:

- The file contains one or more records 
- The first record starts at the first line of the file
- Each record contains the following
  o A line with a date and a time
  o One or more lines with station information
  o A blank line

Because you don't know beforehand whether a record contains one or two
lines of station data, you don't know on which line to look for the data
of the second (or third, or n'th) weather station.
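
To see where the out-of-range error itself comes from: splitting a blank
line gives an empty list, so asking for item [0] of it fails. A minimal
sketch, assuming a current Python 3 interpreter (print is a function
there); the sample lines are copied from your file, and first_token is
just a name I made up for illustration:

```python
# Lines as readlines() would return them for the start of the file.
lines = [
    "2001/10/23 21:18\n",
    "KPVD 232118Z AUTO 22014KT 3/4SM +TSRA BKN021 OVC033 20/14 A2937\n",
    "          RMK AO2\n",
    "\n",
    "2001/10/23 21:18\n",
    "KLGA 232118Z AUTO 23008KT 6SM -SHRA BKN012 18/17 A2962 RMK\n",
]


def first_token(line):
    """Return the first whitespace-separated item, or None for a blank line."""
    items = line.split()
    if not items:
        # "\n".split() is [], so items[0] would raise IndexError here
        return None
    return items[0]


print(first_token(lines[1]))  # a station line
print(first_token(lines[3]))  # the blank line
```

The guard only avoids the exception; it does not tell you which line
holds the station you want, which is the real problem.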

Since you don't know on which line a given record starts, you have to
search for it using the rules above. If you want to get the 2nd station,
you could do one of two things:

1. Look for the 2nd line with a date and a time. The data you're
interested in is on the next line.
2. Look for the 2nd empty line. The data you're interested in is
somewhere in the lines before that.
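
The first option could be sketched roughly like this. It assumes Python 3
and assumes that every date line looks exactly like '2001/10/23 21:18';
is_date_line and nth_station are hypothetical helper names, not anything
from your code:

```python
from datetime import datetime


def is_date_line(line):
    """True if the line parses as a date and time like '2001/10/23 21:18'."""
    try:
        datetime.strptime(line.strip(), "%Y/%m/%d %H:%M")
    except ValueError:
        return False
    return True


def nth_station(lines, n):
    """Return the symbol of the n'th station (counting from 1), or None."""
    seen = 0
    for i, line in enumerate(lines):
        if is_date_line(line):
            seen += 1
            if seen == n and i + 1 < len(lines):
                # the station symbol is the first item on the next line
                items = lines[i + 1].split()
                if items:
                    return items[0]
                return None
    return None


sample = [
    "2001/10/23 21:18\n",
    "KPVD 232118Z AUTO 22014KT 3/4SM +TSRA BKN021 OVC033 20/14 A2937\n",
    "          RMK AO2\n",
    "\n",
    "2001/10/23 21:18\n",
    "KLGA 232118Z AUTO 23008KT 6SM -SHRA BKN012 18/17 A2962 RMK\n",
    "         AO2\n",
    "\n",
    "2001/10/23 21:18\n",
    "KBDL 232118Z AUTO 01012KT 6SM RA OVC050 14/05 A2965 RMK AO2\n",
]

print(nth_station(sample, 2))
```

This scans the file once for every station you ask for, which is fine for
a short report but is one reason to prefer collecting all records first.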

In the code below I'm going to give a somewhat more general solution.
First I'm going to assemble all data in a list of records. Each record
in this list will itself be a list of all data items that the file
contains for that record. I'm going to use the fact that records are
separated by blank lines to tell where one record ends and the next
begins.

-- start script ------------------------------
inp = open('tmp.dat')
line_list = inp.readlines()

record_list = []
record = []

for line in line_list:
    line = line.strip()
    if line == '':
        # blank line encountered, add the current record to the  
        # record_list and start assembling a new record
        record_list.append(record)
        record = []
    else:
        # add all data items in this line to the current record
        record.extend(line.split())

# in case the file didn't end with a blank line
if len(record) > 0:
    record_list.append(record)

for record in record_list:
    print record

print record_list[0][2]
-- end script ---------------------------------

Using this file 'tmp.dat':

-- tmp.dat ------------------------------------
2001/10/23 21:18
KPVD 232118Z AUTO 22014KT 3/4SM +TSRA BKN021 OVC033 20/14 A2937
          RMK AO2

2001/10/23 21:18
KLGA 232118Z AUTO 23008KT 6SM -SHRA BKN012 18/17 A2962 RMK AO2

2001/10/23 21:18
KLGA 232118Z AUTO 23008KT 6SM -SHRA BKN012 18/17 A2962 
	RMK AO2
-- end tmp.dat --------------------------------

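I get the following result (each record is cut off here after its first
five items; the script itself prints every item of every record):
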
['2001/10/23', '21:18', 'KPVD', '232118Z', 'AUTO']
['2001/10/23', '21:18', 'KLGA', '232118Z', 'AUTO']
['2001/10/23', '21:18', 'KLGA', '232118Z', 'AUTO']
KPVD

Note that record_list[i][j] will give you the j'th data item of the i'th
record, counting from zero.
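
For instance, the station symbol is item 2 of each record, so the line
you were printing originally could be built as sketched below. This
assumes Python 3, and the record_list literal is only a stand-in holding
the first five items of each record from tmp.dat; in practice you would
use the list assembled by the script:

```python
# Stand-in for the list built by the script (first five items per record).
record_list = [
    ['2001/10/23', '21:18', 'KPVD', '232118Z', 'AUTO'],
    ['2001/10/23', '21:18', 'KLGA', '232118Z', 'AUTO'],
    ['2001/10/23', '21:18', 'KLGA', '232118Z', 'AUTO'],
]

# Items 0 and 1 are the date and the time, item 2 is the station symbol.
# The length check skips records that are too short to have one.
stations = [record[2] for record in record_list if len(record) > 2]

print(" ".join(stations))
```

The length check matters because two blank lines in a row would make the
script append an empty record, and record[2] on that would raise the same
out-of-range error again.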

Hope this helps,

Roeland
-- 
r.b.rigilink@chello.nl

"Half of what I say is nonsense. Unfortunately I don't know which half"