Using re to get data from text file: SOLVED

Fri Sep 10 15:06:36 EDT 2004

Jocknerd <jocknerd1 at yahoo.com> wrote:
> >> 09/04/2004  Virginia              44   Temple               14
> >> 09/04/2004  LSU                   22   Oregon State         21
> >> 09/09/2004  Troy State            24   Missouri             14

> > Your format is a bit complicated, since a team's name can be a variable
> > number of words.  But I'm assuming that a name doesn't contain any
> > digits.  So, use '\d+' to separate the fields.  E.g.
> >     re.split(r'\d+', line)
> >     re.split(r'(\d+)', line)
> >     re.split(r'(\d+)', line[10:])
> 
> Couldn't figure out re.split.  Didn't seem to do what I wanted. Here's
> what did work:
> 
> #!/usr/bin/python
> 
> import re
> import sys
> 
> filename = sys.argv[1]
> f = open(filename, 'r')
> 
> schedule = []
> 
> pattern = re.compile(r'^(.*\D\d+\D\d+)\D(.*)\D(.*\d+)\D(.*)\D(.*\d+)(.*)$')
> for line in f:
>     m = pattern.search(line)
>     if not m:
>         continue    # skip blank or non-matching lines
>     g = {}
>     # strip() removes the column padding captured by the .* groups
>     (g['date'], g['team1'], g['score1'], g['team2'],
>      g['score2'], g['location']) = [s.strip() for s in m.groups()]
>     schedule.append(g)
> f.close()
> 
> for game in schedule:
>     print("%s %s %s %s %s" % (game['date'], game['team1'], game['score1'],
>                               game['team2'], game['score2']))
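
For what it's worth, the re.split suggestion quoted above does work once the capturing form is used: with parentheses around \d+, re.split returns the scores along with the text between them. A minimal Python 3 sketch, assuming (as the quoted advice does) that the date fills the first 10 characters and that team names contain no digits; parse_line is an illustrative name, not something from the thread:

```python
import re


def parse_line(line):
    """Split one schedule line into a dict of fields.

    Assumes the date fills the first 10 characters and that team
    names contain no digits.
    """
    date = line[:10]
    # Because the pattern has a capturing group, re.split keeps the digit
    # runs in its result: [team1, score1, team2, score2, trailing text].
    parts = [p.strip() for p in re.split(r'(\d+)', line[10:])]
    team1, score1, team2, score2 = parts[:4]
    return {'date': date, 'team1': team1, 'score1': int(score1),
            'team2': team2, 'score2': int(score2)}


if __name__ == '__main__':
    lines = [
        "09/04/2004  Virginia              44   Temple               14",
        "09/04/2004  LSU                   22   Oregon State         21",
        "09/09/2004  Troy State            24   Missouri             14",
    ]
    for line in lines:
        print(parse_line(line))
```

The trailing newline of a line read from a file simply ends up in the fifth, ignored element, so the function can be fed lines straight from a file object.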

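In Bash, this kind of field slicing is a bit easier with the sscanf and
match builtins from my Bash patch (see Ref below); they are not part of
standard Bash.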

    1.  line='09/09/2004 Troy State 24 Missouri 14'
        sscanf "$line" '%s %[^0-9] %[0-9] %[^0-9] %[0-9]' date team1 score1 team2 score2
        declare -p date team1 score1 team2 score2

    2.  line='09/09/2004 Troy State 24 Missouri 14'
        match "$line" '([0-9/]*) ([^0-9]*) ([0-9]*) ([^0-9]*) ([0-9]*)' a
        date=${a[1]}
        team1=${a[2]} score1=${a[3]}
        team2=${a[4]} score2=${a[5]}
        declare -p date team1 score1 team2 score2
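
Without the patched shell, the same idea carries over to Python's re module, the language of the original question. The sketch below is Python 3; GAME_RE and match_game are illustrative names, and the pattern is adapted from the match example above to use \s+ between fields so that it also handles the column-aligned lines quoted at the top, not just single-spaced input:

```python
import re

# Named groups make the result self-describing.  \s+ (rather than a single
# space) copes with column-aligned input; the lazy \D+? keeps the trailing
# padding out of each team name, leaving it to the following \s+.
GAME_RE = re.compile(
    r'(?P<date>\S+)\s+(?P<team1>\D+?)\s+(?P<score1>\d+)'
    r'\s+(?P<team2>\D+?)\s+(?P<score2>\d+)'
)


def match_game(line):
    """Return a dict of fields for one game line, or None if it doesn't fit."""
    m = GAME_RE.fullmatch(line.strip())
    return m.groupdict() if m else None


print(match_game('09/09/2004 Troy State 24 Missouri 14'))
```

Like the Bash array a[], groupdict() gives the scores back as strings; convert them with int() if arithmetic is needed.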

Ref:
    http://freshmeat.net/projects/bashdiff/
    http://home.eol.ca/~parkw/index.html#bash
    help sscanf
    help match

-- 
William Park <opengeometry at yahoo.ca>
Open Geometry Consulting, Toronto, Canada