Using re to get data from text file: SOLVED
William Park
opengeometry at yahoo.ca
Fri Sep 10 15:06:36 EDT 2004
Jocknerd <jocknerd1 at yahoo.com> wrote:
> >> 09/04/2004 Virginia 44 Temple 14
> >> 09/04/2004 LSU 22 Oregon State 21
> >> 09/09/2004 Troy State 24 Missouri 14
> > Your format is a bit complicated, since a team's name can span a
> > variable number of words. But I'm assuming that no team has a digit
> > as part of its name. So, use r'\d+' to separate the fields. E.g.
> > re.split(r'\d+', line)
> > re.split(r'(\d+)', line)
> > re.split(r'(\d+)', line[10:])
>
> Couldn't figure out re.split. Didn't seem to do what I wanted. Here's
> what did work:
>
> #!/usr/bin/python
>
> import re
> import sys   # needed for sys.argv
>
> filename = sys.argv[1]
> file = open(filename, 'r')
>
> schedule = []
>
> # Groups: date, team1, score1, team2, score2, location (rest of line)
> pattern = re.compile(r'^(.*\D\d+\D\d+)\D(.*)\D(.*\d+)\D(.*)\D(.*\d+)(.*)$')
> while True:
>     line = file.readline()
>     if not line: break
>     g = {}
>     # Parentheses let the assignment targets span two lines.
>     (g['date'], g['team1'], g['score1'], g['team2'],
>      g['score2'], g['location']) = pattern.search(line).groups()
>     schedule.append(g)
> file.close()
>
> for game in schedule:
>     print(game['date'], game['team1'], game['score1'], game['team2'],
>           game['score2'])
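
For comparison, here is a minimal sketch of how the re.split() suggestion quoted above could be applied to one line. It assumes the date always occupies the first 10 characters and that team names contain no digits; the function name parse_game is only an illustrative choice, not something from the original script.

```python
import re

def parse_game(line):
    """Split one schedule line into fields (hypothetical helper)."""
    # Assumption: the date is always the first 10 characters (MM/DD/YYYY).
    date, rest = line[:10], line[10:]
    # A capturing group makes re.split() keep the digit runs (the scores)
    # in the result, alternating with the text between them (the teams).
    parts = re.split(r'(\d+)', rest)
    team1, score1, team2, score2 = [p.strip() for p in parts[:4]]
    return {'date': date, 'team1': team1, 'score1': int(score1),
            'team2': team2, 'score2': int(score2)}

game = parse_game('09/09/2004 Troy State 24 Missouri 14')
print(game['team1'], game['score1'], game['team2'], game['score2'])
```

Slicing off the date first matters: without it, the digits inside the date would also act as separators.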
In the Bash shell, this kind of cutting and slicing is a bit easier.
(sscanf and match are not stock Bash builtins; see Ref below.)
1.  line='09/09/2004 Troy State 24 Missouri 14'
    sscanf "$line" '%s %[^0-9] %[0-9] %[^0-9] %[0-9]' date team1 score1 team2 score2
    declare -p date team1 score1 team2 score2

2.  line='09/09/2004 Troy State 24 Missouri 14'
    match "$line" '([0-9/]*) ([^0-9]*) ([0-9]*) ([^0-9]*) ([0-9]*)' a
    date=${a[1]}
    team1=${a[2]} score1=${a[3]}
    team2=${a[4]} score2=${a[5]}
    declare -p date team1 score1 team2 score2
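
The pattern in the second example carries over to Python's re module almost unchanged. The following is only a sketch under the same assumption (no digits in team names); the names GAME_RE and parse_with_match are illustrative and not part of any library.

```python
import re

# Same field layout as the Bash pattern above: date, team, score, team, score.
# '+' is used instead of '*' so that empty fields are rejected.
GAME_RE = re.compile(r'([0-9/]+) ([^0-9]+) ([0-9]+) ([^0-9]+) ([0-9]+)')

def parse_with_match(line):
    """Return (date, team1, score1, team2, score2), or None if no match."""
    m = GAME_RE.match(line)
    if m is None:
        return None
    return m.groups()

fields = parse_with_match('09/09/2004 Troy State 24 Missouri 14')
print(fields)
```

Returning None for a non-matching line lets the caller skip blank or malformed lines instead of raising an exception.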
Ref:
http://freshmeat.net/projects/bashdiff/
http://home.eol.ca/~parkw/index.html#bash
help sscanf
help match
--
William Park <opengeometry at yahoo.ca>
Open Geometry Consulting, Toronto, Canada