[Tutor] FW: files - strings - lists (fwd)
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Fri Dec 2 00:44:15 CET 2005
[Hi Paul; redirecting to tutor. Try reposting to tutor next time; we may
need to see if there's some mailing list issue there.]
---------- Forwarded message ----------
Date: Thu, 1 Dec 2005 16:42:32 -0600
From: Paul McGuire <paul at alanweberassociates.com>
To: smiles at worksmail.net, dyoo at hkn.eecs.berkeley.edu
Subject: FW: [Tutor] files - strings - lists
(Sorry for the direct e-mail, tutor-list rejected my post. -- Paul)
-----Original Message-----
From: Paul McGuire [mailto:paul at alanweberassociates.com]
Sent: Thursday, December 01, 2005 11:07 AM
To: 'tutor at python.org'
Subject: [Tutor] files - strings - lists
Chris, Danny, et al. -
Sorry I didn't chime in earlier on this thread. Here is a pyparsing sample
that goes beyond just tokenizing, and uses the structure of the input text
to organize the tokens into records with named attributes.
Enjoy!
-- Paul
# from earlier post
data = """1 Polonijna Liga Mistrzow
26 wrzesnia 2005
6 12 6 4 1
0 1 0
Bohossian - Kolinski
1
1.000 9 13 19
2.000 2 4 16
1.000 10 8 17
0.000 8 6 17
Szadkowska - Szczurek
2
0.000 11 16 20
3.000 1 -4 14
3.500 3 -7 13
2.500 10 13 19
"""
from pyparsing import *
real = Combine(Word(nums) + '.' + Word(nums))
integer = Combine(Optional("-") + Word(nums))
# while we're parsing, might as well convert
# integer strings to ints
def makeInt(st, loc, tokens):
    return int(tokens[0])
integer.setParseAction( makeInt )
# we could get tricky and force names to start
# only with capitals, followed by all lower case,
# but let's keep it simple
name = Word(alphas)
# "26 wrzesnia 2005" looks suspiciously date-like
date = integer + name + integer
header = Group( integer + restOfLine + LineEnd() +
date + LineEnd() +
integer + integer + integer +
integer + integer + LineEnd() +
integer + integer + integer )
dataline = Group(real +
integer.setResultsName("start") +
integer.setResultsName("end") +
integer + LineEnd() )
entry = Group( name.setResultsName("fromName") +
"-" +
name.setResultsName("toName") +
LineEnd() +
integer.setResultsName("recnum") +
LineEnd() +
OneOrMore( dataline ).setResultsName("data") )
# define the overall grammar
grammar = header.setResultsName("header") + \
ZeroOrMore(entry).setResultsName("records")
# parse the input data string
results = grammar.parseString( data )
# print how many records found
print(len(results.records))
# iterate over the returned records, and access
# named data fields like object attributes
for rec in results.records:
    print(rec.recnum, rec.fromName, "->", rec.toName)
    for d in rec.data:
        print("-", d.start, d.end)
    print()
"""
Prints out:
2
1 Bohossian -> Kolinski
- 9 13
- 2 4
- 10 8
- 8 6
2 Szadkowska -> Szczurek
- 11 16
- 1 -4
- 3 -7
- 10 13
"""