Pyparsing: Grammar Suggestion
Heiko Wundram
me+python at modelnine.org
Wed May 17 11:53:17 EDT 2006
On Wednesday 17 May 2006 17:24, Khoa Nguyen wrote:
> Any suggestions?
If you're not limited to PyParsing, pyrr.ltk/ptk might be appropriate for you
here (if you're used to bison/flex). The following file implements a small
sample lexer/parser which does exactly what you need. pyrr.ltk (the lexing
toolkit) is stable; pyrr.ptk isn't yet, but it is nevertheless available
under:
http://hg.modelnine.org/hg/pyrr
as a Mercurial repository. I'd advise you to take the version from the
repository, if you're interested in it, as my packaged versions always had
quirks which the current head of the repository doesn't have, AFAICT.
Anyway, the following implements the parser/lexer for you:
>>>
from pyrr.ltk import LexerBase, IgnoreMatch
from pyrr.ptk import ParserBase

class SampleLexer(LexerBase):

    def f(self,match,data):
        r"""
        f1 [10]-> /f1/
        f2 [10]-> /f2/
        f3 [10]-> /f3/
        f4 [10]-> /f4/
        f5 [10]-> /f5/
        f6 [10]-> /f6/

        Create your specific matches for each of the fs here...
        """
        return data

    def fid(self,match,data):
        r"""
        fid -> ri/[a-z_][a-z0-9_]*/

        Match a record identifier.
        """
        return data

    def end_of_record(self,match,data):
        r"""
        EOR -> /END_OF_RECORD/

        Your end of record marker...
        """

    def operators(self,match,data):
        r"""
        nl -> e/\n/
        c -> /,/
        eq -> /=/

        Newline is something that I have inserted here...
        """

    def ws(self,match,data):
        r"""
        ws -> r/\s+/

        Ignore all whitespace that occurs somewhere in the input.
        """
        raise IgnoreMatch

class SampleParser(ParserBase):
    __start__ = "ifile"

    def ifile(self,data):
        """
        ifile -> record+
        """
        return dict(data)

    def record(self,fid,eq,f1,c1,f2,c2,f3,c3,f4,c4,f5,c5,f6,eor,nl):
        """
        record -> /fid/ /eq/ /f1/? /c/ /f2/? /c/ /f3/? /c/ /f4/? /c/ /f5/? /c/ /f6/? /EOR/ /nl/
        """
        return (fid,(f1,f2,f3,f4,f5,f6))

data = r"""recmark = f1,f2,,f4,f5,f6 END_OF_RECORD
recmark2 = f1,f2,f3,f4,,f6 END_OF_RECORD
"""

print SampleParser.parse(SampleLexer(data))
>>>
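If you just want to check what the record format above should produce without
installing pyrr, here is a rough standard-library sketch. It is not how pyrr
works internally and is not a replacement for a real grammar: RECORD_RE and
parse_records are names made up for this illustration, the one-record-per-line
layout is assumed from the sample data, and returning None for an empty field
is my own choice.

```python
import re

# Hypothetical stand-in for the grammar above (not part of pyrr):
# "<fid> = f1,f2,...,f6 END_OF_RECORD", one record per line.
RECORD_RE = re.compile(
    r"^\s*(?P<fid>[a-z_][a-z0-9_]*)\s*=\s*(?P<fields>.*?)\s*END_OF_RECORD\s*$"
)

def parse_records(text, field_count=6):
    """Return {record id: tuple of fields}; empty fields become None."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = RECORD_RE.match(line)
        if m is None:
            raise ValueError("malformed record: %r" % line)
        # Split on commas; an empty slot such as "f2,,f4" yields None.
        fields = [f.strip() or None for f in m.group("fields").split(",")]
        if len(fields) != field_count:
            raise ValueError("expected %d fields in %r" % (field_count, line))
        records[m.group("fid")] = tuple(fields)
    return records

data = """recmark = f1,f2,,f4,f5,f6 END_OF_RECORD
recmark2 = f1,f2,f3,f4,,f6 END_OF_RECORD
"""

result = parse_records(data)
print(result)
```

Unlike the lexer/parser pair, this treats every field as an opaque string, so
it only helps if the individual fields need no further tokenizing.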
HTH!
--- Heiko.