Pretty Scheme, ??? Python
Paul McGuire
ptmcg at austin.rr.com
Mon Jul 2 19:28:58 EDT 2007
On Jul 2, 3:56 pm, Neil Cerutti <horp... at yahoo.com> wrote:
> On 2007-07-02, Laurent Pointal <laurent.poin... at wanadoo.fr> wrote:
>
> > Neil Cerutti wrote:
> >> How can I make the Python more idiomatic Python?
>
> > Have you taken a look at pyparsing ?
>
> Yes, I have it. PyParsing has, well, so many convenience features
> they seem to shout down whatever the core features are, and I
> don't know quite how to get started as a result.
>
> Hardest of all was modifying a working PyParsing program.
>
> As a result, I've found writing my own recursive descent parsers
> much easier.
>
> I'm probably wrong, though. ;)
>
> --
> Neil Cerutti
from pyparsing import *
"""
Neil -
Ok, here is the step-by-step, beginning with your posted BNF. (Based
on your test cases, I think the '{}'s are really supposed to be
'()'s.)
; <WAE> ::=
; <num>
; | { + <WAE> <WAE> }
; | { - <WAE> <WAE> }
; | {with {<id> <WAE>} <WAE>}
; | <id>
The most basic building blocks in pyparsing are Literal and Word.
With these, you compose "compound" elements using And and MatchFirst,
which are bound to the operators '+' and '|' (on occasion, Or is
required, bound to operator '^', but not for this simple parser).
Since you have a recursive grammar, you will also need Forward.
Whitespace is skipped implicitly.
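As a small illustration of those building blocks (a sketch only; the
names plus, number, name, operand, total and nested are made up for
this example and are not part of the WAE parser):

```python
from pyparsing import Forward, Literal, Word, alphas, nums

# Literal matches an exact string; Word matches a run of characters
# drawn from the given set.
plus = Literal("+")
number = Word(nums)
name = Word(alphas)

# '|' builds a MatchFirst (first alternative that matches wins);
# '+' builds an And (every part must match, in sequence).
operand = number | name
total = operand + plus + operand

# Whitespace between tokens is skipped implicitly.
print(total.parseString("3 + x").asList())   # ['3', '+', 'x']
print(total.parseString("3+x").asList())     # ['3', '+', 'x']

# Forward is a placeholder for a recursive expression; it is filled
# in afterwards with '<<'.
nested = Forward()
nested << (number | Literal("(") + nested + Literal(")"))
print(nested.parseString("((7))").asList())  # ['(', '(', '7', ')', ')']
```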
Only slightly more advanced is the Group class, which will impart
hierarchy and structure to the results - otherwise, everything just
comes out as one flat list of tokens. You may be able to remove the
Groups in the final parser, depending on your results after steps 1 and 2 in
the "left for the student" part below, but they are here to help show
structure of the parsed tokens.
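To see what Group changes, here is a minimal sketch comparing the
same expression with and without it (pair, flat and grouped are
illustrative names, not part of the WAE parser):

```python
from pyparsing import Group, Word, alphas, nums

# A name followed by a number.
pair = Word(alphas) + Word(nums)

# Without Group, the tokens of both pairs land in one flat list.
flat = pair + pair
print(flat.parseString("a 1 b 2").asList())     # ['a', '1', 'b', '2']

# With Group, each pair becomes its own nested list.
grouped = Group(pair) + Group(pair)
print(grouped.parseString("a 1 b 2").asList())  # [['a', '1'], ['b', '2']]
```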
As convenience functions go, I think the most common are oneOf and
delimitedList. oneOf might be useful here if you want to express id
as a single-char variable; otherwise, just use Word(alphas).
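A quick sketch of those two helpers, using the same camelCase names
as the rest of this post (op and ident are illustrative names; newer
pyparsing releases also offer snake_case spellings):

```python
from pyparsing import Word, alphas, delimitedList, nums, oneOf

# oneOf builds an alternation from a space-separated string ...
op = oneOf("+ - * /")
print(op.parseString("-").asList())     # ['-']

# ... or from a list of strings, here every single letter.
ident = oneOf(list(alphas))
print(ident.parseString("x").asList())  # ['x']

# delimitedList matches items separated by a delimiter (a comma by
# default) and leaves the delimiters out of the results.
numbers = delimitedList(Word(nums))
print(numbers.parseString("1, 2, 3").asList())  # ['1', '2', '3']
```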
At this point you should be able to write a parser for this WAE
grammar, such as the following 9-liner:
"""
LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()
wae = Forward()
num = Word(nums)
id = oneOf( list(alphas) )
addwae = Group( LPAR + "+" + wae + wae + RPAR )
subwae = Group( LPAR + "-" + wae + wae + RPAR )
withwae = Group( LPAR + "with" + LPAR + id + wae + RPAR + wae + RPAR )
wae << (addwae | subwae | withwae | num | id)
tests = """\
3
(+ 3 4)
(with (x (+ 5 5)) (+ x x))""".splitlines()
for t in tests:
    print(t)
    waeTree = wae.parseString(t)
    print(waeTree.asList())
    print()
"""
If you extract and run this script, here are the results:
3
['3']
(+ 3 4)
[['+', '3', '4']]
(with (x (+ 5 5)) (+ x x))
[['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]]
Left as an exercise for the student:
1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose
__init__ methods take a ParseResults object named tokens (which you
can treat as a list of tokens), and each with a calc() method to
evaluate them accordingly.
2. Hook each class to the appropriate WAE expression using setParseAction.
Hint: here is one done for you: num.setParseAction(NumWAE)
3. Modify the test loop to insert an evaluation of the parsed tree.
Extra credit: why is id last in the set of alternatives defined for
the wae expression?
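
For anyone wanting a starting point on steps 1 to 3, here is one
possible shape, deliberately cut down to numbers and addition only.
It is a sketch under assumptions, not the finished exercise: the
attribute names value, lhs and rhs are invented here, and AddWAE
assumes the Group is kept, so its tokens arrive as a single nested
list. SubWAE, WithWAE and IdWAE (which will need somewhere to look up
bound names) are still left to the student.

```python
from pyparsing import Forward, Group, Literal, Word, nums

class NumWAE(object):
    def __init__(self, tokens):
        # tokens holds the one matched digit string, e.g. ['3']
        self.value = int(tokens[0])
    def calc(self):
        return self.value

class AddWAE(object):
    def __init__(self, tokens):
        # Group wraps the match in one nested list: [['+', lhs, rhs]].
        # lhs and rhs are already objects built by earlier parse actions.
        op, self.lhs, self.rhs = tokens[0]
    def calc(self):
        return self.lhs.calc() + self.rhs.calc()

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()
wae = Forward()
# setParseAction returns the expression, so it can be chained here.
num = Word(nums).setParseAction(NumWAE)
addwae = Group(LPAR + "+" + wae + wae + RPAR).setParseAction(AddWAE)
wae << (addwae | num)

# The parse action's return value replaces the matched tokens, so the
# first result is the root object of the tree.
tree = wae.parseString("(+ 3 (+ 4 5))")[0]
print(tree.calc())  # 12
```

Passing the class itself works because pyparsing calls it with the
matched tokens and puts whatever it returns into the results.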
-- Paul
"""