[Tutor] Parsing problem

Paul McGuire paul at alanweberassociates.com
Sun Jul 24 18:36:06 CEST 2005

Liam -

Great, this sounds like it's coming together.  Don't be discouraged, parsing
text like this has many forward/backward steps.

As far as stopping after one assignment, well, you might kick yourself over
this, but the answer is that you are no longer parsing just a single
assignment, but a list of them.  You cannot parse more than one assignment
with assignment as you have it, and you shouldn't.  Instead, expand the
scope of the parser to correspond to the expanded scope of input, as in:

listOfAssignments = pp.OneOrMore( assignment )

Now listOfAssignments is your root BNF expression: call its parseString
method on the contents of the input file.
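
A minimal sketch of the difference, assuming a recent pyparsing release (which still accepts the camelCase names used in this thread) and a cut-down RHS with only identifiers, integers and nested braces. The two-section sample text is made up. By default parseString does not require the whole input to be consumed, so a single assignment returns the first section and quietly ignores the rest; the OneOrMore root picks up every section, and parseAll=True makes leftover input an error.

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
RHS = pp.Forward()
RHS << (identifier ^ integer ^
        pp.Group(LBRACE + pp.ZeroOrMore(assignment ^ RHS) + RBRACE))
assignment << pp.Group(identifier + EQUALS + RHS)

# The root expression covers the whole file: one or more top-level sections.
listOfAssignments = pp.OneOrMore(assignment)

text = """header = { a = 1 }
province = { b = 2 }"""

# A single assignment stops after the header section; the rest is ignored.
first_only = assignment.parseString(text)
# The root expression consumes every section; parseAll=True raises
# ParseException if any input is left over.
everything = listOfAssignments.parseString(text, parseAll=True)

print(first_only.asList())
print(everything.asList())
```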

Looking at your code, you might prefer to just enclose the contents inside
the braces in an Optional or a ZeroOrMore.  Given the other possible
elements that might be in your braces, will this work?  ZeroOrMore will take
care of the empty-braces case, and recursively nesting RHS will avoid having
to repeat the other "scalar" entries.

RHS << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) ^
         identifier ^
         integer ^
         pp.Group( LBRACE + pp.ZeroOrMore( assignment ^ RHS ) + RBRACE ) )
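
A self-contained sketch of that RHS, assuming a recent pyparsing release. One small deviation from the snippet above: the quoted-string alternative uses a copy of dblQuotedString, so the parse action is not attached to the shared module-level expression. It parses the nested, empty-brace sample from Liam's message without pp.empty.

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)
# A copy, so removeQuotes is not attached to the shared dblQuotedString.
quoted = pp.dblQuotedString.copy().setParseAction(pp.removeQuotes)

assignment = pp.Forward()
RHS = pp.Forward().setName("RHS")
# ZeroOrMore accepts "{ }"; allowing RHS inside the braces covers bare
# value lists such as { "British Isles" "NorthSeaSea" }.
RHS << (quoted ^ identifier ^ integer ^
        pp.Group(LBRACE + pp.ZeroOrMore(assignment ^ RHS) + RBRACE))
assignment << pp.Group(identifier + EQUALS + RHS)

sample = "foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }"
nested = assignment.parseString(sample, parseAll=True)
print(nested.asList())

region = assignment.parseString('region = { "British Isles" "NorthSeaSea" }',
                                parseAll=True)
print(region.asList())
```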

-- Paul

-----Original Message-----
From: Liam Clarke [mailto:cyresse at gmail.com] 
Sent: Sunday, July 24, 2005 10:21 AM
To: Paul McGuire
Cc: tutor at python.org
Subject: Re: [Tutor] Parsing problem

Hi Paul, 

That is fantastic. It works, and using that pp.Group is the key with the
nested braces. 

I just ran this on the actual file after adding a few more possible values
inside the group, and it parsed the entire header structure rather nicely.

Now this will probably sound silly, but from the bit 

header = {...

it continues on with 

province = {...

and so forth. 

Now, once it reads up to the closing bracket of the header section, it
returns that parsed nicely. 
Is there a way I can tell it to continue onwards? I can see that it's
stopping at one group.

Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over
my head.

I've tried this - 

Code http://www.rafb.net/paste/results/3Dm7FF35.html
Current data http://www.rafb.net/paste/results/3cWyt169.html

assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS ))) 

to try and continue the parsing, but no luck.

I've been running into the 

 File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl
    raise maxException
pyparsing.ParseException: Expected "}" (at char 742), (line:35, col:5) 

hassle again. From the CPU loading, I'm worried I've got myself something
very badly recursive going on, but I'm unsure of how to use validate().

I've noticed that a few of the sections in between contain values like this

foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }

and so I've stuck pp.empty into my RHS possible values. What unintended side
effects may I get from using pp.empty? From the docs, it sounds like a
wildcard token, rather than matching a null.

Using pp.empty has resolved my apparent problem with empty {}'s causing my
favourite exception, but I'm just worried that I'm casting my net too wide.
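
A small sketch, assuming a recent pyparsing release: pp.empty is not a wildcard. It always succeeds and consumes no characters, so an RHS that includes it will also accept an assignment with no value at all. The names name, number and loose_assignment are made up for the example.

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
name = pp.Word(pp.alphas)
number = pp.Word(pp.nums)

# pp.empty always succeeds and consumes no text, so this assignment
# also accepts a missing value; it never matches arbitrary text.
loose_assignment = name + EQUALS + (number | pp.empty)

with_value = loose_assignment.parseString("x = 5", parseAll=True).asList()
without_value = loose_assignment.parseString("x =", parseAll=True).asList()
print(with_value)     # ['x', '5']
print(without_value)  # ['x']
```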

Oh, and if there's a way to get a 'last line parsed' value so as to start
parsing onwards, it would ease my day.  The only way I've found to get the
whole thing parsed is to wrap another x = { ... } around the whole of the
data, and now I'm only getting the 'x' returned, so if I could parse by
section, it would help my understanding of what's happening. 
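
A sketch of parsing section by section, assuming a recent pyparsing release: scanString yields each match together with its start and end offsets, and pp.lineno turns an offset into a 1-based line number. The grammar is a cut-down version of the one in this thread, and the sample text is made up.

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
RHS = pp.Forward()
RHS << (identifier ^ integer ^
        pp.Group(LBRACE + pp.ZeroOrMore(assignment ^ RHS) + RBRACE))
assignment << pp.Group(identifier + EQUALS + RHS)

text = "header = { a = 1 }\nprovince = { b = 2 }"

sections = []
# scanString yields (tokens, start, end) for each match, so every
# top-level section can be handled, and located, one at a time.
for tokens, start, end in assignment.scanString(text):
    name = tokens[0][0]
    line = pp.lineno(start, text)  # 1-based line of the match start
    sections.append((name, line, text[start:end]))
    print(name, "starts on line", line, "and ends at offset", end)
```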

I'm still trial and error-ing a bit too much at the moment.


Liam Clarke

On 7/24/05, Paul McGuire <paul at alanweberassociates.com> wrote:

	Liam -
	Glad you are sticking with pyparsing through some of these
	One thing that might simplify your life is if you are a bit more strict on
	specifying your grammar, especially using pp.printables as the character set
	for your various words and values.  Is this statement really valid?
	Lw)r*)*dsflkj = sldjouwe)r#jdd
	According to your grammar, it is.  Also, by using printables, you force your
	user to insert whitespace between the assignment target and the equals sign.
	I'm sure your users would like to enter a quick "a=1" once in a while, but
	since there is no whitespace, it will all be slurped into the left-hand side.
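
A quick illustration of the "a=1" point, assuming a recent pyparsing release: Word(printables) swallows the equals sign, while the stricter identifier/integer pair splits the assignment.

```python
import pyparsing as pp

# Word(printables) accepts any run of non-whitespace characters,
# including "=", so "a=1" comes back as a single token.
loose = pp.Word(pp.printables)
loose_result = loose.parseString("a=1").asList()

# Restricting the character sets lets "=" act as a separator again.
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)
strict = identifier + pp.Suppress("=") + integer
strict_result = strict.parseString("a=1").asList()

print(loose_result)   # ['a=1']
print(strict_result)  # ['a', '1']
```
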
	Let's create two expressions, LHS and RHS, to dictate what is valid on the
	left and right-hand side of the equals sign.  (Well, it turns out I create a
	bunch of expressions here in the process of defining LHS and RHS;
	hopefully this will make some sense):
	EQUALS = pp.Suppress("=")
	LBRACE = pp.Suppress("{")
	RBRACE = pp.Suppress("}")
	identifier = pp.Word(pp.alphas, pp.alphanums + "_")
	integer = pp.Word(pp.nums+"-+", pp.nums)
	assignment = pp.Forward()
	LHS = identifier
	RHS = pp.Forward().setName("RHS")
	RHS << ( pp.dblQuotedString ^ identifier ^ integer ^
	         pp.Group( LBRACE + pp.OneOrMore(assignment) + RBRACE ) )
	assignment << pp.Group( LHS + EQUALS + RHS )
	I leave it to you to flesh out what other possible value types can be
	included in RHS.
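
Paul's grammar above as a runnable sketch, assuming a recent pyparsing release. Because the braces require OneOrMore(assignment), an empty pair of braces such as flags = { } does not match; the variable names simple, nested and empty_rejected are only for the example.

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)
assignment = pp.Forward()
LHS = identifier
RHS = pp.Forward().setName("RHS")
RHS << (pp.dblQuotedString ^ identifier ^ integer ^
        pp.Group(LBRACE + pp.OneOrMore(assignment) + RBRACE))
assignment << pp.Group(LHS + EQUALS + RHS)

simple = assignment.parseString("a=1")
nested = assignment.parseString("ai = { war = 60 }")
print(simple.asList())  # [['a', '1']]
print(nested.asList())  # [['ai', [['war', '60']]]]

# OneOrMore needs at least one assignment between the braces,
# so empty braces do not match.
try:
    assignment.parseString("flags = { }")
    empty_rejected = False
except pp.ParseException:
    empty_rejected = True
print("empty braces rejected:", empty_rejected)
```
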
	Note also the use of the Group.  Try running this snippet with and without
	Group and see how the results change.  I think using Group will help you to
	build up a good parse tree for the matched tokens.
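
A sketch of the with/without Group comparison, assuming a recent pyparsing release and a simplified identifier = integer grammar:

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

# Without Group, every token lands in one flat list.
flat = pp.OneOrMore(identifier + EQUALS + integer)
# With Group, each assignment becomes its own sub-list.
grouped = pp.OneOrMore(pp.Group(identifier + EQUALS + integer))

flat_result = flat.parseString("a = 1 b = 2").asList()
grouped_result = grouped.parseString("a = 1 b = 2").asList()
print(flat_result)     # ['a', '1', 'b', '2']
print(grouped_result)  # [['a', '1'], ['b', '2']]
```
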
	Lastly, please note in the '<<' assignment to RHS that the expression is
	enclosed in parens.  I originally left this as
	RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE + pp.OneOrMore(assignment) + RBRACE )
	And it failed to match!  A bug!  In my own code!  The shame...
	This fails because '<<' has a higher precedence than '^', so RHS only worked
	if it was handed a quoted string.  Probably good practice to always
	enclose in parens the expression being assigned to a Forward using '<<'.
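
The precedence point can be checked without pyparsing at all. The Expr class below is a hypothetical stand-in that only records how Python groups the two operators; in pyparsing it is Forward's '<<' that stores the expression, so in the unparenthesized form only the first alternative goes into RHS.

```python
class Expr:
    """Hypothetical stand-in that records how Python groups << and ^."""

    def __init__(self, text):
        self.text = text

    def __lshift__(self, other):
        return Expr(f"({self.text} << {other.text})")

    def __xor__(self, other):
        return Expr(f"({self.text} ^ {other.text})")


RHS, quoted, identifier = Expr("RHS"), Expr("quoted"), Expr("identifier")

# '<<' binds tighter than '^', so the shift happens first.
without_parens = RHS << quoted ^ identifier
with_parens = RHS << (quoted ^ identifier)
print(without_parens.text)  # ((RHS << quoted) ^ identifier)
print(with_parens.text)     # (RHS << (quoted ^ identifier))
```
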
	-- Paul
	-----Original Message-----
	From: Liam Clarke [mailto: cyresse at gmail.com]
	Sent: Saturday, July 23, 2005 9:03 AM
	To: Paul McGuire
	Cc: tutor at python.org
	Subject: Re: [Tutor] Parsing problem
	*sigh* I just read the documentation more carefully and found the
	difference between the | operator and the ^ operator.
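
A minimal illustration, assuming a recent pyparsing release: '|' builds a MatchFirst, which takes the first alternative that matches, whereas '^' builds an Or, which tries every alternative and keeps the longest match. So '|' is not an exclusive or.

```python
import pyparsing as pp

letters = pp.Word(pp.alphas)
letters_digits = pp.Word(pp.alphanums)

first_match = letters | letters_digits    # MatchFirst: first alternative that succeeds
longest_match = letters ^ letters_digits  # Or: alternative that consumes the most text

first = first_match.parseString("abc123").asList()
longest = longest_match.parseString("abc123").asList()
print(first)    # ['abc']
print(longest)  # ['abc123']
```
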
	Input -
	j = { line = { foo = 10 bar = 20 } }
	New code
	sel = pp.Forward()
	values = ((pp.Word(pp.printables) + pp.Suppress("=") +
	           pp.Word(pp.printables)) ^ sel)
	sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
	        pp.OneOrMore(values) + pp.Suppress("}"))
	Output -
	(['j', 'line', 'foo', '10', 'bar', '20'], {})
	My apologies for the deluge.
	Liam Clarke
	On 7/24/05, Liam Clarke <cyresse at gmail.com> wrote:
	        Hmmm... just a quick update, I've been poking around and I'm
	obviously making some error of logic.
	        Given a line -
	         f = "j = { line = { foo = 10 bar = 20 } }" 
	        And given the following code -
	        select = pp.Forward()
	        select << ( pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
	                    pp.OneOrMore( (pp.Word(pp.printables) + pp.Suppress("=") +
	                                   pp.Word(pp.printables)) | select ) +
	                    pp.Suppress("}") )
	        select.parseString(f) gives -
	        (['j', 'line', '{', 'foo', '10', 'bar', '20'], {}) 
	        So I've got a bracket sneaking through there. Argh. My brain
	        Is the | operator an exclusive or?
	        Liam Clarke
	        On 7/23/05, Liam Clarke < cyresse at gmail.com > wrote:
	                I've attempted to follow your lead and have started
	scratch, I could just copy and paste your solution (which works
	well), but I want to understand what I'm doing *grin*
	                However, I've been hitting a couple of ruts in the
path to
	enlightenment. Is there a way to tell pyparsing that to treat
	escaped characters as just a slash followed by a letter? For the
time being
	I've converted all backslashes to forwardslashes, as it was choking
on \a in 
	a file path.
	                But my latest hitch, takes this form (apologies for
	                Traceback (most recent call last):
	                  File "<interactive input>", line 1, in ?
	                  File "parse.py", line 336, in parse
	                    parsedEntries = dicts.parseString(test_data)
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 616, in parseString
	                    loc, tokens = self.parse( instring.expandtabs(), 0 )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
	                    loc,tokens = self.parseImpl( instring, loc, doActions )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
	                    return self.expr.parse( instring, loc, doActions )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
	                    loc,tokens = self.parseImpl( instring, loc, doActions )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 1367, in parseImpl
	                    loc, exprtokens = e.parse( instring, loc, doActions )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
	                    loc,tokens = self.parseImpl( instring, loc, doActions )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
	                    return self.expr.parse( instring, loc, doActions )
	                  File "c:\python24\Lib\site-packages\pyparsing.py", line 560, in parse
	                    raise ParseException, ( instring, len(instring), self.errmsg, self )
	                ParseException: Expected "}" (at char 9909),
	                The offending code can be found here (includes the data) -
	                It's like pyparsing isn't recognising a lot of my "}"'s, as
	                if I add another one, it throws the same error, same for adding
	                No doubt I've done something silly, but any help in
	the tragic flaw would be much appreciated. I need to get a
	object out so I can learn how to work with the basic structure!
	                Much regards, 
	                Liam Clarke
	                On 7/21/05, Paul McGuire <paul at alanweberassociates.com> wrote:
	                        Liam, Kent, and Danny -
	                        It sure looks like pyparsing is taking on a life of its own!  I can
	                        see I no longer am the only one pitching pyparsing at some of these
	                        applications!
	                        Yes, Liam, it is possible to create objects, that is, ParseResults
	                        objects that have named values in them.  I looked into your
	                        application, and the nested assignments seem similar to a
	                        ConfigParser type of structure.  Here is a pyparsing version that
	                        handles the test data in your original post (I kept Danny Yoo's
	                        list values, and added recursive dictionary entries):
	                        import pyparsing as pp
	                        listValue = pp.Forward()
	                        listSeq = pp.Suppress('{') + pp.Group(pp.ZeroOrMore(listValue)) + pp.Suppress('}')
	                        listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
	                                        pp.Word(pp.alphanums) | listSeq )
	                        keyName = pp.Word( pp.alphas )
	                        entries = pp.Forward()
	                        entrySeq = pp.Suppress('{') + pp.Group(pp.OneOrMore(entries)) + pp.Suppress('}')
	                        entries << pp.Dict(
	                                    pp.OneOrMore (
	                                        pp.Group( keyName + pp.Suppress('=') + (entrySeq |
	                                                  listValue) ) ) )
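
A flat sketch of what Dict does, assuming a recent pyparsing release (the nested grammar above adds Forward and entrySeq on top of this): the first token of each Group becomes a result name, so values can be looked up by key. The key/value pairs are taken from the test data in this message.

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
keyName = pp.Word(pp.alphas)
value = pp.Word(pp.alphanums)

# Each Group holds one [key, value] pair; Dict turns the first token of
# every group into a result name for the rest of that group.
entries = pp.Dict(pp.OneOrMore(pp.Group(keyName + EQUALS + value)))

result = entries.parseString("tag = ENG war = 60 ferocity = no")
print(result["tag"])          # ENG
print(result.war)             # 60
print(sorted(result.keys()))  # ['ferocity', 'tag', 'war']
```
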
	                        Dict is one of the most confusing classes to and there are some
	                        examples in the examples directory that comes with pyparsing (see
	                        dictExample2.py), but it is still tricky.  Here is some code to access
	                        your input test data, repeated here for easy
	                        testdata = """\
	                        country = {
	                        tag = ENG
	                        ai = {
	                        flags = { }
	                        combat = { DAU FRA ORL PRO }
	                        continent = { }
	                        area = { }
	                        region = { "British Isles" "NorthSeaSea" "ECAtlanticSea" "NAtlanticSea"
	                        "TagoSea" "WCAtlanticSea" }
	                        war = 60
	                        ferocity = no
	                        }
	                        }
	                        """
	                        parsedEntries = entries.parseString(testdata)
	                        def dumpEntries(dct,depth=0):
	                            keys = dct.keys()
	                            for k in keys:
	                                print ('  '*depth) + '- ' + k + ':',
	                                    if dct[k][0].keys():
	                                        print dct[k][0]
	                                    print dct[k]
	                        dumpEntries( parsedEntries )
	                        print parsedEntries.country[0].tag
	                        print parsedEntries.country[0].ai[0].war
	                        This will print out:
	                        - country:
	                          - ai:
	                            - area: []
	                            - combat: ['DAU', 'FRA', 'ORL', 'PRO']
	                            - continent: []
	                            - ferocity: no
	                            - flags: []
	                            - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea',
	                                       'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']
	                            - war: 60
	                          - tag: ENG
	                        But I really dislike having to dereference nested values using the
	                        0'th element.  So I'm going to fix pyparsing so that in the next
	                        release, you'll be able to reference the sub-elements
	                        print parsedEntries.country.tag
	                        print parsedEntries.country.ai.war
	                        print parsedEntries.country.ai.ferocity
	                        This *may* break some existing code, but Dict is not heavily used,
	                        based on feedback from users, and this may make it useful in general,
	                        especially when data parses into nested Dicts.
	                        Hope this sheds more light than confusion!
	                        -- Paul McGuire

'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.'
