Hi Paul, <br>
<br>
That is fantastic. It works, and using pp.Group is the key to handling the nested braces.<br>
<br>
I just ran this on the actual file after adding a few more possible
values inside the group, and it parsed the entire header structure
rather nicely.<br>
<br>
Now this will probably sound silly, but from the bit <br>
<br>
header = {...<br>
...<br>
}<br>
<br>
it continues on with <br>
<br>
province = {...<br>
} <br>
<br>
and so forth. <br>
<br>
Now, once it reads up to the closing brace of the header section, it returns that section nicely parsed.<br>
Is there a way I can tell it to continue onwards? I can see that it's stopping after one group.<br>
<br>
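A minimal sketch of one way this could go: keep the assignment expression from the quoted grammar for a single entry, and wrap it in a separate top-level expression that repeats it. The name savefile and the sample data are assumptions for illustration, and ZeroOrMore is swapped in for OneOrMore inside the braces so that empty { } groups are accepted.<br>

```python
import pyparsing as pp

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
RHS = pp.Forward()
# Parentheses matter here: '<<' binds tighter than '|' and '^'.
RHS << (pp.dblQuotedString | integer | identifier
        | pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE))
assignment << pp.Group(identifier + EQUALS + RHS)

# Hypothetical top-level expression: one or more assignments in a row.
savefile = pp.OneOrMore(assignment)

# Made-up sample data with two top-level sections.
sample = """
header = { name = "test" year = 1492 }
province = { id = 300 owner = ENG }
"""

# parseAll=True makes the parse fail if anything is left unparsed.
result = savefile.parseString(sample, parseAll=True).asList()
for section in result:
    print(section[0], section[1])
```

Because the repetition lives in savefile rather than in assignment itself, each top-level section comes back as its own group.<br>
<br>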
Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over my head.<br>
<br>
I've tried this - <br>
<br>
Code <a href="http://www.rafb.net/paste/results/3Dm7FF35.html">http://www.rafb.net/paste/results/3Dm7FF35.html</a><br>
Current data <a href="http://www.rafb.net/paste/results/3cWyt169.html">http://www.rafb.net/paste/results/3cWyt169.html</a><br>
<br>
assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS ))) <br>
<br>
to try and continue the parsing, but no luck.<br>
<br>
I've been running into the <br>
<br>
File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl<br>
raise maxException<br>
pyparsing.ParseException: Expected "}" (at char 742), (line:35, col:5) <br>
<br>
hassle again. From the CPU load, I'm worried I've got
something very badly recursive going on, but I'm unsure of how to use
validate().<br>
<br>
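A minimal sketch of how the location of such a failure could be inspected, assuming a cut-down version of the quoted grammar. The names block and broken and the sample input are assumptions for illustration; lineno, col and line are attributes of ParseException.<br>

```python
import pyparsing as pp

LBRACE, RBRACE, EQUALS = map(pp.Suppress, "{}=")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
block = pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE)
assignment << pp.Group(identifier + EQUALS + (integer | identifier | block))

# Hypothetical broken input: the inner block is never closed.
broken = """header = {
    flags = { a = 1
}
"""

caught = None
try:
    assignment.parseString(broken, parseAll=True)
except pp.ParseException as err:
    caught = err
    # Report where the parser gave up, plus the text of that line.
    print("line %d, col %d: %s" % (err.lineno, err.col, err))
    print(repr(err.line))
```

Printing the offending line next to the message makes it easier to see which brace the parser was still waiting for.<br>
<br>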
I've noticed that a few of the sections in between contain values like this - <br>
<br>
foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }<br>
<br>
and so I've stuck pp.empty into my RHS possible values. What unintended
side effects may I get from using pp.empty? From the docs, it sounds
like a wildcard token, rather than matching a null.<br>
<br>
Using pp.empty has resolved my apparent problem with empty {}'s causing
my favourite exception, but I'm just worried that I'm casting my net
too wide.<br>
<br>
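A minimal sketch of an alternative, assuming a cut-down version of the quoted grammar: instead of adding pp.empty to the RHS alternatives, the braces themselves can accept zero or more assignments. The name block is an assumption for illustration. pp.Empty always matches and consumes no input, which the last lines illustrate.<br>

```python
import pyparsing as pp

LBRACE, RBRACE, EQUALS = map(pp.Suppress, "{}=")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
# ZeroOrMore lets "{ }" match as an empty group, with no pp.empty needed.
block = pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE)
assignment << pp.Group(identifier + EQUALS + (integer | identifier | block))

line = "foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }"
parsed = assignment.parseString(line, parseAll=True).asList()
print(parsed)

# For comparison: Empty succeeds on any input and returns no tokens.
anything = pp.Empty().parseString("@#$ not in the grammar").asList()
print(anything)
```

Keeping the empty case inside the braces means a malformed value elsewhere still raises an exception instead of being silently accepted.<br>
<br>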
Oh, and is there a way to get a 'last line parsed' value, so that I can
resume parsing from there? It would ease my day. The only way I've found
to get the whole thing parsed is to wrap all of the data in another
x = { ... }, and then I only get the 'x' returned. If I could parse
section by section, it would help my understanding of what's
happening.<br>
<br>
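A minimal sketch of parsing section by section, assuming a cut-down version of the quoted grammar. scanString yields the matched tokens together with the start and end offsets of each match, and pp.lineno converts an offset into a line number. The sample data and the names block and sections are assumptions for illustration.<br>

```python
import pyparsing as pp

LBRACE, RBRACE, EQUALS = map(pp.Suppress, "{}=")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
block = pp.Group(LBRACE + pp.ZeroOrMore(assignment) + RBRACE)
assignment << pp.Group(identifier + EQUALS + (integer | identifier | block))

# Made-up sample data with two top-level sections.
data = """header = {
    name = test
}
province = {
    id = 300
}
"""

sections = []
# Each iteration is one top-level match; 'end' is the offset just past it.
for tokens, start, end in assignment.scanString(data):
    name = tokens[0][0]
    last_line = pp.lineno(end, data)
    sections.append((name, last_line))
    print("%s ends on line %d" % (name, last_line))
```

The end offset also marks where the remaining text starts, so data[end:] could be handed to a later parse if the sections need to be processed separately.<br>
<br>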
I'm still trial and error-ing a bit too much at the moment.<br>
<br>
Regards, <br>
<br>
Liam Clarke<br>
<br>
<br>
<br>
<br><div><span class="gmail_quote">On 7/24/05, <b class="gmail_sendername">Paul McGuire</b> <<a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Liam -<br><br>Glad you are sticking with pyparsing through some of these idiosyncrasies!<br><br>One thing that might simplify your life is if you are a bit more strict on<br>specifying your grammar, especially using pp.printables
as the character set<br>for your various words and values. Is this statement really valid?<br><br>Lw)r*)*dsflkj = sldjouwe)r#jdd<br><br>According to your grammar, it is. Also, by using printables, you force your<br>user to insert whitespace between the assignment target and the equals sign.
<br>I'm sure your users would like to enter a quick "a=1" once in a while, but<br>since there is no whitespace, it will all be slurped into the left-hand side<br>identifier.<br><br>Let's create two expressions, LHS and RHS, to dictate what is valid on the
<br>left and right-hand side of the equals sign. (Well, it turns out I create a<br>bunch of expressions here, in the process of defining LHS and RHS, but<br>hopefully, this will make some sense):<br><br>EQUALS = pp.Suppress
("=")<br>LBRACE = pp.Suppress("{")<br>RBRACE = pp.Suppress("}")<br>identifier = pp.Word(pp.alphas, pp.alphanums + "_")<br>integer = pp.Word(pp.nums+"-+", pp.nums)<br>assignment =
pp.Forward()<br>LHS = identifier<br>RHS = pp.Forward().setName("RHS")<br>RHS << ( pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +<br>pp.OneOrMore(assignment) + RBRACE ) )<br>assignment <<
pp.Group( LHS + EQUALS + RHS )<br><br>I leave it to you to flesh out what other possible value types can be<br>included in RHS.<br><br>Note also the use of the Group. Try running this snippet with and without<br>Group and see how the results change. I think using Group will help you to
<br>build up a good parse tree for the matched tokens.<br><br>Lastly, please note in the '<<' assignment to RHS that the expression is<br>enclosed in parens. I originally left this as<br><br>RHS << pp.dblQuotedString
^ identifier ^ integer ^ pp.Group( LBRACE +<br>pp.OneOrMore(assignment) + RBRACE )<br><br>And it failed to match! A bug! In my own code! The shame...<br><br>This fails because '<<' has a higher precedence than '^', so RHS only worked
<br>if it was handed a quoted string. Probably good practice to always enclose<br>in parentheses the expression being assigned to a Forward using '<<'.<br><br>-- Paul<br><br><br>-----Original Message-----<br>From: Liam Clarke [mailto:
<a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a>]<br>Sent: Saturday, July 23, 2005 9:03 AM<br>To: Paul McGuire<br>Cc: <a href="mailto:tutor@python.org">tutor@python.org</a><br>Subject: Re: [Tutor] Parsing problem<br>
<br>*sigh* I just read the documentation more carefully and found the difference<br>between the<br>| operator and the ^ operator.<br><br>Input -<br><br>j = { line = { foo = 10 bar = 20 } }<br><br>New code<br><br>sel = pp.Forward
()<br>values = ((pp.Word(pp.printables) + pp.Suppress("=") +<br>pp.Word(pp.printables)) ^ sel)<br>sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +<br>pp.OneOrMore(values) +
pp.Suppress("}"))<br><br>Output -<br><br>(['j', 'line', 'foo', '10', 'bar', '20'], {})<br><br>My apologies for the deluge.<br><br>Regards,<br><br>Liam Clarke<br><br><br>On 7/24/05, Liam Clarke <<a href="mailto:cyresse@gmail.com">
cyresse@gmail.com</a>> wrote:<br><br> Hmmm... just a quick update, I've been poking around and I'm<br>obviously making some error of logic.<br><br> Given a line -<br><br> f = "j = { line = { foo = 10 bar = 20 } }"
<br><br> And given the following code -<br><br> select = pp.Forward()<br> select <<<br> pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +<br> pp.OneOrMore
( (pp.Word(pp.printables) + pp.Suppress("=") +<br>    pp.Word(pp.printables) ) | select ) + pp.Suppress("}")<br><br>    select.parseString(f) gives -<br><br>    (['j', 'line', '{', 'foo', '10', 'bar', '20'], {})
<br><br> So I've got a bracket sneaking through there. Argh. My brain hurts.<br><br> Is the | operator an exclusive or?<br><br> Befuddled,<br><br> Liam Clarke<br><br><br><br> On 7/23/05, Liam Clarke <
<a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a> > wrote:<br><br> Howdy,<br><br> I've
attempted to follow your lead and have started from<br>scratch, I could just copy and paste your solution (which works pretty<br>well), but I want to understand what I'm doing *grin*<br><br> However,
I've been hitting a couple of ruts in the path to<br>enlightenment. Is there a way to tell pyparsing to treat specific<br>escaped characters as just a slash followed by a letter? For the time being<br>I've converted all backslashes to forward slashes, as it was choking on \a in
<br>a file path.<br><br> But
my latest hitch takes this form (apologies for the large<br>traceback)<br><br>        Traceback
(most recent call last):<br> File
"<interactive input>", line 1, in ?<br> File
"parse.py", line 336, in parse<br> parsedEntries
= dicts.parseString(test_data)<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>616, in parseString<br> loc,
tokens = self.parse( instring.expandtabs(), 0 )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>558, in parse<br> loc,tokens
= self.parseImpl( instring, loc, doActions )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>1518, in parseImpl<br> return
self.expr.parse( instring, loc, doActions )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>558, in parse<br> loc,tokens
= self.parseImpl( instring, loc, doActions )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>1367, in parseImpl<br> loc,
exprtokens = e.parse( instring, loc, doActions )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>558, in parse<br> loc,tokens
= self.parseImpl( instring, loc, doActions )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>1518, in parseImpl<br> return
self.expr.parse( instring, loc, doActions )<br> File
"c:\python24\Lib\site-packages\pyparsing.py", line<br>560, in parse<br> raise
ParseException, ( instring, len(instring),<br>self.errmsg, self )<br><br> ParseException:
Expected "}" (at char 9909), (line:325,<br>col:5)<br><br> The
offending code can be found here (includes the data) -<br><a href="http://www.rafb.net/paste/results/L560wx80.html">http://www.rafb.net/paste/results/L560wx80.html</a><br><br> It's
like pyparsing isn't recognising a lot of my "}"'s, as<br>if I add another one, it throws the same error, same for adding another<br>two...<br><br> No
doubt I've done something silly, but any help in finding<br>the tragic flaw would be much appreciated. I need to get a parsingResults<br>object out so I can learn how to work with the basic structure!<br><br> Much regards,
<br><br> Liam Clarke<br><br><br><br> On
7/21/05, Paul McGuire < <a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a><br><mailto:<a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a>> > wrote:<br><br>
Liam,
Kent, and Danny -<br><br> It
sure looks like pyparsing is taking on a life of<br>its own! I can see I no<br> longer
am the only one pitching pyparsing at some of<br>these applications!<br><br> Yes,
Liam, it is possible to create dictionary-like<br>objects, that is,<br> ParseResults
objects that have named values in them.<br>I looked into your<br> application,
and the nested assignments seem very<br>similar to a ConfigParse<br> type
of structure. Here is a pyparsing version that<br>handles the test data<br> in
your original post (I kept Danny Yoo's recursive<br>list values, and added<br> recursive
dictionary entries):<br><br> --------------------------<br> import
pyparsing as pp<br><br> listValue
= pp.Forward()<br> listSeq
= pp.Suppress ('{') +<br>pp.Group(pp.ZeroOrMore(listValue)) +<br> pp.Suppress('}')<br> listValue
<< (<br>pp.dblQuotedString.setParseAction(pp.removeQuotes) |<br> pp.Word(pp.alphanums)
| listSeq )<br><br> keyName
= pp.Word( pp.alphas )<br><br> entries
= pp.Forward()<br> entrySeq
= pp.Suppress('{') +<br>pp.Group(pp.OneOrMore(entries)) +<br> pp.Suppress('}')<br> entries
<< pp.Dict(<br> pp.OneOrMore
(<br> pp.Group(
keyName + pp.Suppress('=')<br>+ (entrySeq |<br> listValue)
) ) )<br> --------------------------<br><br><br> Dict
is one of the most confusing classes to use,<br>and there are some<br> examples
in the examples directory that comes with<br>pyparsing (see<br> dictExample2.py),
but it is still tricky. Here is<br>some code to access your<br> input
test data, repeated here for easy reference:<br><br> --------------------------<br> testdata
= """\<br> country
= {<br> tag
= ENG<br> ai
= {<br> flags
= { }<br> combat
= { DAU FRA ORL PRO }<br> continent
= { }<br> area
= { }<br> region
= { "British Isles" "NorthSeaSea"<br>"ECAtlanticSea" "NAtlanticSea"<br> "TagoSea"
"WCAtlanticSea" }<br> war
= 60<br> ferocity
= no<br> }<br> }<br> """<br> parsedEntries
= entries.parseString(testdata)<br><br> def
dumpEntries(dct,depth=0):<br> keys
= dct.keys()<br> keys.sort()<br> for
k in keys:<br> print
(' '*depth) + '- ' + k + ':',<br> if
isinstance(dct[k],pp.ParseResults):<br> if
dct[k][0].keys():<br> print<br> dumpEntries(dct[k][0],depth+1)<br> else:<br> print
dct[k][0]<br> else:<br> print
dct[k]<br><br> dumpEntries(
parsedEntries )<br><br> print<br> print
parsedEntries.country[0].tag<br> print
parsedEntries.country[0].ai[0].war<br> print
parsedEntries.country[0].ai[0].ferocity<br> --------------------------<br><br> This
will print out:<br><br> --------------------------<br> -
country:<br> -
ai:<br> -
area: []<br> -
combat: ['DAU', 'FRA', 'ORL', 'PRO']<br> -
continent: []<br> -
ferocity: no<br> -
flags: []<br> -
region: ['British Isles', 'NorthSeaSea',<br>'ECAtlanticSea',<br> 'NAtlanticSea',
'TagoSea', 'WCAtlanticSea']<br> -
war: 60<br> -
tag: ENG<br><br>         ENG<br>         60<br>         no<br>         --------------------------<br><br>         But
I really dislike having to dereference those<br>nested values using the<br> 0'th
element. So I'm going to fix pyparsing so that<br>in the next release,<br> you'll
be able to reference the sub-elements as:<br><br> print
parsedEntries.country.tag<br> print
parsedEntries.country.ai.war<br> print
parsedEntries.country.ai.ferocity<br><br> This
*may* break some existing code, but Dict is not<br>heavily used, based on<br> feedback
from users, and this may make it more<br>useful in general, especially<br> when
data parses into nested Dict's.<br><br> Hope
this sheds more light than confusion!<br> --
Paul McGuire<br><br><br>        --<br>        'There
is only one basic human right, and that is to do as<br>you damn well please.<br> And
with it comes the only basic human duty, to take the<br>consequences.'<br><br></blockquote>
</div><br><br clear="all"><br>-- <br>'There is only one basic human right, and that is to do as you damn well please.<br>And with it comes the only basic human duty, to take the consequences.'