Hi Paul, <br>
<br>
I am kicking myself over the pp.OneOrMore(assignments) bit. It's parsing
well now: it stops whenever it hits a character it can't handle (a
minus sign snuck in instead of an equals sign at one point, but I think I put
it there), which makes tweaking quite easy. <br>
<br>
Just a quick query on how Word works. <br>
<br>
These two lines - <br>
<br>
identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")<br>
integer = pp.Word(pp.nums+"-+.", pp.nums)<br>
<br>
Parsing stops at integer values which contain a decimal point, which I
thought I'd taken care of by adding "." to the integer definition above. How do the
initChars and bodyChars affect a token?<br>
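<br>
For reference, a rough way to see what the two arguments do, sketched with the standard re module rather than pyparsing itself. It assumes Word(initChars, bodyChars) behaves like "one character from initChars, then any number of characters from bodyChars"; word_pattern, leading_token, integer_v1 and integer_v2 are made-up names for this illustration only: <br>
<br>
```python
import re
import string


def word_pattern(init_chars, body_chars=None):
    """Approximate Word(initChars, bodyChars) as the regex [init][body]*.

    This is a stand-in for illustration, not pyparsing's own code.
    """
    if body_chars is None:
        body_chars = init_chars
    return re.compile(
        "[%s][%s]*" % (re.escape(init_chars), re.escape(body_chars))
    )


def leading_token(pattern, text):
    """Return the token matched at the start of text, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


nums = string.digits

# "." appears only in the first argument, so it may only start a token.
integer_v1 = word_pattern(nums + "-+.", nums)

# "." appears in the second argument, so it may occur after the first character.
integer_v2 = word_pattern(nums + "-+", nums + ".")

print(leading_token(integer_v1, "3.14"))  # 3
print(leading_token(integer_v2, "3.14"))  # 3.14
print(leading_token(integer_v1, ".5"))    # .5
```
<br>
Under that assumption, the "." added to the first argument only helps when it is the first character of the token; allowing it in the second argument is what lets "3.14" through as one token. <br>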
<br>
Regards, <br>
<br>
Liam Clarke<br>
<br>
<br><br><div><span class="gmail_quote">On 7/25/05, <b class="gmail_sendername">Paul McGuire</b> <<a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Liam -<br><br>I just uploaded an update to pyparsing, version 1.3.2, that should fix the<br>problem with using nested Dicts. Now you won't need to use [0] to<br>dereference the "0'th" element, just reference the nested elements as
a.b.c,<br>or a["b"]["c"].<br><br>-- Paul<br><br><br>-----Original Message-----<br>From: Liam Clarke [mailto:<a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a>]<br>Sent: Sunday, July 24, 2005 10:21 AM
<br>To: Paul McGuire<br>Cc: <a href="mailto:tutor@python.org">tutor@python.org</a><br>Subject: Re: [Tutor] Parsing problem<br><br>Hi Paul,<br><br>That is fantastic. It works, and using that pp.group is the key with the<br>
nested braces.<br><br>I just ran this on the actual file after adding a few more possible values<br>inside the group, and it parsed the entire header structure rather nicely.<br><br>Now this will probably sound silly, but from the bit
<br><br>header = {...<br>...<br>}<br><br>it continues on with<br><br>province = {...<br>}<br><br>and so forth.<br><br>Now, once it reads up to the closing bracket of the header section, it<br>returns that parsed nicely.<br>
Is there a way I can tell it to continue onwards? I can see that it's<br>stopping at one group.<br><br>Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over<br>my head.<br><br>I've tried this -<br><br>
Code <a href="http://www.rafb.net/paste/results/3Dm7FF35.html">http://www.rafb.net/paste/results/3Dm7FF35.html</a><br>Current data <a href="http://www.rafb.net/paste/results/3cWyt169.html">http://www.rafb.net/paste/results/3cWyt169.html
</a><br><br>assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS )))<br><br>to try and continue the parsing, but no luck.<br><br>I've been running into the<br><br> File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl
<br> raise maxException<br>pyparsing.ParseException: Expected "}" (at char 742), (line:35, col:5)<br><br>hassle again. From the CPU loading, I'm worried I've got myself something<br>very badly recursive going on, but I'm unsure of how to use validate()
<br><br>I've noticed that a few of the sections in between contain values like this<br>-<br><br>foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }<br><br>and so I've stuck pp.empty into my RHS possible values. What unintended side
<br>effects may I get from using pp.empty? From the docs, it sounds like a<br>wildcard token, rather than matching a null.<br><br>Using pp.empty has resolved my apparent problem with empty {}'s causing my<br>favourite exception, but I'm just worried that I'm casting my net too wide.
<br><br>Oh, and, if there's a way to get a 'last line parsed' value so as to start<br>parsing onwards, it would ease my day, as the only way I've found to get the<br>whole thing parsed is to use another x = { ... } around the whole of the
<br>data, and now, I'm only getting the 'x' returned, so if I could parse by<br>section, it would help my understanding of what's happening.<br><br>I'm still trial and error-ing a bit too much at the moment.<br><br>Regards,
<br><br>Liam Clarke<br><br><br><br><br><br>On 7/24/05, Paul McGuire <<a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a>> wrote:<br><br> Liam -<br><br> Glad you are sticking with pyparsing through some of these
idiosyncrasies!<br><br> One thing that might simplify your life is being a bit more strict in specifying your grammar, especially where you use pp.printables as the character set for your various words and values. Is this statement really valid?<br><br> Lw)r*)*dsflkj = sldjouwe)r#jdd<br><br> According to your grammar, it is. Also, by using printables, you force your user to insert whitespace between the assignment target and the equals sign. I'm sure your users would like to enter a quick "a=1" once in a while, but since there is no whitespace, it will all be slurped into the left-hand side identifier.<br><br>
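 The slurping described above can be sketched with the standard re module instead of pyparsing. This is only an approximation: it assumes a Word built from a character set matches the longest run of those characters, and word_pattern, loose and identifier are hypothetical names used just for this illustration:<br><br>
```python
import re
import string

# Stand-in for pp.printables: every printable character except whitespace.
printables = "".join(c for c in string.printable if c not in string.whitespace)


def word_pattern(init_chars, body_chars=None):
    """Approximate Word(initChars, bodyChars) as the regex [init][body]*."""
    if body_chars is None:
        body_chars = init_chars
    return re.compile(
        "[%s][%s]*" % (re.escape(init_chars), re.escape(body_chars))
    )


# A left-hand side built from printables accepts "=" as part of the name.
loose = word_pattern(printables)

# A stricter left-hand side: a letter, then letters, digits or underscores.
identifier = word_pattern(
    string.ascii_letters, string.ascii_letters + string.digits + "_"
)

print(loose.match("a=1").group(0))              # a=1
print(identifier.match("a=1").group(0))         # a
print(loose.match("Lw)r*)*dsflkj = x").group(0))  # Lw)r*)*dsflkj
```
<br><br>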
Let's create two expressions, LHS and RHS, to dictate what is valid on the left and right-hand side of the equals sign. (Well, it turns out I create a bunch of expressions here, in the process of defining LHS and RHS, but hopefully, this will make some sense):<br><br>
 EQUALS = pp.Suppress("=")<br>
 LBRACE = pp.Suppress("{")<br>
 RBRACE = pp.Suppress("}")<br>
 identifier = pp.Word(pp.alphas, pp.alphanums + "_")<br>
 integer = pp.Word(pp.nums+"-+", pp.nums)<br>
 assignment = pp.Forward()<br>
 LHS = identifier<br>
 RHS = pp.Forward().setName("RHS")<br>
 RHS << ( pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE + pp.OneOrMore(assignment) + RBRACE ) )<br>
 assignment << pp.Group( LHS + EQUALS + RHS )<br><br>
 I leave it to you to flesh out what other possible value types can be included in RHS.<br><br>
 Note also the use of the Group. Try running this snippet with and without Group and see how the results change. I think using Group will help you to build up a good parse tree for the matched tokens.<br><br>
 Lastly, please note in the '<<' assignment to RHS that the expression is enclosed in parens. I originally left this as<br><br>
 RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE + pp.OneOrMore(assignment) + RBRACE )<br><br> And
it failed to match! A bug! In my own code! The
shame...<br><br> This fails because '<<' has a higher precedence than '^', so RHS only worked if it was handed a quoted string. Probably good practice to always enclose in parentheses the expression being assigned to a Forward using '<<'.
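<br><br> The precedence point can be checked without pyparsing at all, by asking Python's own parser how it groups the two forms. A minimal sketch using the standard ast module; top_level_operator is a hypothetical helper, and RHS, a and b are only placeholder names inside the parsed strings:<br><br>
```python
import ast


def top_level_operator(source):
    """Return the name of the outermost binary operator in an expression."""
    tree = ast.parse(source, mode="eval")
    return type(tree.body.op).__name__


# Without parentheses the outermost operation is '^',
# i.e. the expression groups as (RHS << a) ^ b.
print(top_level_operator("RHS << a ^ b"))    # BitXor

# With parentheses the outermost operation is '<<',
# so the whole alternation is what gets assigned.
print(top_level_operator("RHS << (a ^ b)"))  # LShift
```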
<br><br> -- Paul<br><br><br> -----Original Message-----<br> From: Liam Clarke [mailto: <a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a>]<br> Sent: Saturday, July 23, 2005 9:03 AM<br> To: Paul McGuire
<br> Cc: <a href="mailto:tutor@python.org">tutor@python.org</a><br> Subject: Re: [Tutor] Parsing problem<br><br> *sigh* I just read the documentation more carefully and found the<br>difference<br> between the
| operator and the ^ operator.<br><br> Input -<br><br> j = { line = { foo = 10 bar = 20 } }<br><br> New code<br><br>
 sel = pp.Forward()<br>
 values = ((pp.Word(pp.printables) + pp.Suppress("=") + pp.Word(pp.printables)) ^ sel)<br>
 sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") + pp.OneOrMore(values) + pp.Suppress("}"))<br><br>
 Output -<br><br> (['j', 'line', 'foo', '10', 'bar', '20'], {})<br><br> My apologies for the deluge.<br><br> Regards,<br><br> Liam Clarke<br><br><br> On 7/24/05, Liam Clarke <
<a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a><br><mailto:<a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a>> > wrote:<br><br> Hmmm...
just a quick update, I've been poking around and I'm<br> obviously making some error of logic.<br><br> Given a line -<br><br>
f = "j = { line = { foo = 10 bar = 20 } }"<br><br> And given the following code -<br><br>
 select = pp.Forward()<br>
 select << pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +<br>
 pp.OneOrMore( (pp.Word(pp.printables) + pp.Suppress("=") + pp.Word(pp.printables)) | select ) + pp.Suppress("}")<br><br>
 select.parseString(f) gives -<br><br> (['j', 'line', '{', 'foo', '10', 'bar', '20'], {})<br><br> So
I've got a bracket sneaking through there. Argh. My brain<br>hurts.<br><br> Is
the | operator an exclusive or?<br><br> Befuddled,<br><br> Liam Clarke<br><br><br><br> On
7/23/05, Liam Clarke < <a href="mailto:cyresse@gmail.com">cyresse@gmail.com</a> > wrote:<br><br> Howdy,<br><br> I've
attempted to follow your lead and have started from scratch. I could just copy and paste your solution (which works pretty well), but I want to understand what I'm doing *grin*<br><br> However, I've been hitting a couple of ruts in the path to enlightenment. Is there a way to tell pyparsing to treat specific escaped characters as just a backslash followed by a letter? For the time being I've converted all backslashes to forward slashes, as it was choking on \a in a file path.<br><br> But
my latest hitch takes this form (apologies for the large traceback)<br><br>
 Traceback (most recent call last):<br>
 File "<interactive input>", line 1, in ?<br>
 File "parse.py", line 336, in parse<br>
 parsedEntries = dicts.parseString(test_data)<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 616, in parseString<br>
 loc, tokens = self.parse( instring.expandtabs(), 0 )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse<br>
 loc,tokens = self.parseImpl( instring, loc, doActions )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl<br>
 return self.expr.parse( instring, loc, doActions )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse<br>
 loc,tokens = self.parseImpl( instring, loc, doActions )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 1367, in parseImpl<br>
 loc, exprtokens = e.parse( instring, loc, doActions )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse<br>
 loc,tokens = self.parseImpl( instring, loc, doActions )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl<br>
 return self.expr.parse( instring, loc, doActions )<br>
 File "c:\python24\Lib\site-packages\pyparsing.py", line 560, in parse<br>
 raise ParseException, ( instring, len(instring), self.errmsg, self )<br><br>
 ParseException: Expected "}" (at char 9909), (line:325, col:5)<br><br> The
offending code can be found here (includes the<br>data) -<br> <a href="http://www.rafb.net/paste/results/L560wx80.html">http://www.rafb.net/paste/results/L560wx80.html</a><br><br> It's
like pyparsing isn't recognising a lot of my "}"s: if I add another one, it throws the same error, and the same for adding another two...<br><br> No doubt I've done something silly, but any help in finding the tragic flaw would be much appreciated. I need to get a ParseResults object out so I can learn how to work with the basic structure!
<br><br> Much
regards,<br><br> Liam
Clarke<br><br><br><br> On
7/21/05, Paul McGuire <<br><a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a><br> <mailto:<a href="mailto:paul@alanweberassociates.com">paul@alanweberassociates.com</a>> > wrote:
<br><br> Liam, Kent, and Danny -<br><br> It sure looks like pyparsing is taking on a life of its own! I can see I am no longer the only one pitching pyparsing at some of these applications!<br><br> Yes, Liam, it is possible to create dictionary-like objects, that is, ParseResults objects that have named values in them. I looked into your application, and the nested assignments seem very similar to a ConfigParser type of structure. Here is a pyparsing version that handles the test data in your original post (I kept Danny Yoo's recursive list values, and added recursive
dictionary entries):<br><br> --------------------------<br>
 import pyparsing as pp<br><br>
 listValue = pp.Forward()<br>
 listSeq = pp.Suppress('{') + pp.Group(pp.ZeroOrMore(listValue)) + pp.Suppress('}')<br>
 listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |<br>
                pp.Word(pp.alphanums) | listSeq )<br><br>
 keyName = pp.Word( pp.alphas )<br><br>
 entries = pp.Forward()<br>
 entrySeq = pp.Suppress('{') + pp.Group(pp.OneOrMore(entries)) + pp.Suppress('}')<br>
 entries << pp.Dict(<br>
     pp.OneOrMore(<br>
         pp.Group( keyName + pp.Suppress('=') + (entrySeq | listValue) ) ) )<br>
 --------------------------<br><br><br> Dict
is one of the most confusing classes to use, and there are some examples in the examples directory that comes with pyparsing (see dictExample2.py), but it is still tricky. Here is some code to access your input
test data, repeated here for easy reference:<br><br> --------------------------<br>
 testdata = """\<br>
 country = {<br>
   tag = ENG<br>
   ai = {<br>
     flags = { }<br>
     combat = { DAU FRA ORL PRO }<br>
     continent = { }<br>
     area = { }<br>
     region = { "British Isles" "NorthSeaSea" "ECAtlanticSea" "NAtlanticSea" "TagoSea" "WCAtlanticSea" }<br>
     war = 60<br>
     ferocity = no<br>
   }<br>
 }<br>
 """<br>
 parsedEntries = entries.parseString(testdata)<br><br>
 def dumpEntries(dct,depth=0):<br>
     keys = dct.keys()<br>
     keys.sort()<br>
     for k in keys:<br>
         print (' '*depth) + '- ' + k + ':',<br>
         if isinstance(dct[k],pp.ParseResults):<br>
             if dct[k][0].keys():<br>
                 print<br>
                 dumpEntries(dct[k][0],depth+1)<br>
             else:<br>
                 print dct[k][0]<br>
         else:<br>
             print dct[k]<br><br>
 dumpEntries( parsedEntries )<br><br>
 print<br>
 print parsedEntries.country[0].tag<br>
 print parsedEntries.country[0].ai[0].war<br>
 print parsedEntries.country[0].ai[0].ferocity<br>
 --------------------------<br><br> This
will print out:<br><br> --------------------------<br>
 - country:<br>
  - ai:<br>
   - area: []<br>
   - combat: ['DAU', 'FRA', 'ORL', 'PRO']<br>
   - continent: []<br>
   - ferocity: no<br>
   - flags: []<br>
   - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea', 'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']<br>
   - war: 60<br>
  - tag: ENG<br><br>
 ENG<br>
 60<br>
 no<br>
 --------------------------<br><br> But
I really dislike having to dereference those nested values using the 0'th element. So I'm going to fix pyparsing so that in the next release, you'll be able to reference the sub-elements as:<br><br>
 print parsedEntries.country.tag<br>
 print parsedEntries.country.ai.war<br>
 print parsedEntries.country.ai.ferocity<br><br>
 This *may* break some existing code, but Dict is not heavily used, based on feedback from users, and this may make it more useful in general, especially when data parses into nested Dicts.<br><br>
 Hope this sheds more light than confusion!<br> --
Paul McGuire<br><br><br>_______________________________________________<br> Tutor
maillist - <a href="mailto:Tutor@python.org">Tutor@python.org</a><br> <mailto:<a href="mailto:Tutor@python.org">Tutor@python.org</a>><br><br><a href="http://mail.python.org/mailman/listinfo/tutor">http://mail.python.org/mailman/listinfo/tutor
</a><br><<a href="http://mail.python.org/mailman/listinfo/tutor">http://mail.python.org/mailman/listinfo/tutor</a>><br><br><br><br><br><br> --<br> 'There
is only one basic human right, and that is to<br>do as<br> you damn well please.<br> And
with it comes the only basic human duty, to take<br>the<br> consequences.'<br><br></blockquote>
</div><br><br clear="all"><br>-- <br>'There is only one basic human right, and that is to do as you damn well please.<br>And with it comes the only basic human duty, to take the consequences.'