[Tutor] Parsing problem

Paul McGuire paul at alanweberassociates.com
Thu Jul 21 09:52:48 CEST 2005


Liam, Kent, and Danny -

It sure looks like pyparsing is taking on a life of its own!  I can see I no
longer am the only one pitching pyparsing at some of these applications!

Yes, Liam, it is possible to create dictionary-like objects, that is,
ParseResults objects that have named values in them.  I looked into your
application, and the nested assignments seem very similar to a ConfigParser
type of structure.  Here is a pyparsing version that handles the test data
in your original post (I kept Danny Yoo's recursive list values, and added
recursive dictionary entries):

--------------------------
import pyparsing as pp

# List values: quoted strings, bare words, or nested { ... } lists.
listValue = pp.Forward()
listSeq = ( pp.Suppress('{') + pp.Group(pp.ZeroOrMore(listValue)) +
            pp.Suppress('}') )
listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
                pp.Word(pp.alphanums) | listSeq )

keyName = pp.Word( pp.alphas )

# Entries: key = value pairs, where a value is either a braced block
# of further entries (a nested dictionary) or a list value.
entries = pp.Forward()
entrySeq = ( pp.Suppress('{') + pp.Group(pp.OneOrMore(entries)) +
             pp.Suppress('}') )
entries << pp.Dict(
            pp.OneOrMore(
                pp.Group( keyName + pp.Suppress('=') +
                          (entrySeq | listValue) ) ) )
--------------------------
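
For readers who want to see what the two recursive rules in this grammar
amount to, here is a rough plain-Python sketch of the same idea.  It is a
hand-written recursive-descent parser, not pyparsing.  The names tokenize,
parse_entries, parse_value and parse_list are made up for this illustration,
and the one-token lookahead for '=' is an assumption standing in for
pyparsing trying entrySeq before listValue.  It returns ordinary dicts and
lists rather than ParseResults.

```python
import re

# Tokens: double-quoted strings, braces, '=', or bare alphanumeric words.
TOKEN = re.compile(r'"[^"]*"|[{}=]|[A-Za-z0-9]+')

def tokenize(text):
    return TOKEN.findall(text)

def parse_entries(tokens, pos=0):
    """Parse 'key = value' pairs until '}' or end of input; return (dict, pos)."""
    result = {}
    while pos < len(tokens) and tokens[pos] != '}':
        key = tokens[pos]
        if pos + 1 >= len(tokens) or tokens[pos + 1] != '=':
            raise ValueError("expected '=' after %r" % key)
        value, pos = parse_value(tokens, pos + 2)
        result[key] = value
    return result, pos

def parse_value(tokens, pos):
    """Parse a scalar, a braced list, or a braced block of nested entries."""
    tok = tokens[pos]
    if tok != '{':
        return tok.strip('"'), pos + 1
    # Assumption: '{' followed by 'name =' opens nested entries, else a list.
    if pos + 2 < len(tokens) and tokens[pos + 2] == '=':
        value, pos = parse_entries(tokens, pos + 1)
    else:
        value, pos = parse_list(tokens, pos + 1)
    if pos >= len(tokens) or tokens[pos] != '}':
        raise ValueError("missing closing '}'")
    return value, pos + 1

def parse_list(tokens, pos):
    """Parse zero or more items (possibly nested lists) up to the closing '}'."""
    items = []
    while pos < len(tokens) and tokens[pos] != '}':
        if tokens[pos] == '{':
            sub, pos = parse_list(tokens, pos + 1)
            pos += 1  # step over the nested list's closing brace
            items.append(sub)
        else:
            items.append(tokens[pos].strip('"'))
            pos += 1
    return items, pos

sample = 'country = { tag = ENG ai = { combat = { DAU FRA } flags = { } war = 60 } }'
parsed, _ = parse_entries(tokenize(sample))
print(parsed)
```

Each parse_* function plays the role of one grammar rule, and the mutual
calls between parse_entries and parse_value are what pp.Forward() makes
possible in the pyparsing version.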


Dict is one of the most confusing classes to use, and there are some
examples in the examples directory that comes with pyparsing (see
dictExample2.py), but it is still tricky.  Here is some code to access your
input test data, repeated here for easy reference:

--------------------------
testdata = """\
country = {
tag = ENG
ai = {
flags = { }
combat = { DAU FRA ORL PRO }
continent = { }
area = { }
region = { "British Isles" "NorthSeaSea" "ECAtlanticSea" "NAtlanticSea"
"TagoSea" "WCAtlanticSea" }
war = 60
ferocity = no
}
}
"""
parsedEntries = entries.parseString(testdata)

def dumpEntries(dct,depth=0):
    keys = dct.keys()
    keys.sort()
    for k in keys:
        print ('  '*depth) + '- ' + k + ':',
        if isinstance(dct[k],pp.ParseResults):
            if dct[k][0].keys():
                print
                dumpEntries(dct[k][0],depth+1)
            else:
                print dct[k][0]
        else:
            print dct[k]

dumpEntries( parsedEntries )

print
print parsedEntries.country[0].tag
print parsedEntries.country[0].ai[0].war
print parsedEntries.country[0].ai[0].ferocity
--------------------------

This will print out:

--------------------------
- country:
  - ai:
    - area: []
    - combat: ['DAU', 'FRA', 'ORL', 'PRO']
    - continent: []
    - ferocity: no
    - flags: []
    - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea', 'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']
    - war: 60
  - tag: ENG

ENG
60
no
--------------------------
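
As a point of comparison, the same recursive walk over ordinary nested dicts
might look like this in Python 3.  This is only an illustration: dump_entries
and the data literal are hypothetical stand-ins for the parsed results, not
pyparsing output.

```python
def dump_entries(dct, depth=0, lines=None):
    """Return indented '- key: value' lines for a nested dict, keys sorted."""
    if lines is None:
        lines = []
    for key in sorted(dct):
        value = dct[key]
        prefix = '  ' * depth + '- ' + key + ':'
        if isinstance(value, dict):
            # Nested dictionary: print the key alone, then recurse one level in.
            lines.append(prefix)
            dump_entries(value, depth + 1, lines)
        else:
            lines.append(prefix + ' ' + str(value))
    return lines

# Hand-built stand-in for part of the parsed test data.
data = {
    'country': {
        'tag': 'ENG',
        'ai': {'combat': ['DAU', 'FRA', 'ORL', 'PRO'], 'flags': [],
               'war': '60', 'ferocity': 'no'},
    },
}
print('\n'.join(dump_entries(data)))
```

Plain dicts carry no extra list level, so nothing here needs a [0].  With the
ParseResults returned by the grammar, the [0] in dumpEntries is still needed.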

But I really dislike having to dereference those nested values using the
0'th element.  So I'm going to fix pyparsing so that in the next release,
you'll be able to reference the sub-elements as:

print parsedEntries.country.tag
print parsedEntries.country.ai.war
print parsedEntries.country.ai.ferocity

This *may* break some existing code, but Dict is not heavily used, based on
feedback from users, and this may make it more useful in general, especially
when data parses into nested Dicts.
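
To make the intended access style concrete, here is a small sketch of chained
attribute access over nested dicts.  AttrDict is a hypothetical helper written
for this note.  It is not how pyparsing implements ParseResults, only a
picture of what reading country.ai.war without the [0] looks like.

```python
class AttrDict(dict):
    """Hypothetical helper: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        # Wrap nested dicts so that attribute access can be chained.
        return AttrDict(value) if isinstance(value, dict) else value

# Hand-built stand-in for part of the parsed test data.
results = AttrDict({'country': {'tag': 'ENG',
                                'ai': {'war': '60', 'ferocity': 'no'}}})
print(results.country.tag)
print(results.country.ai.war)
print(results.country.ai.ferocity)
```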

Hope this sheds more light than confusion!
-- Paul McGuire


