trouble pyparsing
Paul McGuire
ptmcg at austin.rr._bogus_.com
Wed Jan 4 21:43:55 EST 2006
"the.theorist" <the.theorist at gmail.com> wrote in message
news:1136422425.209587.100500 at z14g2000cwz.googlegroups.com...
> Hey, I'm trying my hand at pyparsing a log file (named l.log):
> FIRSTLINE
>
> PROPERTY1 DATA1
> PROPERTY2 DATA2
>
> PROPERTYS LIST
> ID1 data1
> ID2 data2
>
> ID1 data11
> ID2 data12
>
> SECTION
>
> So I wrote up a small bit of code (named p.py):
> from pyparsing import *
> import sys
>
> toplevel = Forward()
>
> firstLine = Word('FIRSTLINE')
> property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2')
> + Word(alphanums))
>
> id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') +
> Word(alphanums))
> plist = Word('PROPERTYS LIST') + ZeroOrMore( id )
>
> toplevel << firstLine
> toplevel << OneOrMore( property )
> toplevel << plist
>
> par = toplevel
>
> print toplevel.parseFile(sys.argv[1])
>
> The problem is that I get the following error:
<snip>
> Is this a fundamental error, or is it just me? (I haven't yet tried
> simpleparse)
>
It's you.
Well, let's focus on the behavior and not the individual. There are two
major misconceptions that you have here:
1. Confusing "Word" for "Literal"
2. Confusing "<<" Forward assignment for some sort of C++ streaming
operator.
What puzzles me is that in some places, you correctly use the Word class, as
in Word(alphanums), to indicate a "word" as a contiguous set of characters
found in the string alphanums. You also correctly use '+' to build up id
and plist expressions, but then you use "<<" successively in what looks like
streaming into the toplevel variable.
When your grammar includes Word("FIRSTLINE"), you are actually saying you
want to match a "word" composed of one or more letters found in the string
"FIRSTLINE" - this would match not only FIRSTLINE, but also FIRST, LINE,
LIRST, FINE, LIST, FIST, FLINTSTRINE, well, you get the idea. Just the way
Word(alphanums) matches DATA1, DATA2, data1, data2, data11, and data12.
What you really want here is the class Literal, as in Literal("FIRSTLINE").
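To see the difference for yourself, here is a minimal sketch that feeds the same strings to both classes. The variable names (word_expr, literal_expr, results) are just for illustration and are not from your program.

```python
from pyparsing import Literal, ParseException, Word

# Word("FIRSTLINE") treats the string as a SET of allowed characters.
word_expr = Word("FIRSTLINE")
# Literal("FIRSTLINE") matches that exact string and nothing else.
literal_expr = Literal("FIRSTLINE")

results = {}
for text in ("FIRSTLINE", "FIST", "LINE"):
    word_tokens = word_expr.parseString(text).asList()
    try:
        literal_tokens = literal_expr.parseString(text).asList()
    except ParseException:
        # Literal refuses anything that is not exactly FIRSTLINE.
        literal_tokens = None
    results[text] = (word_tokens, literal_tokens)
    print(text, word_tokens, literal_tokens)
```

Word happily accepts all three inputs, while Literal accepts only the first.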
As for toplevel, there is no reason here to use Forward() - reserve use of
this class for recursive structures, such as lists composed of lists, etc.
toplevel is simply the sequence of a firstline, OneOrMore properties, and a
plist, which is just the plain old:
toplevel = firstLine + OneOrMore(property) + plist
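A short sketch of both points about Forward, under my own made-up names (overwritten, nested, item): first, each "<<" replaces the previous contents rather than appending to them; second, the kind of recursive grammar that Forward is actually meant for, here a parenthesized list that may contain further lists.

```python
from pyparsing import (Forward, Group, Literal, ParseException, Suppress,
                       Word, ZeroOrMore, alphanums)

# 1. "<<" does not stream: the second assignment replaces the first.
overwritten = Forward()
overwritten << Literal("A")
overwritten << Literal("B")  # the Literal("A") definition is discarded
try:
    overwritten.parseString("A")
    matched_a = True
except ParseException:
    matched_a = False
print(matched_a, overwritten.parseString("B").asList())

# 2. What Forward is for: an expression that refers to itself.
nested = Forward()
item = Word(alphanums) | Group(Suppress("(") + nested + Suppress(")"))
nested << ZeroOrMore(item)  # define the placeholder once it can be built
tokens = nested.parseString("a (b (c d)) e").asList()
print(tokens)
```

In your program the three successive "<<" lines would therefore leave toplevel holding only plist, which is why a plain "+" sequence is the right tool there.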
Lastly, if you'll peruse the documentation that comes with pyparsing, you'll
also find the Group class. This class is very helpful in imparting some
structure to the returned set of tokens.
Here is a before/after version of your program that produces more
successful results.
-- Paul
data = """FIRSTLINE
PROPERTY1 DATA1
PROPERTY2 DATA2
PROPERTYS LIST
ID1 data1
ID2 data2
ID1 data11
ID2 data12
SECTION
"""
from pyparsing import *
import sys
#~ toplevel = Forward()
#~ firstLine = Word('FIRSTLINE')
firstLine = Literal('FIRSTLINE')
#~ property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2') + Word(alphanums))
property = (Literal('PROPERTY1') + Word(alphanums)) ^ (Literal('PROPERTY2')
+ Word(alphanums))
#~ id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') + Word(alphanums))
id = (Literal('ID1') + Word(alphanums)) ^ (Literal('ID2') +
Word(alphanums))
#~ plist = Word('PROPERTYS LIST') + ZeroOrMore( id )
plist = Literal('PROPERTYS LIST') + ZeroOrMore( id )
#~ toplevel << firstLine
#~ toplevel << OneOrMore( property )
#~ toplevel << plist
toplevel = firstLine + OneOrMore( property ) + plist
par = toplevel
print(par.parseString(data))
# add Groups, to give structure to results, rather than just returning a
# flat list of strings
plist = Literal('PROPERTYS LIST') + ZeroOrMore( Group(id) )
toplevel = firstLine + Group(OneOrMore(Group(property))) + Group(plist)
par = toplevel
print(par.parseString(data))
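To make the effect of Group concrete, here is a self-contained sketch of the grouped grammar on a shortened copy of the data, showing how the nesting lets you unpack the result into its three parts. The names first_line, prop, id_entry, header, props and id_list are illustrative choices of mine (renamed partly to avoid shadowing the builtins property and id).

```python
from pyparsing import Group, Literal, OneOrMore, Word, ZeroOrMore, alphanums

sample = """FIRSTLINE
PROPERTY1 DATA1
PROPERTY2 DATA2
PROPERTYS LIST
ID1 data1
ID2 data2
"""

first_line = Literal("FIRSTLINE")
prop = (Literal("PROPERTY1") + Word(alphanums)) ^ (Literal("PROPERTY2") + Word(alphanums))
id_entry = (Literal("ID1") + Word(alphanums)) ^ (Literal("ID2") + Word(alphanums))
plist = Literal("PROPERTYS LIST") + ZeroOrMore(Group(id_entry))
grouped = first_line + Group(OneOrMore(Group(prop))) + Group(plist)

# With Groups in place the top level has exactly three items.
header, props, id_list = grouped.parseString(sample).asList()
print(header)
print(props)
print(id_list)
```

Without the Groups, the same parse would come back as one flat list of strings and you would have to count positions to find where the properties end and the ids begin.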