Any help with PLY?
Paul McGuire
ptmcg at austin.rr._bogus_.com
Thu Nov 17 14:52:59 EST 2005
<mark.green at reading.ac.uk> wrote in message
news:1132253408.676406.179100 at g43g2000cwa.googlegroups.com...
> Hi folks,
>
> I've been trying to write a PLY parser and have run into a bit of
> bother.
>
> At the moment, I have a RESERVEDWORD token which matches all reserved
> words and then alters the token type to match the reserved word that
> was detected. I also have an IDENTIFIER token which matches
> identifiers that are not reserved words.
>
> The problem is, if I put RESERVEDWORD before IDENTIFIER, then
> identifiers that happen to begin with reserved words are wrongly lexed
> as the reserved word followed by an identifier. For example, because
> "if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
> RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
> as the IDENTIFIER "ifollowyou".
>
> If I put IDENTIFIER first, though, every single reserved word in the
> input is lexed as an IDENTIFIER.
>
> Is there any way I can tell PLY that it should only return a
> RESERVEDWORD in the correct circumstances? If PLY can't do this, can
> any of the other Python parser generators? (It seems that Lex can..)
>
> Thanks!
>
Pyparsing uses the Keyword class for just this purpose. Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator ('^'),
which performs a longest-string or "greedy" match, as in:

    any_ = Literal("any")
    boolean_ = Literal("boolean")
    char_ = Literal("char")
    double_ = Literal("double")
    ...
    identifier = Word( alphas, alphanums + "_" ).setName("identifier")
    real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) )
                    + Optional( CaselessLiteral("E") + Word(nums+"+-",nums) ) )
    integer = ( Combine( CaselessLiteral("0x") + Word( nums+"abcdefABCDEF" ) ) |
                Word( nums+"+-", nums ) ).setName("int")
    udTypeName = delimitedList( identifier, "::", combine=True ).setName("udType")
    # have to use longest match for type, in case a user-defined
    # type name starts with a keyword type, like "stringSeq" or "longArray"
    typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
                 float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
                 wchar_ ^ wstring_ ^ udTypeName )

This way, if a user-defined type were named "stringSequence", the longest
matching expression (udTypeName rather than string_) would be returned.
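
To make the longest-match behavior concrete, here is a minimal sketch reduced
to two alternatives. The names string_, identifier and type_name are
illustrative stand-ins for the fuller grammar above, and the camelCase
parseString call is assumed to be available in the installed pyparsing.

```python
from pyparsing import Literal, Word, alphas, alphanums

# Hypothetical two-alternative grammar, reduced from the typeName example.
string_ = Literal("string")
identifier = Word(alphas, alphanums + "_")

# '^' builds an Or: every alternative is tried and the longest match wins.
type_name = string_ ^ identifier

# The Literal on its own is satisfied by the leading "string".
prefix_only = string_.parseString("stringSequence")[0]

# The Or prefers the identifier because it consumes more input.
chosen = type_name.parseString("stringSequence")[0]

print(prefix_only)
print(chosen)
```

The cost is that every alternative gets evaluated before a winner is picked.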
Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every path - it can just return the first matching expression. Now
with Keyword, I can define:

    any_ = Keyword("any")
    boolean_ = Keyword("boolean")
    char_ = Keyword("char")
    double_ = Keyword("double")
    ...
    typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
                 float_ | long_ | octet_ | short_ | string_ |
                 wchar_ | wstring_ | udTypeName )

Does PLY support greedy matching?
-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net .)