Any help with PLY?
ptmcg at austin.rr._bogus_.com
Thu Nov 17 20:52:59 CET 2005
<mark.green at reading.ac.uk> wrote in message
news:1132253408.676406.179100 at g43g2000cwa.googlegroups.com...
> Hi folks,
> I've been trying to write a PLY parser and have run into a bit of
> At the moment, I have a RESERVEDWORD token which matches all reserved
> words and then alters the token type to match the reserved word that
> was detected. I also have an IDENTIFIER token which matches
> identifiers that are not reserved words.
> The problem is, if I put RESERVEDWORD before IDENTIFIER, then
> identifiers that happen to begin with reserved words are wrongly lexed
> as the reserved word followed by an identifier. For example, because
> "if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
> RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
> as the IDENTIFIER "ifollowyou".
> If I put IDENTIFIER first, though, every single reserved word in the
> input is lexed as an IDENTIFIER.
> Is there any way I can tell PLY that it should only return a
> RESERVEDWORD in the correct circumstances? If PLY can't do this, can
> any of the other Python parser generators? (It seems that Lex can..)
Pyparsing uses the Keyword class for just this purpose. Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator,
which performs a longest-string or "greedy" match, as in:
any_ = Literal("any")
boolean_ = Literal("boolean")
char_ = Literal("char")
double_ = Literal("double")
identifier = Word( alphas, alphanums + "_" ).setName("identifier")
real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) )
                + Optional( CaselessLiteral("E") +
                            Word(nums+"+-",nums) ) )
integer = ( Combine( CaselessLiteral("0x") + Word(
                     nums+"abcdefABCDEF" ) ) |
            Word( nums+"+-", nums ) ).setName("int")
udTypeName = delimitedList( identifier, "::",
# have to use longest match for type, in case a user-defined
# type name starts with a keyword type, like "stringSeq" or
typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
             float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
             wchar_ ^ wstring_ ^ udTypeName )
This way, if a user-defined type was named "stringSequence" the longest
matching expression would be returned.
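
To make the difference between "take the first alternative" and "take the longest alternative" concrete, here is a minimal sketch using only the standard re module. It is not how pyparsing implements Or; the names ALTERNATIVES, match_first and match_longest are made up for illustration, and the list is cut down to one keyword type plus an identifier pattern.

```python
import re

# Hypothetical alternatives: one keyword type and a general identifier pattern.
ALTERNATIVES = [
    ("string", re.compile(r"string")),
    ("udTypeName", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
]

def match_first(text):
    """Return (name, matched_text) for the first alternative that matches."""
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m:
            return name, m.group()
    return None

def match_longest(text):
    """Try every alternative and keep the one that consumes the most text.

    On a tie the earlier alternative wins, so a bare keyword still
    comes back as the keyword rather than as an identifier.
    """
    best = None
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(match_first("stringSequence"))    # stops at the keyword prefix
print(match_longest("stringSequence"))  # consumes the whole name
```

The longest-match version has to try every alternative before it can answer, which is where the extra cost comes from.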
Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every path - it can just return the first matching expression. Now
with Keyword, I can define:
any_ = Keyword("any")
boolean_ = Keyword("boolean")
char_ = Keyword("char")
double_ = Keyword("double")
typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
             float_ | long_ | octet_ | short_ | string_ |
             wchar_ | wstring_ | udTypeName )
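
The idea behind a keyword match can be sketched with the standard re module: accept the word only if the next character could not continue an identifier. This is an illustration of the technique under my own assumptions, not pyparsing's actual code; keyword(), IDENT_CHARS and type_name() are hypothetical names.

```python
import re

# Characters that may appear inside an identifier (an assumption for this sketch).
IDENT_CHARS = "A-Za-z0-9_"

def keyword(word):
    """Pattern matching `word` only when no identifier character follows it."""
    return re.compile(re.escape(word) + r"(?![" + IDENT_CHARS + r"])")

identifier = re.compile(r"[A-Za-z][" + IDENT_CHARS + r"]*")

# Keyword alternatives come first; the identifier is the fallback.
ALTERNATIVES = [("string", keyword("string")), ("udTypeName", identifier)]

def type_name(text):
    """First-match alternation: return the first alternative that matches."""
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m:
            return name, m.group()
    return None

print(type_name("string x"))        # keyword followed by a space
print(type_name("stringSequence"))  # keyword prefix rejected, identifier wins
```

Because each keyword rejects itself when it is only a prefix, first-match alternation is enough and the other alternatives never need to be compared by length.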
Does PLY support greedy matching?
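
If it does not, the usual lex-style workaround is to have a single identifier rule and look the matched text up in a table of reserved words. As far as I know the PLY documentation describes a similar lookup inside the identifier rule, but the sketch below uses only the standard re module, not PLY, and RESERVED, IDENT_RE and tokenize() are names invented for the example.

```python
import re

# Hypothetical reserved-word table: matched text -> token type.
RESERVED = {"if": "IF", "else": "ELSE", "while": "WHILE"}

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenize(text):
    """Return (token_type, text) pairs for every word found in `text`.

    Each word is matched in full as an identifier first; only then is it
    reclassified if the whole word is reserved. Anything that is not a
    word (spaces, punctuation) is simply skipped in this sketch.
    """
    tokens = []
    for m in IDENT_RE.finditer(text):
        word = m.group()
        tokens.append((RESERVED.get(word, "IDENTIFIER"), word))
    return tokens

print(tokenize("if ifollowyou else x"))
```

Since the identifier pattern always consumes the whole word before the lookup, "ifollowyou" can never be split into "if" plus "ollowyou".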
(Download pyparsing at http://pyparsing.sourceforge.net .)