lexical analysis of python

Paul McGuire ptmcg at austin.rr.com
Wed Mar 11 01:36:50 EDT 2009


On Mar 10, 8:53 pm, robert.mull... at gmail.com wrote:
> I understand the method, but when you say you "count one DEDENT for
> each level", well, let's say you counted 3 of them. Do you have a way
> to interject 3 consecutive DEDENT tokens into the token stream so that
> the parser receives them before it receives the next real token?

Pyparsing makes *heavy* use of the program stack for keeping the
current parsing state in local variables.  By the time I am 3 levels
deep in indentation, I am also nested a corresponding depth in the
program stack.  The indent stack is kept separately, as a global var.
Each INDENT causes a recursive nested call.  When I DEDENT, I unwind
only one level and return from the corresponding INDENT call.  At that
point I can push a DEDENT token on the return stack (it so happens I
*don't* do this by default, I just close off the current statement
group - but a user could define a parse action/callback to push DEDENT
tokens).  Then the next INDENT handler up the call stack checks the
indent stack, sees that it too has dedented, and pushes another DEDENT
token, and so on.  So the key, I guess, is that there is no iterative
popping of indent levels from the indent stack: each recursive
INDENT/DEDENT handler pushes and pops its own level, pushing INDENT
and DEDENT values as appropriate.
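
To make the unwinding concrete, here is a minimal sketch of that
recursion in plain Python.  It is *not* pyparsing's actual code: the
names parse_block, parse_indented and indent_of are made up for
illustration, it assumes space-only indentation, and it does no error
checking on inconsistent dedents.  Each call pushes its own level,
returns on the first dedented line, and leaves that line for its caller
to re-examine.

```python
def indent_of(line):
    """Number of leading spaces on a line (tabs are not handled)."""
    return len(line) - len(line.lstrip(" "))


def parse_block(lines, pos, indent_stack):
    """Handle one INDENT: push its level, collect statements, pop on DEDENT.

    Returns (block, new_pos).  indent_stack is shared by every recursive
    call, playing the role of the separately kept indent stack.
    """
    indent_stack.append(indent_of(lines[pos]))
    block = []
    while pos < len(lines):
        indent = indent_of(lines[pos])
        if indent < indent_stack[-1]:
            # DEDENT: unwind this one level only; the caller repeats the
            # same check against its own level when we return.
            break
        if indent > indent_stack[-1]:
            # INDENT: a nested call builds the sub-block.
            sub_block, pos = parse_block(lines, pos, indent_stack)
            block.append(sub_block)
        else:
            block.append(lines[pos].strip())
            pos += 1
    indent_stack.pop()  # each call pops only the level it pushed
    return block, pos


def parse_indented(source):
    """Parse indented text into a nested list of statements."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return []
    block, _ = parse_block(lines, 0, [])
    return block


example = "if a:\n    b\n    if c:\n        d\n        if e:\n            f\ng\n"
print(parse_indented(example))
# ['if a:', ['b', 'if c:', ['d', 'if e:', ['f']]], 'g']
```

When the line "g" is reached, three nested calls return one after
another, each noticing the dedent for itself.  No loop ever pops
several levels at once.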

Just FYI, as I said, pyparsing *doesn't* push explicit INDENT/DEDENT
tokens.  Instead it returns a nested list of lists representing the
corresponding structure of program statements.  Here is a similar
treatment for an expression of nested parentheses:

import pyparsing
print(pyparsing.nestedExpr().parseString("(a b (c d e)(f g)h (i(j)))"))

Prints:
[['a', 'b', ['c', 'd', 'e'], ['f', 'g'], 'h', ['i', ['j']]]]

(Note - this string representation looks like a normal Python list,
but parseString returns a ParseResults object, a rich results
structure which supports list, dict, and object attribute access
methods.)
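
For anyone who wants to see how recursion alone produces that nested
shape, here is a hand-rolled sketch using only the standard library.
It is an illustration of the idea, not how nestedExpr is implemented;
nested_parens is a made-up name, it returns plain lists rather than a
ParseResults object, and it does not check for unbalanced parentheses.

```python
import re


def nested_parens(text):
    """Return nested lists mirroring the parenthesized groups in text."""
    # Tokens are either a single parenthesis or a run of other
    # non-whitespace characters.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def parse_group():
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                # Opening paren: a nested call builds the sub-list.
                items.append(parse_group())
            elif tok == ")":
                # Closing paren: return just this one level.
                return items
            else:
                items.append(tok)
        return items

    return parse_group()


print(nested_parens("(a b (c d e)(f g)h (i(j)))"))
# [['a', 'b', ['c', 'd', 'e'], ['f', 'g'], 'h', ['i', ['j']]]]
```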

Pyparsing does a sort of "mixed mode" of simultaneous lexing and
parsing, which deviates from the traditional lex/yacc-like separation
of duties.
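
For contrast, here is a rough sketch of the traditional arrangement,
where a separate lexer really does put several consecutive DEDENT
tokens into the stream before the next real token.  The generator
indent_tokens and its (kind, value) tuples are assumptions made for
illustration, not part of pyparsing or of any lex/yacc tool, and it
assumes space-only indentation.

```python
def indent_tokens(lines):
    """Yield ('INDENT' | 'DEDENT' | 'LINE', value) tuples for the lines."""
    indent_stack = [0]
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no indentation information
        indent = len(line) - len(line.lstrip(" "))
        if indent > indent_stack[-1]:
            indent_stack.append(indent)
            yield ("INDENT", indent)
        # Here the lexer pops iteratively: one DEDENT per closed level,
        # all yielded before the token for the line itself.
        while indent < indent_stack[-1]:
            yield ("DEDENT", indent_stack.pop())
        if indent != indent_stack[-1]:
            raise IndentationError("dedent does not match any outer level")
        yield ("LINE", line.strip())
    # Close any levels still open at end of input.
    while len(indent_stack) > 1:
        yield ("DEDENT", indent_stack.pop())


source = ["if a:", "    if b:", "        if c:", "            d", "e"]
for kind, value in indent_tokens(source):
    print(kind, value)
```

Because it is a generator, the parser just pulls tokens one at a time
and sees three DEDENTs in a row before the LINE token for "e".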

-- Paul



More information about the Python-list mailing list