[Python-Dev] Lexical analysis and NEWLINE tokens

Thu Oct 6 14:36:51 CEST 2005

I posted this question to python-help, but I think I have a better chance
of getting the answer here.

I'm looking for clarification on when NEWLINE tokens are generated during
lexical analysis of Python source code.  In particular, I'm confused about
some of the top-level components in Python's grammar (file_input,
interactive_input, and eval_input).

Section 2.1.7 of the reference manual states that blank lines (lines
consisting only of whitespace and possibly a comment) do not generate
NEWLINE tokens.  This is supported by the definition of a suite, which
does not allow for standalone or consecutive NEWLINE tokens.

    suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT

Yet the grammar for top-level components seems to suggest that a parsable
input may consist entirely of a single NEWLINE token, or include
consecutive NEWLINE tokens.

    file_input ::= (NEWLINE | statement)*
    interactive_input ::= [stmt_list] NEWLINE | compound_stmt NEWLINE
    eval_input ::= expression_list NEWLINE*

To me this seems to contradict section 2.1.7 in so far as I don't see how
it's possible to generate such a sequence of tokens.

What kind of input would generate NEWLINE tokens in the top-level
components of the grammar?

Matthew Barnes
matthew at barnes.net