[issue2180] tokenize: mishandles line joining

Meador Inge report at bugs.python.org
Thu Sep 8 03:39:11 CEST 2011


Meador Inge <meadori at gmail.com> added the comment:

That syntax error is coming from the CPython parser and *not* the tokenizer.  Both CPython's tokenizer and the 'tokenize' module produce the same tokenization:

[meadori at motherbrain cpython]$ cat repro.py
if 1:
  \

  pass
[meadori at motherbrain cpython]$ ./python tokenize.py repro.py 
0,0-0,0:        ENCODING        'utf-8'
1,0-1,2:        NAME            'if'
1,3-1,4:        NUMBER          '1'
1,4-1,5:        OP              ':'
1,5-1,6:        NEWLINE         '\n'
2,0-2,2:        INDENT          '  '
3,0-3,1:        NEWLINE         '\n'
4,2-4,6:        NAME            'pass'
4,6-4,7:        NEWLINE         '\n'
5,0-5,0:        DEDENT          ''
5,0-5,0:        ENDMARKER       ''
[44319 refs]
[meadori at motherbrain cpython]$ ./python -d repro.py | grep Token | tail -10
  File "repro.py", line 3
    
    ^
SyntaxError: invalid syntax
[44305 refs]
Token NEWLINE/'' ... It's a token we know
Token DEDENT/'' ... It's a token we know
Token NEWLINE/'' ... It's a token we know
Token ENDMARKER/'' ... It's a token we know
Token NAME/'if' ... It's a keyword
Token NUMBER/'1' ... It's a token we know
Token COLON/':' ... It's a token we know
Token NEWLINE/'' ... It's a token we know
Token INDENT/'' ... It's a token we know
Token NEWLINE/'' ... It's a token we know

The NEWLINE INDENT NEWLINE tokenization causes the parser to choke because 'suite' nonterminals [1]:

suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

are defined as NEWLINE INDENT stmt+ DEDENT, so a statement, not another NEWLINE, must follow the INDENT.
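For anyone without a debug build handy, the parser side can be checked with the built-in compile().  This is only a rough sketch: the helper name compiles() and the "<repro>" filename are made up for illustration, and whether the backslash variant is accepted depends on the interpreter version, so no result is claimed here.

```python
def compiles(source):
    """Return True if compile() accepts the source, False on SyntaxError.

    Hypothetical helper for illustration; not part of CPython.
    """
    try:
        compile(source, "<repro>", "exec")
    except SyntaxError:
        return False
    return True


# The repro from this report: a backslash continuation followed by a blank line.
with_backslash = "if 1:\n  \\\n\n  pass\n"
# The same block with a whitespace-only line in place of the continuation.
without_backslash = "if 1:\n  \n  pass\n"

for label, source in [("with backslash", with_backslash),
                      ("without backslash", without_backslash)]:
    print(label, "->", "compiles" if compiles(source) else "SyntaxError")
```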

It seems appropriate that the NEWLINE after INDENT should be dropped by both tokenizers.  In other words, I think:
"""
if 1:
  \

  pass
"""

should produce the same tokenization as:

"""
if 1:
  
  pass
"""

This seems consistent with how explicit line joining is defined [2].
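The comparison proposed above can be tried against the 'tokenize' module directly.  A minimal sketch, assuming a Python 3 interpreter; token_names() is a hypothetical helper, and the streams it prints may differ between versions (newer releases may report an error for the backslash variant instead of yielding tokens), so treat the output as something to inspect rather than a fixed result.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that tokenize yields for the source.

    Hypothetical helper for illustration. If tokenization fails part way
    through, the names collected so far are kept and an "ERROR: ..." entry
    is appended instead of raising.
    """
    readline = io.BytesIO(source.encode("utf-8")).readline
    names = []
    try:
        for tok in tokenize.tokenize(readline):
            names.append(tokenize.tok_name[tok.type])
    except (tokenize.TokenError, SyntaxError) as exc:
        names.append("ERROR: %s" % exc)
    return names


# The two sources compared in this report.
with_backslash = "if 1:\n  \\\n\n  pass\n"
without_backslash = "if 1:\n  \n  pass\n"

a = token_names(with_backslash)
b = token_names(without_backslash)
print("with backslash:   ", a)
print("without backslash:", b)
print("same token types: ", a == b)
```

Only the token type names are compared; positions necessarily differ because the two sources have a different number of physical lines.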


[1] http://hg.python.org/cpython/file/92842e347d98/Grammar/Grammar
[2] http://docs.python.org/reference/lexical_analysis.html#explicit-line-joining

----------
stage: test needed -> needs patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2180>
_______________________________________

