Tokenizer inconsistency wrt new lines in comments

Kay Schluehr kay.schluehr at gmx.net
Fri Apr 4 21:18:48 CEST 2008


On 4 Apr., 18:22, George Sakkis <george.sak... at gmail.com> wrote:
> The tokenize.generate_tokens function seems to handle in a context-
> sensitive manner the new line after a comment:
>
> >>> from StringIO import StringIO
> >>> from tokenize import generate_tokens
>
> >>> text = '''
> ... # hello world
> ... x = (
> ... # hello world
> ... )
> ... '''
>
> >>> for t in generate_tokens(StringIO(text).readline):
> ...     print repr(t[1])
> ...
> '\n'
> '# hello world\n'
> 'x'
> '='
> '('
> '\n'
> '# hello world'
> '\n'
> ')'
> '\n'
> ''
>
> Is there a reason that the newline is included in the first comment
> but not in the second, or is it a bug ?
>
> George

I guess it's just an artifact of how line continuations are handled
within expressions, where a different rule is applied. For compilation
purposes, both the newlines within expressions and the comments are
irrelevant. There are even two different token types, NEWLINE and NL,
which are produced for newlines. NL and COMMENT are ignored by the
parser; only NEWLINE is relevant to it.

If it were a bug, it would have to violate a functional requirement. I
can't see which one.

Kay


