[New-bugs-announce] [issue42974] tokenize reports incorrect end col offset and line string when input ends without explicit newline
Brian Romanowski
report at bugs.python.org
Tue Jan 19 22:13:57 EST 2021
New submission from Brian Romanowski <romanows at gmail.com>:
The tokenize module's tokenizer functions output incorrect (or at least misleading) information when the content being tokenized does not end in a line-ending character. This is related to the fix for issue33899, which added the NEWLINE token for this case but did not fill out the whole token tuple correctly.
The bug can be seen by running a version of the test in Lib/test/test_tokenize.py:
import io, tokenize
newline_token_1 = list(tokenize.tokenize(io.BytesIO("x\n".encode('utf-8')).readline))[-2]
newline_token_2 = list(tokenize.tokenize(io.BytesIO("x".encode('utf-8')).readline))[-2]
print(newline_token_1)
print(newline_token_2)
# Prints:
# TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 1), end=(1, 2), line='x\n')
# TokenInfo(type=4 (NEWLINE), string='', start=(1, 1), end=(1, 2), line='') # bad "end" and "line"!
Notice that "len(newline_token_2.string) == 0", but "newline_token_2.end[1] - newline_token_2.start[1] == 1". It would be more consistent for newline_token_2.end to be (1, 1), matching the zero-length string.
Also, newline_token_2.line should hold the physical line rather than the empty string. This would make it consistent with newline_token_1.line.
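The two expectations above can be written down as a small check. This is a minimal sketch: `newline_token_problems` is a hypothetical helper, not part of the tokenize module, and it assumes the invariants stated in this report, namely that a NEWLINE token's column span equals len(string) and that its `line` is the physical source line it sits on. Given the output printed above, it would flag both the end column and the line for the "x" input, and nothing for "x\n".

```python
import io
import tokenize
from typing import List
from tokenize import TokenInfo


def newline_token_problems(tok: TokenInfo, source: str) -> List[str]:
    """Return the inconsistencies found in a NEWLINE token (empty list if none).

    Hypothetical helper, not part of the tokenize module.
    """
    problems = []
    if tok.type != tokenize.NEWLINE:
        return problems  # only NEWLINE tokens are checked
    # On a single row, the column span should equal the length of the token string.
    if tok.start[0] == tok.end[0] and tok.end[1] - tok.start[1] != len(tok.string):
        problems.append("end column does not match len(string)")
    # Split on '\n' only, as readline() on the report's BytesIO input does.
    lines = io.StringIO(source).readlines()
    row = tok.start[0]
    physical = lines[row - 1] if 1 <= row <= len(lines) else ""
    if tok.line != physical:
        problems.append("line is %r, expected %r" % (tok.line, physical))
    return problems


# Usage: run the check on the report's two inputs (output depends on the Python version).
for src in ("x\n", "x"):
    toks = list(tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline))
    print(repr(src), newline_token_problems(toks[-2], src))
```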
I'll add a PR shortly with a change so the output from the two cases is:
TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 1), end=(1, 2), line='x\n')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 1), end=(1, 1), line='x')
If this looks reasonable, I can backport it for the other branches. Thanks!
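Until a change like this is available on a given branch, a caller could post-process the token stream instead. The following is a minimal sketch of such a workaround, not the PR mentioned above; `normalize_implicit_newline` is a hypothetical name. It assumes the implicitly added NEWLINE is the only NEWLINE token with an empty string, and it borrows the `line` of the most recent token that had one as the physical line.

```python
import io
import tokenize
from typing import Iterable, Iterator
from tokenize import TokenInfo


def normalize_implicit_newline(tokens: Iterable[TokenInfo]) -> Iterator[TokenInfo]:
    """Yield tokens unchanged, except that an implicit NEWLINE (empty string)
    is rewritten to be zero-width and to carry the last physical line seen.

    Hypothetical caller-side workaround, not part of the tokenize module.
    """
    last_line = ""
    for tok in tokens:
        if tok.type == tokenize.NEWLINE and tok.string == "":
            # Zero-width token: end == start. Keep a non-empty `line`, else reuse the previous one.
            tok = tok._replace(end=tok.start, line=tok.line or last_line)
        if tok.line:
            last_line = tok.line
        yield tok


# Usage with the report's newline-less input.
src = "x"
raw = tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline)
print(list(normalize_implicit_newline(raw))[-2])
```

Because TokenInfo is a namedtuple, _replace() returns a corrected copy without touching the tokenizer itself; if a token already has end == start and a non-empty line, the rewrite leaves it equal to the original.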
----------
components: Library (Lib)
messages: 385313
nosy: romanows
priority: normal
severity: normal
status: open
title: tokenize reports incorrect end col offset and line string when input ends without explicit newline
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42974>
_______________________________________