tokenize.untokenize adding line continuation characters
Rotwang
sg552 at hotmail.co.uk
Mon Jan 16 17:42:43 EST 2017
Here's something odd I've found with the tokenize module: tokenizing 'if x:\n    y' and then untokenizing the result appends '\\\n' to the end. Tokenizing that result again fails, because the backslash continuation is followed by nothing but a newline. If the original string ends with a newline, on the other hand, the round trip works fine. Can anyone explain why this happens?
I'm using Python 3.4.3 on Windows 8. Copy-pasted from IPython:
import tokenize, io
tuple(tokenize.tokenize(io.BytesIO('if x:\n    y'.encode()).readline))
Out[2]:
(TokenInfo(type=56 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
TokenInfo(type=1 (NAME), string='if', start=(1, 0), end=(1, 2), line='if x:\n'),
TokenInfo(type=1 (NAME), string='x', start=(1, 3), end=(1, 4), line='if x:\n'),
TokenInfo(type=52 (OP), string=':', start=(1, 4), end=(1, 5), line='if x:\n'),
TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 5), end=(1, 6), line='if x:\n'),
TokenInfo(type=5 (INDENT), string='    ', start=(2, 0), end=(2, 4), line='    y'),
TokenInfo(type=1 (NAME), string='y', start=(2, 4), end=(2, 5), line='    y'),
TokenInfo(type=6 (DEDENT), string='', start=(3, 0), end=(3, 0), line=''),
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line=''))
tokenize.untokenize(_).decode()
Out[3]: 'if x:\n    y\\\n'
tuple(tokenize.tokenize(io.BytesIO(_.encode()).readline))
---------------------------------------------------------------------------
TokenError Traceback (most recent call last)
<ipython-input-4-6bd8f83c1114> in <module>()
----> 1 tuple(tokenize.tokenize(io.BytesIO(_.encode()).readline))
C:\Program Files\Python34\lib\tokenize.py in _tokenize(readline, encoding)
558 else: # continued statement
559 if not line:
--> 560 raise TokenError("EOF in multi-line statement", (lnum, 0))
561 continued = 0
562
TokenError: ('EOF in multi-line statement', (3, 0))
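For anyone who hits the same thing, here is a minimal sketch of a workaround, assuming the missing trailing newline really is the deciding factor, as the working case described above suggests. The helpers roundtrip and token_pairs are hypothetical names made up for this example; they are not part of the tokenize module.

```python
import io
import tokenize


def roundtrip(source):
    """Tokenize source, untokenize the tokens, and return the resulting text.

    Hypothetical helper: it appends a trailing newline when one is missing,
    because input that ends with a newline is the case reported as working.
    """
    if not source.endswith("\n"):
        source += "\n"
    readline = io.BytesIO(source.encode("utf-8")).readline
    tokens = list(tokenize.tokenize(readline))
    # untokenize returns bytes when the first token is ENCODING.
    return tokenize.untokenize(tokens).decode("utf-8")


def token_pairs(source):
    """Return (type, string) pairs for source, skipping the ENCODING token."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [
        (tok.type, tok.string)
        for tok in tokenize.tokenize(readline)
        if tok.type != tokenize.ENCODING
    ]


result = roundtrip("if x:\n    y")
# Tokenizing the round-tripped text a second time should not raise TokenError.
pairs = token_pairs(result)
```

Normalizing the input before tokenizing sidesteps the problem rather than explaining it, so this does not answer why untokenize emits the continuation in the first place.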