[docs] [issue35297] untokenize documentation is not correct

Thu Nov 22 16:25:02 EST 2018

New submission from Zsolt Cserna <cserna.zsolt at gmail.com>:

The untokenize documentation (https://docs.python.org/3/library/tokenize.html#tokenize.untokenize) states the following:

"""
Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.
"""

This last sentence is not accurate, as the implementation shows:
https://github.com/python/cpython/blob/master/Lib/tokenize.py#L242

The code checks the length of each input token there, and its whitespace handling differs depending on whether the iterable yields 2-tuples or longer sequences. When the tuples contain more elements, the function uses the start and end positions (the third and fourth elements) to reproduce the spacing of the original source.
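A minimal sketch of the point above, assuming CPython's tokenize module: editing only the start/end positions of full TokenInfo tuples changes the spacing of the output, so those extra elements are clearly not ignored, and a 3-element tuple is rejected outright. The sample source `x=1` and the three-column shift are arbitrary choices made for illustration; they are not from the report.

```python
import io
import tokenize

# Arbitrary sample input (an assumption for illustration, not from the report).
source = b"x=1\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# Full TokenInfo 5-tuples: the start/end positions reproduce the original spacing.
print(tokenize.untokenize(tokens))  # b'x=1\n'

# Shift the NUMBER token, and everything after it on row 1, three columns right.
cut = next(t for t in tokens if t.type == tokenize.NUMBER).start[1]
moved = [
    t._replace(start=(t.start[0], t.start[1] + 3), end=(t.end[0], t.end[1] + 3))
    if t.start[0] == 1 and t.start[1] >= cut
    else t
    for t in tokens
]
# Only the positions changed, yet the spacing of the output changes with them.
print(tokenize.untokenize(moved))  # b'x=   1\n'

# Sequences longer than 2 but shorter than 5 are not "ignored": they fail to unpack.
try:
    tokenize.untokenize([(t.type, t.string, t.start) for t in tokens])
except ValueError as exc:
    print("3-tuples rejected:", exc)
```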

So this code:
tokenize.untokenize(tokenize.tokenize(source.readline))

And this:
tokenize.untokenize([x[:2] for x in tokenize.tokenize(source.readline)])

These two calls produce different results.
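The two calls can be reproduced with a self-contained sketch. The report does not show what `source` contained, so the input `x=1` is a hypothetical stand-in, and the helper names below are made up for this sketch. Each call gets a fresh io.BytesIO stream because a readline can only be consumed once. In the 2-tuple (compatibility) path, CPython appends a space after NAME and NUMBER tokens, which is where the difference comes from; both outputs still tokenize back to the same (type, string) sequence.

```python
import io
import tokenize

# Hypothetical sample input; the report does not say what `source` contained.
SOURCE = b"x=1\n"

def full_roundtrip(src: bytes) -> bytes:
    """First call from the report: pass the full 5-element TokenInfo tuples."""
    return tokenize.untokenize(tokenize.tokenize(io.BytesIO(src).readline))

def pairs_roundtrip(src: bytes) -> bytes:
    """Second call from the report: pass only (type, string) pairs."""
    return tokenize.untokenize(
        [t[:2] for t in tokenize.tokenize(io.BytesIO(src).readline)]
    )

def token_pairs(src: bytes):
    """Token types and strings, ignoring positions."""
    return [(t.type, t.string) for t in tokenize.tokenize(io.BytesIO(src).readline)]

full = full_roundtrip(SOURCE)
pairs = pairs_roundtrip(SOURCE)
print(full)   # b'x=1\n'
print(pairs)  # b'x =1 \n' (NAME and NUMBER tokens get a trailing space)

# The spacing differs, but the token types and strings are preserved either way.
assert token_pairs(full) == token_pairs(pairs) == token_pairs(SOURCE)
```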

I don't know whether this is a documentation issue or a bug in the module itself, so I am filing this bug report to ask for guidance.

----------
assignee: docs at python
components: Documentation
messages: 330281
nosy: csernazs, docs at python
priority: normal
severity: normal
status: open
title: untokenize documentation is not correct
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35297>
_______________________________________