[issue23297] Clarify error when ‘tokenize.detect_encoding’ receives text
Berker Peksag <berker.peksag@gmail.com> added the comment:

The original problem has already been solved by making tokenize.generate_tokens() public in issue 12486. However, the same exception can be raised when tokenize.open() is used with tokenize.tokenize(), because tokenize.open() returns a text stream: https://github.com/python/cpython/blob/da63b321f63b697f75e7ab2f88f55d907f56c...

hello.py
--------

def say_hello():
    print("Hello, World!")

say_hello()

text.py
-------

import tokenize

with tokenize.open('hello.py') as f:
    token_gen = tokenize.tokenize(f.readline)
    for token in token_gen:
        print(token)

When we pass f.readline to tokenize.tokenize(), the second call to detect_encoding() fails because f.readline() returns str. In Lib/test/test_tokenize.py, tokenize.open() only seems to be tested for opening a file; its output is never passed to tokenize.tokenize(). Most of the tests pass either the readline() method of open(..., 'rb') or an io.BytesIO() object to tokenize.tokenize().

I will submit a documentation PR that suggests using tokenize.generate_tokens() with tokenize.open().

----------
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 3.7, Python 3.8 -Python 3.5, Python 3.6

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue23297>
_______________________________________
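[For reference, a minimal sketch of the pairing that comment suggests, reusing the hello.py example above; this is an illustration, not the wording of the eventual documentation PR:]

import tokenize

# tokenize.open() detects the source encoding and returns a *text* stream,
# so its readline yields str; generate_tokens() expects exactly that.
with tokenize.open('hello.py') as f:
    for token in tokenize.generate_tokens(f.readline):
        print(token)

[Alternatively, as the existing tests do, one can open the file with open('hello.py', 'rb') and pass its readline to tokenize.tokenize(), which expects bytes.]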
Ben Finney <ben+python@benfinney.id.au> added the comment:

On 28-Apr-2019, Berker Peksag wrote:
> The original problem has already been solved by making tokenize.generate_tokens() public in issue 12486.
I don't understand how that would affect the resolution of this issue. Isn't the correct resolution here going to entail a correct implementation in ‘file.readline’?

----------
_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue23297>
_______________________________________
participants (2)
- Ben Finney
- Berker Peksag