[issue12691] tokenize.untokenize is broken

Gareth Rees report at bugs.python.org
Thu Feb 6 16:50:03 CET 2014


Gareth Rees added the comment:

I did some research on the cause of this issue. The assertion was
added in this change by Jeremy Hylton in August 2006:
<https://mail.python.org/pipermail/python-checkins/2006-August/055812.html>
(The corresponding Mercurial commit is here:
<http://hg.python.org/cpython/rev/cc992d75d5b3#l217.25>).

At that point I believe the assertion was reasonable. I think it would
have been triggered by backslash-continued lines, but otherwise
untokenize() in full mode worked.
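
For reference, the check lives in Untokenizer.add_whitespace(). From
memory it looked roughly like this (a paraphrase, not the exact code
from that commit):

    def add_whitespace(self, start):
        row, col = start
        # prev_row/prev_col record where the previously written token
        # ended; a token is not expected to start on a later physical
        # row than that.
        assert row <= self.prev_row
        col_offset = col - self.prev_col
        if col_offset:
            self.tokens.append(" " * col_offset)

With a backslash-continued line the next token starts on a new
physical row but no NEWLINE or NL token intervenes, so prev_row still
refers to the earlier row and the assertion fails; for ordinary code
the rows stay in step.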

But in March 2008, in this change
<http://hg.python.org/cpython/rev/51e24512e305>, Trent Nelson applied
this patch by Michael Foord
<http://bugs.python.org/file9741/tokenize_patch.diff> to implement PEP
263 and fix issue719888. The patch added an ENCODING token to the
output of tokenize.tokenize(). The ENCODING token is always generated
at row 0, while the first real token is generated at row 1, so every
token stream from tokenize.tokenize() now sets off the assertion.
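
Here is a minimal reproduction, as I would expect it to fail on an
affected version (e.g. 3.2 or 3.3):

    import io
    import tokenize

    source = b"1 + 2\n"
    tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
    print(tokens[0][2], tokens[1][2])  # (0, 0) (1, 0): ENCODING is at row 0
    tokenize.untokenize(tokens)        # AssertionError in add_whitespace

The ENCODING token is consumed first, which sets prev_row to 0, so the
very next token (at row 1) violates row <= prev_row.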

The lack of a test case for tokenize.untokenize() in "full" mode meant
that it was (and is) all too easy for someone to accidentally break it
like this.
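
For example, a regression test along these lines (just a sketch; the
tokens_of() helper is hypothetical) would have caught it, since it
exercises the full 5-tuple path end to end:

    import io
    import tokenize

    def tokens_of(source_bytes):
        # Tokenize a bytes object in "full" (5-tuple) mode.
        return list(tokenize.tokenize(io.BytesIO(source_bytes).readline))

    source = b"x = 1 + \\\n    2\n"
    result = tokenize.untokenize(tokens_of(source))
    # The documented guarantee is that the output tokenizes back to the
    # same token types and strings (exact spacing may differ).
    assert ([(t.type, t.string) for t in tokens_of(result)]
            == [(t.type, t.string) for t in tokens_of(source)])

On an affected version the untokenize() call itself raises the
AssertionError; once fixed, the final assertion should hold.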

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12691>
_______________________________________

