[ python-Bugs-1224621 ] tokenize module does not detect inconsistent dedents

Mon Aug 14 23:34:30 CEST 2006

Bugs item #1224621, was opened at 2005-06-21 06:10
Message generated for change (Comment added) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1224621&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 7
Submitted By: Danny Yoo (dyoo)
Assigned to: Raymond Hettinger (rhettinger)
Summary: tokenize module does not detect inconsistent dedents

Initial Comment:
The attached code snippet 'testcase.py' should produce an 
IndentationError, but does not.  The code in tokenize.py is too 
trusting, and needs to add a check against bad indentation as it 
yields DEDENT tokens.

I'm including a diff to tokenize.py that should at least raise an 
exception on bad indentation like this.
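The check described above can be sketched with the indentation stack a tokenizer already keeps: every INDENT pushes a column, and a dedent must pop back to a column that is already on the stack. This is illustrative only. It is not the attached diff, the helpers `dedent_count` and `check_indentation` are hypothetical, and it assumes space-only indentation with no comment or continuation lines.

```python
# Hypothetical sketch, not the attached diff: validate dedents against the
# stack of open indentation columns that a tokenizer already maintains.
# Assumes space-only indentation and no comment or continuation lines.

def dedent_count(indents, column):
    """Pop `indents` (a list of columns starting with 0) down to `column`.

    Returns how many DEDENT tokens should be emitted, or raises
    IndentationError if `column` matches no enclosing indentation level.
    """
    count = 0
    while column < indents[-1]:
        if column not in indents:
            raise IndentationError(
                "unindent does not match any outer indentation level")
        indents.pop()
        count += 1
    return count


def check_indentation(lines):
    """Return the total number of DEDENTs for `lines`, raising on a bad dedent."""
    indents = [0]
    total = 0
    for line in lines:
        body = line.lstrip(" ")
        if not body.strip():
            continue  # blank lines never open or close a block
        column = len(line) - len(body)
        if column > indents[-1]:
            indents.append(column)  # a real tokenizer would emit INDENT here
        else:
            total += dedent_count(indents, column)
    # Any levels still open are closed by DEDENTs at end of input.
    return total + len(indents) - 1


print(check_indentation(["def foo():", "    bar", "baz"]))  # prints 1
```

Because `column not in indents` is tested before anything is popped, the dedent of `baz` to column 2 in the test case is rejected, while a dedent back to column 0 is accepted.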

Just in case, I'm including testcase.py here too:
------
import tokenize
from StringIO import StringIO
sampleBadText = """
def foo():
    bar
  baz
"""
print list(tokenize.generate_tokens(
    StringIO(sampleBadText).readline))
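For readers on Python 3, the same test case can be rerun as below. This is a port for illustration only (`io.StringIO` replaces the Python 2 `StringIO` module, and `tokenize_text` is a hypothetical helper); on an interpreter that includes the fix, `generate_tokens` should raise IndentationError instead of returning a token list.

```python
# Python 3 port of testcase.py, for illustration: io.StringIO replaces the
# Python 2 StringIO module and print is a function.
import io
import tokenize

sampleBadText = """
def foo():
    bar
  baz
"""


def tokenize_text(text):
    """Hypothetical helper: return the full token list for `text`."""
    return list(tokenize.generate_tokens(io.StringIO(text).readline))


try:
    print(tokenize_text(sampleBadText))
except IndentationError as err:
    # With the fix in place, the bad dedent of 'baz' lands here.
    print("IndentationError:", err.msg)
```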

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2006-08-14 21:34

Message:
Logged In: YES 
user_id=849994

tabnanny's been taken care of in r51284.
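One way to confirm the tabnanny side on a current interpreter is to run `tabnanny.check()` on a file containing the bad dedent and capture what it writes to stderr; with the fix it should report the problem instead of dying with an uncaught IndentationError. The helper `tabnanny_report` is hypothetical, and the exact wording of the report may differ between Python versions.

```python
import contextlib
import io
import os
import tabnanny
import tempfile

BAD_SOURCE = "def foo():\n    bar\n  baz\n"


def tabnanny_report(source):
    """Run tabnanny.check() on `source` and return what it wrote to stderr."""
    # tabnanny.check() takes a path, so write the source to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(source)
        buf = io.StringIO()
        with contextlib.redirect_stderr(buf):
            tabnanny.check(path)
        return buf.getvalue()
    finally:
        os.remove(path)


print(tabnanny_report(BAD_SOURCE))
```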

----------------------------------------------------------------------

Comment By: Kurt B. Kaiser (kbk)
Date: 2006-08-10 01:40

Message:
Logged In: YES 
user_id=149084

tokenize revision 39046 (21 Jun 2005) breaks tabnanny.

tabnanny doesn't handle the IndentationError that tokenize
now raises when it detects an inconsistent dedent.

I patched up ScriptBinding.py in IDLE.  The
IndentationError should probably carry the same parameters as
TokenError, and tabnanny should catch it.
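The mismatch shows up in how the two exceptions carry their position: TokenError is raised with `args == (msg, (row, col))`, while IndentationError is a SyntaxError subclass exposing `msg`, `lineno` and `offset` attributes. A caller such as tabnanny or IDLE therefore has to catch both. A minimal sketch, assuming a hypothetical helper `first_tokenize_error` that maps both exceptions to one tuple shape:

```python
import io
import tokenize


def first_tokenize_error(text):
    """Tokenize `text`; return None if clean, else (kind, msg, row, col).

    TokenError carries its details as args == (msg, (row, col)), while
    IndentationError (a SyntaxError subclass) exposes .msg, .lineno and
    .offset.  This hypothetical helper normalizes both to one shape.
    """
    try:
        for _ in tokenize.generate_tokens(io.StringIO(text).readline):
            pass
    except tokenize.TokenError as err:
        msg, (row, col) = err.args
        return ("TokenError", msg, row, col)
    except IndentationError as err:
        return ("IndentationError", err.msg, err.lineno, err.offset)
    return None


print(first_tokenize_error("def foo():\n    bar\n  baz\n"))
print(first_tokenize_error("x = (1,\n"))
```

Catching IndentationError explicitly matters because it is not a TokenError subclass, so a handler written only for TokenError lets it escape.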

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2005-09-02 12:40

Message:
Logged In: YES 
user_id=4771

Here is a proposed patch.  It relaxes the dedent policy a
bit.  It assumes that the first line may already have some
initial indentation, as is the case when tokenizing from the
middle of a file (as inspect.getsource() does).
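The fragment problem can be reproduced in a few lines: when tokenizing starts in the middle of a file, the tokenizer never saw the outer blocks open, so a valid dedent to one of them looks inconsistent. The fragment and the helper `names_before_error` below are hypothetical; they illustrate the failure mode of a strict dedent check on a fragment, not the relaxed policy in the proposed patch.

```python
import io
import tokenize

# Hypothetical fragment cut from the middle of a file: its first line sits at
# column 8, and the next line dedents to column 4, a level whose opening line
# is not part of the fragment.
FRAGMENT = "        x = 1\n    y = 2\n"


def names_before_error(text):
    """Collect NAME tokens until tokenize stops; also report whether it failed."""
    names = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
            if tok.type == tokenize.NAME:
                names.append(tok.string)
    except IndentationError:
        return names, True
    return names, False


print(names_before_error(FRAGMENT))
```

Since `generate_tokens` yields lazily, a caller that only needs the leading block can keep the tokens collected before the error and stop there.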

It should also be back-ported to 2.4, given that the
previous patch was.  For 2.4, only the non-test part of the
patch applies cleanly; I suggest ignoring the test part and
applying the rest, given that there are many more tests in 2.5
for inspect.getsource() anyway.

Since inspect.getsource() is a muddy area anyway, I
will go ahead and check this patch in unless someone spots a
problem.  For now, the previously applied patch makes parts
of PyPy break with an uncaught IndentationError.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2005-09-02 12:10

Message:
Logged In: YES 
user_id=4771

Reopening this bug report: the applied fix may solve the
problem at hand, but it breaks inspect.getsource() in cases
where it used to work.  See attached example.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-06-21 07:54

Message:
Logged In: YES 
user_id=80475

Fixed.
See Lib/tokenize.py revisions 1.38 and 1.36.4.1.

----------------------------------------------------------------------
