[Python-Dev] Small tweak to tokenize.py?

Phillip J. Eby pje at telecommunity.com
Thu Nov 30 19:22:57 CET 2006


At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
>I've got a small tweak to tokenize.py that I'd like to run by folks here.
>
>I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
>and my approach is to build a full parse tree with annotations that
>show where the whitespace and comments go. I use the tokenize module
>to scan the input. This is nearly perfect (I can render code from the
>parse tree and it will be an exact match of the input) except for
>continuation lines -- while the tokenize gives me pseudo-tokens for
>comments and "ignored" newlines, it doesn't give me the backslashes at
>all (while it does give me the newline following the backslash).
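
The behaviour described in the quoted text can be checked with a short script. This is only a sketch written against the Python 3 spelling of the tokenize API (`tokenize.generate_tokens` fed from an `io.StringIO`), which is an assumption and not the 2.x module discussed in this thread; the sample source string is made up for illustration.

```python
import io
import tokenize

# Hypothetical sample: one logical line split by a backslash continuation.
source = "total = 1 + \\\n    2\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # Show each token's type name, text, and start/end (row, col) positions.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

# Collect any token whose text contains a backslash; none is expected,
# since the continuation only shows up as a jump in row numbers.
backslash_tokens = [tok for tok in tokens if "\\" in tok.string]
```

The only trace of the continuation is that the token for `2` starts on row 2 while the preceding `+` ends on row 1, with no NEWLINE or NL token in between.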

The following routine renders a token stream back to text and automatically
restores the missing backslashes.  I don't know whether it will work with
your patch, but perhaps you could use it instead of changing tokenize.  For
documentation and examples, see:

http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text


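# INDENT, DEDENT, ENDMARKER, NL and NEWLINE come from the standard tokenize
# module; flatten_stmt and WHITESPACE are assumed to be helpers defined in
# the module documented at the URL above.
from tokenize import INDENT, DEDENT, ENDMARKER, NL, NEWLINE

def detokenize(tokens, indent=0):
     """Convert `tokens` iterable back to a string."""
     out = []; add = out.append
     lr,lc,last = 0,0,''   # previous token's end row, end column, source line
     baseindent = None     # indentation of the first non-whitespace token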
     for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
         # Insert trailing line continuation and blanks for skipped lines
         lr = lr or sr   # first line of input is first line of output
         if sr>lr:
             if last:
                 if len(last)>lc:
                     add(last[lc:])
                 lr+=1
             if sr>lr:
                 add(' '*indent + '\\\n'*(sr-lr))    # blank continuation lines
             lc = 0

         # Re-indent first token on line
         if lc==0:
             if tok==INDENT:
                 continue  # we want to dedent first actual token
             else:
                 curindent = len(line[:sc].expandtabs())
                 if baseindent is None and tok not in WHITESPACE:
                     baseindent = curindent
                 elif baseindent is not None and curindent>=baseindent:
                     add(' ' * (curindent-baseindent))
                 if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                     add(' ' * indent)

         # Not at start of line, handle intraline whitespace by retaining it
         elif sc>lc:
             add(line[lc:sc])

         if val:
             add(val)

         lr,lc,last = er,ec,line

     return ''.join(out)
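
For readers who want to experiment without `flatten_stmt`, the core idea (copy whatever source text lies between one token's end and the next token's start, which is where the backslash lives) can be sketched on its own. This is not the routine above: it does no re-indentation, it reads the gaps from the original source lines instead of each token's `line` field, and it assumes the Python 3 tokenize API. The names `text_between` and `render_with_continuations` are hypothetical.

```python
import io
import tokenize

def text_between(lines, start, end):
    """Return the source text from position `start` up to position `end`.

    Positions are (row, col) pairs with 1-based rows, as tokenize reports.
    """
    if end <= start:
        return ''                      # zero-width or out-of-order gap
    (r1, c1), (r2, c2) = start, end
    if r1 == r2:
        return lines[r1 - 1][c1:c2]
    parts = [lines[r1 - 1][c1:]]       # rest of the first line, incl. any '\'
    parts.extend(lines[r1:r2 - 1])     # whole lines skipped in between
    if r2 <= len(lines):               # ENDMARKER sits one row past the end
        parts.append(lines[r2 - 1][:c2])
    return ''.join(parts)

def render_with_continuations(source):
    """Rebuild `source` from its tokens, filling gaps from the original text."""
    lines = source.splitlines(keepends=True)
    out = []
    pos = (1, 0)                       # end position of the previous token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        out.append(text_between(lines, pos, tok.start))
        out.append(tok.string)
        pos = tok.end
    return ''.join(out)

sample = "total = 1 + \\\n    2\n"     # made-up input with one continuation
rendered = render_with_continuations(sample)
```

On this sample the gap between `+` and `2` carries the backslash and newline, so the rendered text matches the input. Inputs that exercise INDENT and DEDENT tokens have not been checked here and may need extra handling, which is what the fuller routine above addresses.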


