New subject: Small tweak to tokenize.py?

Nov. 30, 2006

      At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
...
I've got a small tweak to tokenize.py that I'd like to run by folks here.
I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
and my approach is to build a full parse tree with annotations that
show where the whitespace and comments go. I use the tokenize module
to scan the input. This is nearly perfect (I can render code from the
parse tree and it will be an exact match of the input) except for
continuation lines -- while the tokenize gives me pseudo-tokens for
comments and "ignored" newlines, it doesn't give me the backslashes at
all (while it does give me the newline following the backslash).
The following routine will render a token stream, and it automatically 
restores the missing \'s.  I don't know if it'll work with your patch, but 
perhaps you could use it instead of changing tokenize.  For the 
documentation and examples, see:

http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-...

def detokenize(tokens, indent=0):
     """Convert `tokens` iterable back to a string."""
     out = []; add = out.append
     lr,lc,last = 0,0,''
     baseindent = None
     for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
         # Insert trailing line continuation and blanks for skipped lines
         lr = lr or sr   # first line of input is first line of output
         if sr>lr:
             if last:
                 if len(last)>lc:
                     add(last[lc:])
                 lr+=1
             if sr>lr:
                 add(' '*indent + '\\\n'*(sr-lr))    # blank continuation lines
             lc = 0

         # Re-indent first token on line
         if lc==0:
             if tok==INDENT:
                 continue  # we want to dedent first actual token
             else:
                 curindent = len(line[:sc].expandtabs())
                 if baseindent is None and tok not in WHITESPACE:
                     baseindent = curindent
                 elif baseindent is not None and curindent>=baseindent:
                     add(' ' * (curindent-baseindent))
                 if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                     add(' ' * indent)

         # Not at start of line, handle intraline whitespace by retaining it
         elif sc>lc:
             add(line[lc:sc])

         if val:
             add(val)

         lr,lc,last = er,ec,line

     return ''.join(out)

Re: [Python-Dev] Small tweak to tokenize.py?

Phillip J. Eby

Guido van Rossum

Fredrik Lundh

Guido van Rossum

Fredrik Lundh

Guido van Rossum

tags

participants (3)