[Python-Dev] Small tweak to tokenize.py?
Guido van Rossum
guido at python.org
Thu Nov 30 19:28:25 CET 2006
Are you opposed to changing tokenize? If so, why (apart from
compatibility)? ISTM that it would be a good thing if it reported
everything except horizontal whitespace.
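
For readers who want to see the gap concretely, here is a minimal sketch,
assuming a current Python 3 `tokenize` (the thread itself predates it). The
sample source and the helper `continuation_rows` are hypothetical and not part
of any patch discussed here. It shows that no token carries the backslash,
although the jump in row numbers still reveals where one must have been.

```python
import io
import tokenize

# Hypothetical sample: one logical line split with a backslash continuation.
source = "x = 1 + \\\n    2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Check whether any token string contains the backslash itself.
has_backslash = any("\\" in tok.string for tok in tokens)


def continuation_rows(tokens):
    """Return the rows after which the stream jumps to a later row
    without an intervening NEWLINE or NL token (hypothetical helper)."""
    rows = []
    prev = None
    for tok in tokens:
        if (prev is not None
                and prev.type not in (tokenize.NEWLINE, tokenize.NL)
                and tok.start[0] > prev.end[0]):
            rows.append(prev.end[0])
        prev = tok
    return rows


print(has_backslash)
print(continuation_rows(tokens))
```

On this sample the first value printed is False and the helper reports row 1,
the line that ends in the backslash.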
On 11/30/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
> >I've got a small tweak to tokenize.py that I'd like to run by folks here.
> >
> >I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
> >and my approach is to build a full parse tree with annotations that
> >show where the whitespace and comments go. I use the tokenize module
> >to scan the input. This is nearly perfect (I can render code from the
> >parse tree and it will be an exact match of the input) except for
> >continuation lines -- while tokenize gives me pseudo-tokens for
> >comments and "ignored" newlines, it doesn't give me the backslashes at
> >all (while it does give me the newline following the backslash).
>
> The following routine will render a token stream, and it automatically
> restores the missing \'s. I don't know if it'll work with your patch, but
> perhaps you could use it instead of changing tokenize. For the
> documentation and examples, see:
>
> http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text
>
>
> def detokenize(tokens, indent=0):
>     """Convert `tokens` iterable back to a string."""
>     out = []; add = out.append
>     lr,lc,last = 0,0,''
>     baseindent = None
>     for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
>         # Insert trailing line continuation and blanks for skipped lines
>         lr = lr or sr   # first line of input is first line of output
>         if sr>lr:
>             if last:
>                 if len(last)>lc:
>                     add(last[lc:])
>                 lr+=1
>             if sr>lr:
>                 add(' '*indent + '\\\n'*(sr-lr))    # blank continuation lines
>             lc = 0
>
>         # Re-indent first token on line
>         if lc==0:
>             if tok==INDENT:
>                 continue  # we want to dedent first actual token
>             else:
>                 curindent = len(line[:sc].expandtabs())
>                 if baseindent is None and tok not in WHITESPACE:
>                     baseindent = curindent
>                 elif baseindent is not None and curindent>=baseindent:
>                     add(' ' * (curindent-baseindent))
>                 if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
>                     add(' ' * indent)
>
>         # Not at start of line, handle intraline whitespace by retaining it
>         elif sc>lc:
>             add(line[lc:sc])
>
>         if val:
>             add(val)
>
>         lr,lc,last = er,ec,line
>
>     return ''.join(out)
>
>
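
The core idea of the quoted routine (copy whatever lies between two tokens,
including a trailing backslash, from the physical line) can be sketched in a
self-contained form. This is not Phillip's code: it assumes Python 3, leaves
out `flatten_stmt` and the re-indentation step, and reads the gaps from the
original source lines rather than from each token's `line` field. The name
`render` and the sample input are hypothetical.

```python
import io
import tokenize


def render(source):
    """Rebuild `source` from its tokens, copying inter-token gaps
    (whitespace and backslash continuations) from the physical lines."""
    # Trailing "" entry covers DEDENT tokens placed one row past the end.
    lines = source.splitlines(keepends=True) + [""]
    out = []
    prev_row, prev_col = 1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.ENDMARKER:
            break
        row, col = tok.start
        if row == prev_row:
            # Same line: keep the intraline whitespace as-is.
            out.append(lines[row - 1][prev_col:col])
        else:
            # Rest of the previous line; this holds the backslash, if any.
            out.append(lines[prev_row - 1][prev_col:])
            # Whole lines skipped between the two tokens.
            for r in range(prev_row + 1, row):
                out.append(lines[r - 1])
            # Leading whitespace before the token on its own line.
            out.append(lines[row - 1][:col])
        out.append(tok.string)
        prev_row, prev_col = tok.end
    return "".join(out)


# Hypothetical sample with an indented block and a continuation line.
sample = "if x:\n    y = 1 + \\\n        2  # done\n"
print(render(sample) == sample)
```

Because the gaps are copied rather than re-synthesized, the spacing before the
backslash survives too, which is the exact-match property the refactoring tool
needs.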
--
--Guido van Rossum (home page: http://www.python.org/~guido/)