[Python-Dev] Small tweak to tokenize.py?
Phillip J. Eby
pje at telecommunity.com
Thu Nov 30 19:22:57 CET 2006
At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
>I've got a small tweak to tokenize.py that I'd like to run by folks here.
>
>I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
>and my approach is to build a full parse tree with annotations that
>show where the whitespace and comments go. I use the tokenize module
>to scan the input. This is nearly perfect (I can render code from the
>parse tree and it will be an exact match of the input) except for
>continuation lines -- while the tokenize gives me pseudo-tokens for
>comments and "ignored" newlines, it doesn't give me the backslashes at
>all (while it does give me the newline following the backslash).

The following routine renders a token stream back to text and automatically
restores the missing backslashes. I don't know whether it will work with your
patch, but perhaps you could use it instead of changing tokenize. For
documentation and examples, see:

http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text

# Token type constants used below; tokenize re-exports the names from token.
# flatten_stmt and WHITESPACE are not defined in this post; they are helpers
# that have to be supplied alongside this routine.
from tokenize import INDENT, DEDENT, ENDMARKER, NL, NEWLINE

def detokenize(tokens, indent=0):
    """Convert `tokens` iterable back to a string."""
    out = []; add = out.append
    lr,lc,last = 0,0,''
    baseindent = None
    for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
        # Insert trailing line continuation and blanks for skipped lines
        lr = lr or sr   # first line of input is first line of output
        if sr>lr:
            if last:
                if len(last)>lc:
                    add(last[lc:])
                lr+=1
            if sr>lr:
                add(' '*indent + '\\\n'*(sr-lr))    # blank continuation lines
            lc = 0

        # Re-indent first token on line
        if lc==0:
            if tok==INDENT:
                continue  # we want to dedent first actual token
            else:
                curindent = len(line[:sc].expandtabs())
                if baseindent is None and tok not in WHITESPACE:
                    baseindent = curindent
                elif baseindent is not None and curindent>=baseindent:
                    add(' ' * (curindent-baseindent))
                if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                    add(' ' * indent)

        # Not at start of line, handle intraline whitespace by retaining it
        elif sc>lc:
            add(line[lc:sc])

        if val:
            add(val)

        lr,lc,last = er,ec,line

    return ''.join(out)
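
For anyone who wants to see the underlying idea in isolation, here is a minimal sketch, assuming Python 3's standard tokenize module. It is not the routine above: it does no re-indentation and does not use flatten_stmt, and the names `render` and `SOURCE` are made up for illustration. It only shows that the gap between one token's end position and the next token's start position is enough to recover the backslash that tokenize does not report.

```python
import io
import tokenize

# Hypothetical sample input: one backslash continuation plus a comment.
SOURCE = (
    "total = 1 + \\\n"
    "        2  # trailing comment\n"
    "if total:\n"
    "    print(total)\n"
)


def render(source):
    """Rebuild `source` from its tokens, copying the untokenized gaps verbatim."""
    lines = source.splitlines(keepends=True)
    out = []
    last_row, last_col = 1, 0  # end position of the previous token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        srow, scol = tok.start
        # The token starts on a later physical line than the previous token
        # ended on: copy the rest of each skipped line. For a continuation
        # line, that remainder is the backslash and its newline.
        while last_row < srow:
            out.append(lines[last_row - 1][last_col:])
            last_row += 1
            last_col = 0
        # Same line: keep the intraline whitespace exactly as written.
        if scol > last_col:
            out.append(lines[srow - 1][last_col:scol])
        out.append(tok.string)
        last_row, last_col = tok.end
    return "".join(out)


# No token carries the backslash itself...
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
backslash_tokens = [t for t in tokens if "\\" in t.string]

# ...yet the positions are enough to reproduce the input.
rebuilt = render(SOURCE)
```

The sketch reads the gaps from the original source lines rather than from each token's `line` field, which keeps it independent of how a given tokenize version fills that field in.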