At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
I've got a small tweak to tokenize.py that I'd like to run by folks here.
I'm working on a refactoring tool for Python 2.x-to-3.x conversion, and my approach is to build a full parse tree with annotations that show where the whitespace and comments go. I use the tokenize module to scan the input. This is nearly perfect (I can render code from the parse tree and it will be an exact match of the input) except for continuation lines -- while the tokenize gives me pseudo-tokens for comments and "ignored" newlines, it doesn't give me the backslashes at all (while it does give me the newline following the backslash).
The following routine will render a token stream, and it automatically restores the missing \'s. I don't know if it'll work with your patch, but perhaps you could use it instead of changing tokenize. For the documentation and examples, see: http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-... def detokenize(tokens, indent=0): """Convert `tokens` iterable back to a string.""" out = []; add = out.append lr,lc,last = 0,0,'' baseindent = None for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens): # Insert trailing line continuation and blanks for skipped lines lr = lr or sr # first line of input is first line of output if sr>lr: if last: if len(last)>lc: add(last[lc:]) lr+=1 if sr>lr: add(' '*indent + '\\\n'*(sr-lr)) # blank continuation lines lc = 0 # Re-indent first token on line if lc==0: if tok==INDENT: continue # we want to dedent first actual token else: curindent = len(line[:sc].expandtabs()) if baseindent is None and tok not in WHITESPACE: baseindent = curindent elif baseindent is not None and curindent>=baseindent: add(' ' * (curindent-baseindent)) if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE): add(' ' * indent) # Not at start of line, handle intraline whitespace by retaining it elif sc>lc: add(line[lc:sc]) if val: add(val) lr,lc,last = er,ec,line return ''.join(out)