Guido van Rossum wrote:
it would be a good thing if it could, optionally, be made to report horizontal whitespace as well.
It's remarkably easy to get this out of the existing API
sure, but it would be even easier if I didn't have to write that code myself (last time I did that, I needed a couple of tries before the parser handled all cases correctly...). but maybe this could be handled by a helper generator in the tokenizer module, one that simply wraps the standard tokenizer generator and inserts whitespace tokens where necessary?
keep track of the end position returned by the previous call, and if it's different from the start position returned by the next call, slice the line text from the column positions, assuming the line numbers are the same. If the line numbers differ, something has been eating \n tokens; this shouldn't happen any more with my patch.
you'll still have to deal with multiline strings, right? </F>
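
For reference, the wrapper approach discussed above might look roughly like the sketch below. It is an illustration only, not code from the patch under discussion: the name tokens_with_whitespace and the WHITESPACE token type are hypothetical (the tokenize module defines no such type), and it assumes a modern Python 3 where tokenize.generate_tokens yields TokenInfo tuples. To cope with multiline strings, it slices the gap out of the physical source line on which the previous token ended rather than relying on each token's own line attribute. Gaps that span a line break (leading indentation on later lines, backslash continuations) are deliberately left alone.

```python
import io
import tokenize

# Hypothetical token type: the tokenize module has no whitespace token,
# so a value that cannot clash with a real token type is used here.
WHITESPACE = -1


def tokens_with_whitespace(source):
    """Wrap the standard tokenizer and insert WHITESPACE tokens for
    horizontal gaps between consecutive tokens on the same line."""
    # Split the source the same way the tokenizer's readline will see it.
    lines = io.StringIO(source).readlines()
    prev_end = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if prev_end is not None:
            prev_row, prev_col = prev_end
            row, col = tok.start
            # Only same-line gaps are reported. A multiline string ends on
            # its last physical line, so its end position still lines up
            # with the start of whatever follows it on that line.
            if row == prev_row and col > prev_col:
                line = lines[row - 1]
                yield tokenize.TokenInfo(
                    WHITESPACE, line[prev_col:col], prev_end, tok.start, line
                )
        yield tok
        prev_end = tok.end


if __name__ == "__main__":
    for tok in tokens_with_whitespace('s = """a\nb"""   + t\n'):
        print(tok.type, repr(tok.string), tok.start, tok.end)
```

Filtering on the WHITESPACE type recovers just the inserted tokens, so existing consumers of the standard token stream can ignore them.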