steve at holdenweb.com
Fri Jan 7 06:36:31 EST 2005
Andrew Dalke wrote:
> Bengt Richter:
>>But it does look ahead to recognize += (i.e., it doesn't generate two
>>successive also-legal tokens of '+' and '=')
>>so it seems it should be a simple fix.
> But that works precisely because of the greedy nature of tokenization.
> Given "a+=2" the longest token it finds first is "a" because "a+"
> is not a valid token. The next token is "+=". It isn't just "+"
> because "+=" is valid. And the last token is "2".
You're absolutely right, of course, Andrew, and personally I don't think
that this is worth trying to fix. But the original post I responded to
suggested that an LL(1) grammar couldn't disambiguate "1." and "1..3".
That assertion blurred the line between lexical and syntactic analysis,
and I didn't want to leave the distinction unsharpened.
The fact that Python's existing tokenizer doesn't allow multi-character
tokens beginning with a dot after a digit (roughly speaking) is what
makes the proposed syntax infeasibly hard to accommodate.
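
To make the greedy matching concrete, here is a minimal sketch using the
standard library's tokenize module. It assumes a current Python 3
interpreter rather than the 2005 one under discussion, and the helper
name lexemes is made up purely for illustration.

```python
import io
import tokenize


def lexemes(source):
    """Return the text of each name, operator, and number token in source."""
    keep = {tokenize.NAME, tokenize.OP, tokenize.NUMBER}
    # generate_tokens wants a readline callable that yields str lines.
    readline = io.StringIO(source + "\n").readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type in keep]


# Longest match: "+=" is taken as one operator, not "+" followed by "=".
print(lexemes("a+=2"))

# The same rule takes "1." as a complete float literal before ".." is
# ever considered, so the parser never gets a chance to see a ".." token.
print(lexemes("1..3"))
```

This should print ['a', '+=', '2'] and then ['1.', '.3']. The second
result is the point: the split happens in the tokenizer, before any
LL(1) parsing decision is made.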
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119