[Python-Dev] Boundaries between numbers and identifiers
Serhiy Storchaka
storchaka at gmail.com
Thu Apr 26 14:37:10 EDT 2018
In Python 2.5 `0or[]` was accepted by the Python parser. It became an
error in 2.6 because "0o" became recognizing as an incomplete octal
number. `1or[]` still is accepted.
On other hand, `1if 2else 3` is accepted despites the fact that "2e" can
be recognized as an incomplete floating point number. In this case the
tokenizer pushes "e" back and returns "2".
Shouldn't it do the same with "0o"? It is possible to make `0or[]` be
parseable again. Python implementation is able to tokenize this example:
$ echo '0or[]' | ./python -m tokenize
1,0-1,1: NUMBER '0'
1,1-1,3: NAME 'or'
1,3-1,4: OP '['
1,4-1,5: OP ']'
1,5-1,6: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
On other hand, all these examples look weird. There is an assymmetry:
`1or 2` is a valid syntax, but `1 or2` is not. It is hard to recognize
visually the boundary between a number and the following identifier or
keyword, especially if numbers can contain letters ("b", "e", "j", "o",
"x") and underscores, and identifiers can contain digits. On both sides
of the boundary can be letters, digits, and underscores.
I propose to change the Python syntax by adding a requirement that there
should be a whitespace or delimiter between a numeric literal and the
following keyword.
More information about the Python-Dev
mailing list