[Python-Dev] Boundaries between numbers and identifiers

Serhiy Storchaka storchaka at gmail.com
Thu Apr 26 14:37:10 EDT 2018


In Python 2.5 `0or[]` was accepted by the Python parser. It became an 
error in 2.6 because "0o" began to be recognized as an incomplete octal 
number. `1or[]` is still accepted.

On the other hand, `1if 2else 3` is accepted despite the fact that "2e" 
can be recognized as an incomplete floating point number. In this case 
the tokenizer pushes "e" back and returns "2".

Shouldn't it do the same with "0o"? It is possible to make `0or[]` 
parseable again. The Python implementation of the tokenizer (the 
tokenize module) is already able to tokenize this example:

$ echo '0or[]' | ./python -m tokenize
1,0-1,1:            NUMBER         '0'
1,1-1,3:            NAME           'or'
1,3-1,4:            OP             '['
1,4-1,5:            OP             ']'
1,5-1,6:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''
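
The push-back behaviour can be illustrated with a small regex-based 
scanner. This is only a sketch, not CPython's tokenizer: the patterns 
are simplified (no underscores, no "j" suffix, no strings or comments) 
and the names NUMBER, NAME and split_tokens are made up for this 
example. A prefix such as "0o" or "2e" that is not followed by a digit 
simply fails to match, so the scanner falls back to the plain integer 
and leaves the letters for the next token.

```python
import re

# Simplified numeric literal patterns (an assumption of this sketch,
# not the full Python grammar). Alternatives are tried in order.
NUMBER = re.compile(
    r"""
    0[oO][0-7]+                   # octal: needs a digit after "0o"
  | 0[xX][0-9a-fA-F]+             # hexadecimal
  | 0[bB][01]+                    # binary
  | \d+\.\d*(?:[eE][+-]?\d+)?     # float with a decimal point
  | \d+[eE][+-]?\d+               # float with exponent only
  | \d+                           # plain integer (the fallback)
    """,
    re.VERBOSE,
)
NAME = re.compile(r"[A-Za-z_]\w*")


def split_tokens(source):
    """Split source into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = NUMBER.match(source, pos)
        kind = "NUMBER"
        if match is None:
            match = NAME.match(source, pos)
            kind = "NAME"
        if match is None:
            # Anything else is treated as a one-character operator.
            tokens.append(("OP", source[pos]))
            pos += 1
        else:
            tokens.append((kind, match.group()))
            pos = match.end()
    return tokens


print(split_tokens("1if 2else 3"))
print(split_tokens("0or[]"))
```

With these patterns "2else" is split into "2" and "else", and "0or" is 
split into "0" and "or", in the same way.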

On the other hand, all these examples look weird. There is an asymmetry: 
`1or 2` is valid syntax, but `1 or2` is not. It is hard to visually 
recognize the boundary between a number and the following identifier or 
keyword, especially since numbers can contain letters ("b", "e", "j", 
"o", "x") and underscores, and identifiers can contain digits. Letters, 
digits, and underscores can appear on both sides of the boundary.
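
The asymmetry comes from the fact that "or2" is scanned as a single 
identifier rather than as the keyword "or" followed by "2". A quick 
check with the standard keyword module (the helper name classify is 
made up for this example):

```python
import keyword


def classify(word):
    """Report how a run of letters, digits and underscores is read."""
    if keyword.iskeyword(word):
        return "keyword"
    if word.isidentifier():
        return "identifier"
    return "other"


for word in ("or", "or2", "2or"):
    print(word, classify(word))
```

Here "or" is a keyword, "or2" is an ordinary identifier, and "2or" is 
neither, because an identifier cannot start with a digit.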

I propose to change the Python syntax by adding a requirement that there 
should be whitespace or a delimiter between a numeric literal and the 
following keyword.
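
To get a feeling for which code would be affected, here is a rough 
sketch of a check that reports numeric literals immediately followed by 
a keyword. It is not a proposed implementation: it works on raw text 
with a simplified regex, ignores strings and comments, and the names 
TOKEN and glued_keywords are invented for this example.

```python
import keyword
import re

# One regex for identifiers and (simplified) numeric literals.
# Names are tried first so that digits inside "x1if" are not
# mistaken for a number.
TOKEN = re.compile(
    r"(?P<name>[A-Za-z_]\w*)"
    r"|(?P<number>0[xX][0-9a-fA-F]+|0[oO][0-7]+|0[bB][01]+"
    r"|\d+\.\d*(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+|\d+)"
)


def glued_keywords(source):
    """Return (number, keyword) pairs with nothing between them."""
    found = []
    prev = None
    for match in TOKEN.finditer(source):
        if (
            match.lastgroup == "name"
            and keyword.iskeyword(match.group())
            and prev is not None
            and prev.lastgroup == "number"
            and prev.end() == match.start()
        ):
            found.append((prev.group(), match.group()))
        prev = match
    return found


print(glued_keywords("1if 2else 3"))
print(glued_keywords("1 if 2 else 3"))
```

Under the proposed rule the first line would be rejected and the second 
would stay valid.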


