On Tue, Apr 13, 2021 at 12:55 PM Serhiy Storchaka email@example.com wrote:
26.04.18 21:37, Serhiy Storchaka wrote:
In Python 2.5, `0or` was accepted by the Python parser. It became an error in 2.6 because "0o" came to be recognized as an incomplete octal number. `1or` is still accepted.
On the other hand, `1if 2else 3` is accepted despite the fact that "2e" can be recognized as an incomplete floating-point number. In this case the tokenizer pushes "e" back and returns "2".
Shouldn't it do the same with "0o"? It is possible to make `0or` parseable again. The pure-Python tokenizer (the `tokenize` module) is already able to tokenize this example:
    $ echo '0or[]' | ./python -m tokenize
    1,0-1,1:            NUMBER         '0'
    1,1-1,3:            NAME           'or'
    1,3-1,4:            OP             '['
    1,4-1,5:            OP             ']'
    1,5-1,6:            NEWLINE        '\n'
    2,0-2,0:            ENDMARKER      ''
On the other hand, all these examples look weird. There is an asymmetry: `1or 2` is valid syntax, but `1 or2` is not. It is hard to visually recognize the boundary between a number and the following identifier or keyword, especially since numbers can contain letters ("b", "e", "j", "o", "x") and underscores, and identifiers can contain digits. Letters, digits, and underscores can appear on both sides of the boundary.
I propose changing the Python syntax by adding a requirement that there be whitespace or a delimiter between a numeric literal and the following keyword.
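To make the proposed rule concrete, here is a rough sketch of how such a check could be prototyped outside the parser, using the standard `tokenize` module. The helper name `number_glued_to_name` is made up for illustration; it flags any NAME token (keyword or identifier) that starts exactly where a NUMBER token ends. It is not how the C tokenizer would implement the rule.

```python
import io
import tokenize


def number_glued_to_name(source):
    """Return (number, name) pairs with no whitespace or delimiter between them.

    Hypothetical helper: a NAME token is reported when it starts at the
    exact position where the preceding NUMBER token ends.
    """
    hits = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (prev is not None
                and prev.type == tokenize.NUMBER
                and tok.type == tokenize.NAME
                and prev.end == tok.start):
            hits.append((prev.string, tok.string))
        prev = tok
    return hits


print(number_glued_to_name("1or 2\n"))   # [('1', 'or')]
print(number_glued_to_name("1 or 2\n"))  # []
```

Because the tokenizer takes the longest possible numeric literal, the pair reported is the one the tokenizer actually sees, which is not necessarily the one the author intended.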
A new example was found recently (see https://bugs.python.org/issue43833).
[0x1for x in (1,2)]
It is parsed as [0x1f or x in (1,2)] instead of [0x1 for x in (1,2)].
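The difference can be checked with the `ast` module. This is only a sketch: `parse_quietly` and `describe` are names invented here, and it assumes an interpreter that still accepts the unspaced spelling (newer versions emit a warning for it, which the sketch silences).

```python
import ast
import warnings


def parse_quietly(source):
    """Parse an expression, silencing warnings about literals glued to keywords."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.parse(source, mode="eval").body


def describe(source):
    """Summarize how the parser understood a bracketed expression."""
    node = parse_quietly(source)
    if isinstance(node, ast.ListComp):
        return "list comprehension"
    if (isinstance(node, ast.List) and node.elts
            and isinstance(node.elts[0], ast.BoolOp)
            and isinstance(node.elts[0].op, ast.Or)):
        left = node.elts[0].values[0]  # left operand of `or`
        return f"list display, first operand of 'or' is {left.value}"
    return type(node).__name__


print(describe("[0x1for x in (1,2)]"))
print(describe("[0x1 for x in (1,2)]"))
```

Without the space, the result is a one-element list display whose element is `0x1f or ...`, so `x` is never even looked up when the expression is evaluated.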
Since this code is clearly ambiguous, it makes more sense to emit a SyntaxWarning if there is no space between a number and an identifier.
I would totally make that a SyntaxError, and backwards compatibility be damned.