In Python 2.5, "0or[]" was accepted by the Python parser. It became an error in 2.6 because "0o" started being recognized as an incomplete octal number. "1or[]" is still accepted.
On the other hand, "1if 2else 3" is accepted despite the fact that "2e" can be recognized as an incomplete floating point number. In this case the tokenizer pushes "e" back and returns "2".
Shouldn't it do the same with "0o"? It is possible to make "0or[]" parseable again. The pure Python implementation of the tokenizer (the tokenize module) is able to tokenize this example:
    $ echo '0or[]' | ./python -m tokenize
    1,0-1,1:    NUMBER     '0'
    1,1-1,3:    NAME       'or'
    1,3-1,4:    OP         '['
    1,4-1,5:    OP         ']'
    1,5-1,6:    NEWLINE    '\n'
    2,0-2,0:    ENDMARKER  ''
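The push-back idea can be sketched with a toy scanner. This is a hypothetical function (scan_number), not the code in CPython's tokenizer.c, and it only handles decimal integers, "0o" octals and "e" exponents. With push_back_prefix=False it treats "0o" without a following octal digit as an error, as Python 2.6+ does; with push_back_prefix=True it backs up to just after "0", the same way it already backs up over the dangling "e" in "2else".

```python
DIGITS = "0123456789"
OCTAL_DIGITS = "01234567"


def scan_number(src, i, push_back_prefix):
    """Return the index just past a numeric literal that starts at src[i].

    Hypothetical, heavily simplified scanner (not CPython's tokenizer.c):
    only decimal integers, "0o" octals and "e" exponents are handled.
    """
    if src.startswith(("0o", "0O"), i):
        j = i + 2
        while j < len(src) and src[j] in OCTAL_DIGITS:
            j += 1
        if j > i + 2:
            return j  # "0o" followed by at least one octal digit
        if not push_back_prefix:
            # Python 2.6+ behaviour: "0o" commits the scanner to an octal literal.
            raise SyntaxError("invalid octal literal")
        return i + 1  # proposed behaviour: push "o" back, the literal is "0"
    while i < len(src) and src[i] in DIGITS:
        i += 1
    if i < len(src) and src[i] in "eE":
        j = i + 1
        if j < len(src) and src[j] in "+-":
            j += 1
        if j < len(src) and src[j] in DIGITS:
            while j < len(src) and src[j] in DIGITS:
                j += 1
            return j  # complete exponent such as "e10" or "e-3"
        # Dangling "e" (as in "2else"): push it back, the literal ends before it.
    return i
```

For example, scan_number("2else 3", 0, False) returns 1 in both modes, while scan_number("0or[]", 0, ...) returns 1 only when push_back_prefix is true and raises SyntaxError otherwise.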
On the other hand, all these examples look weird. There is an asymmetry: "1or 2" is valid syntax, but "1 or2" is not. It is hard to recognize visually the boundary between a number and the following identifier or keyword, especially since numbers can contain letters ("b", "e", "j", "o", "x") and underscores, and identifiers can contain digits. Both sides of the boundary can contain letters, digits, and underscores.
I propose to change the Python syntax by adding a requirement that there should be a whitespace or delimiter between a numeric literal and the following keyword.
On Apr 26, 2018, at 11:37 AM, Serhiy Storchaka storchaka@gmail.com wrote:
> I propose to change the Python syntax by adding a requirement that there should be a whitespace or delimiter between a numeric literal and the following keyword.
-1
This would make Python 3.8 reject code due to a stylistic preference: code that it can actually parse unambiguously today.
I agree that a formatting style that omits whitespace between numerals and other tokens is terrible. However, if you start downright rejecting it, you will likely punish the wrong people. Users of third-party libraries will be met with random parsing errors in files they have no control over. This is not helpful.
And given BPO-33338 the standard library tokenizer would have to keep parsing those things as is.
Making "0or[]" work again is also not worth it, since that's been broken since Python 2.6 and hopefully nobody is running Python 2.5-only code anymore.
What we should do instead is make the standard library tokenizer reflect the behavior of Python 2.6+.
-- Ł
26.04.18 22:02, Lukasz Langa wrote:
> On Apr 26, 2018, at 11:37 AM, Serhiy Storchaka storchaka@gmail.com wrote:
>> I propose to change the Python syntax by adding a requirement that there should be a whitespace or delimiter between a numeric literal and the following keyword.
> -1
> This would make Python 3.8 reject code due to a stylistic preference: code that it can actually parse unambiguously today.
Of course I don't propose to make it a syntax error in 3.8. It should first emit a SyntaxWarning and be converted into an error only in 3.10.
Or maybe first add a rule for this in PEP 8 and make it a syntax error in distant future, after all style checkers include it.
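A style checker for such a rule could start from something like the following. It is a hypothetical helper (find_number_keyword_joins), written only for illustration: it scans raw text with a regular expression built from keyword.kwlist, so it does not skip strings or comments and only recognizes decimal literals (no hex, octal, binary, exponents or underscores).

```python
import keyword
import re

# Hypothetical checker for the proposed rule; deliberately simplified.
_KEYWORDS = "|".join(sorted(keyword.kwlist, key=len, reverse=True))
_NUMBER_THEN_KEYWORD = re.compile(
    r"(?<![\w.])"        # the number must not continue an identifier
    r"(\d+(?:\.\d*)?)"   # a decimal integer or float such as 1 or 2.5
    rf"({_KEYWORDS})"    # immediately followed by a keyword ...
    r"(?!\w)"            # ... that is not the prefix of a longer name
)


def find_number_keyword_joins(source):
    """Return (line, column, number, keyword) for each number glued to a keyword."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in _NUMBER_THEN_KEYWORD.finditer(line):
            problems.append((lineno, m.start(2), m.group(1), m.group(2)))
    return problems
```

Applied to "1if 2else 3" it reports both "1if" and "2else", while "1 or 2" and "1 or2" produce no reports (the latter is already a syntax error, since "or2" is a name).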
> I agree that a formatting style that omits whitespace between numerals and other tokens is terrible. However, if you start downright rejecting it, you will likely punish the wrong people. Users of third-party libraries will be met with random parsing errors in files they have no control over. This is not helpful.
> And given BPO-33338 the standard library tokenizer would have to keep parsing those things as is.
> Making "0or[]" work again is also not worth it, since that's been broken since Python 2.6 and hopefully nobody is running Python 2.5-only code anymore.
> What we should do instead is make the standard library tokenizer reflect the behavior of Python 2.6+.
The behavior of the standard library tokenizer doesn't contradict the rules. It is the most natural behavior of a regex-based tokenizer. Actually, it is the behavior of the built-in tokenizer that may be incorrect. In any case, accepting "1if 2else 3" and rejecting "0or[]" looks weird. They should use the same rule: "0or" and "2else" should be considered ambiguous or unambiguous in the same way.
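To illustrate why a regex-based tokenizer splits "0or" naturally, here is a toy tokenizer built on the re module. The grammar is an assumption and much smaller than Python's real one; the point is only that a pattern requiring at least one digit after "0o" (or after "e") fails on "0or", so the regex engine falls back to the alternative that matches just "0", exactly as it does for "2else".

```python
import re

# Simplified number grammar (an assumption, far smaller than Python's real one):
# octal/hex/binary literals need at least one digit after the prefix, and an
# exponent needs at least one digit after "e".
NUMBER = r"0[oO][0-7]+|0[xX][0-9a-fA-F]+|0[bB][01]+|\d+(?:\.\d*)?(?:[eE][+-]?\d+)?"
NAME = r"[A-Za-z_]\w*"
OP = r"[\[\](){}+\-*/<>=,.:]"
TOKEN_RE = re.compile(rf"\s*(?:(?P<NUMBER>{NUMBER})|(?P<NAME>{NAME})|(?P<OP>{OP}))")


def regex_tokenize(source):
    """Return (kind, text) pairs; backtracking is left to the regex engine."""
    source = source.rstrip()
    pos, tokens = 0, []
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {source[pos]!r}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens
```

With the same rules, "1 or2" tokenizes as "1" followed by the name "or2", and in this toy grammar "0x1for x" splits into "0x1f", "or", "x", which shows how hard the boundary can be to see.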