[Python-ideas] Python octal escape character encoding "wats"
Joao S. O. Bueno
jsbueno at python.org.br
Fri Nov 9 20:41:34 EST 2018
I just saw a document that reminded me of how strings treat a
backslash followed by octal digits. When a backslash is followed by
3 octal digits, that means a character with the corresponding
codepoint, and all is well.
The "valid scenario":
In [42]: "\777"
Out[42]: 'ǿ'
The problem is when you have just two valid octal digits:
In [40]: "\778"
Out[40]: '?8'
Which is ambiguous at least -- why is this not "\x07" "78", for
example? (Octal 77 actually corresponds to the "?" (63) character.)
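A quick check, as a sketch, of which of the two readings the interpreter actually picks. The assumption being tested is the greedy reading: the escape takes as many octal digits as it can, up to three, and whatever follows stays literal.

```python
# "\778": the escape consumes the two octal digits "77"; the "8" stays literal.
greedy = "\778"
assert greedy == "\077" + "8"      # octal 077 followed by a plain "8"
assert greedy == chr(0o77) + "8"   # 0o77 == 63 == ord("?")
assert greedy == "?8"

# The other reading ("\7" followed by a plain "78") is a different string.
alternative = "\x07" + "78"
assert greedy != alternative
```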
Or...when the first digit is not valid as octal - that is:
In [41]: "\877"
Out[41]: '\\877'
And then when the second digit is not valid octal:
In [43]: "\797"
Out[43]: '\x0797'
WAT?
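For what it's worth, all four results above seem to follow from a single rule: after a backslash, take up to three octal digits greedily, and leave the backslash alone if the next character is not an octal digit. Read that way, '\x0797' is just the repr of three characters (BEL, "9", "7"). Below is a minimal sketch of that rule; `decode_octal_escapes` is a made-up helper that models the behaviour, not the actual lexer code, and it takes raw strings so that no real escape processing gets in the way.

```python
import re

# Backslash followed by one to three octal digits, matched greedily.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def decode_octal_escapes(raw):
    """Hypothetical model: replace each octal escape with its character."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), raw)


# The four cases from the session above, written as raw strings.
assert decode_octal_escapes(r"\777") == "\u01ff"   # 0o777 == 0x1FF
assert decode_octal_escapes(r"\778") == "?8"       # "\77" then "8"
assert decode_octal_escapes(r"\877") == "\\877"    # "8" is not octal: untouched
assert decode_octal_escapes(r"\797") == "\x0797"   # "\7" (BEL) then "97"
assert len(decode_octal_escapes(r"\797")) == 3     # three characters, not one
```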
So, between the possibly ambiguous scenario with two octal digits
followed by a non-octal digit, and the completely unexpected output
that looks like a 4-hexadecimal-digit codepoint in the last case, what
do you say to deprecating any r"\\[0-9]{1,3}" sequence that doesn't
match a full 3 octal digits, and raising a syntax error for it from
Python 3.9 (or 3.10) on?
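To get a feel for what such a rule would flag, here is a rough sketch of a checker. `find_short_octal_escapes` is a hypothetical helper, and it scans raw text rather than real tokens, so it knows nothing about raw strings, bytes literals or comments. Note that it would also flag the very common "\0".

```python
import re

# A backslash that is not itself escaped (an even number of backslashes
# before it), followed by 1-3 digits that are NOT three full octal digits.
_SHORT_OCTAL = re.compile(r"(?<!\\)(?:\\\\)*(\\(?![0-7]{3})[0-9]{1,3})")


def find_short_octal_escapes(text):
    """Hypothetical helper: list the backslash-digit runs that would be flagged."""
    return [m.group(1) for m in _SHORT_OCTAL.finditer(text)]


assert find_short_octal_escapes(r"\777") == []          # full 3 octal digits
assert find_short_octal_escapes(r"\778") == [r"\778"]   # only two octal digits
assert find_short_octal_escapes(r"\877") == [r"\877"]   # first digit not octal
assert find_short_octal_escapes(r"\797") == [r"\797"]   # second digit not octal
assert find_short_octal_escapes(r"\\77") == []          # escaped backslash
assert find_short_octal_escapes(r"a\0b") == [r"\0"]     # NUL shorthand is hit too
```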
Best regards,
js
-><-