[Python-ideas] Python octal escape character encoding "wats"

Joao S. O. Bueno jsbueno at python.org.br
Fri Nov 9 20:41:34 EST 2018


I just saw a document that reminded me how Python handles a backslash
followed by octal digits in string literals. When a backslash is
followed by 3 octal digits, the escape stands for the character with
the corresponding codepoint, and all is well.
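
The general rule (a backslash followed by 1 to 3 octal digits, consumed greedily) can be sketched as a tiny decoder. This is a hypothetical helper written for illustration, not CPython's tokenizer: it only decodes octal escapes and copies every other backslash sequence through unchanged, which is a deliberate simplification.

```python
OCTAL_DIGITS = "01234567"

def decode_octal_escapes(body: str) -> str:
    """Hypothetical helper: decode only the octal escapes in `body` (the
    source text between the quotes). A backslash takes 1 to 3 octal digits,
    greedily. Escaped backslashes and all other backslash sequences are
    copied through unchanged (a simplification of the real tokenizer)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == "\\" and nxt == "\\":
            out.append("\\\\")  # escaped backslash: copy both characters as-is
            i += 2
        elif ch == "\\" and nxt and nxt in OCTAL_DIGITS:
            j = i + 1
            # Consume at most three octal digits, stopping at the first non-octal one.
            while j < len(body) and j - i <= 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)

for src in [r"\777", r"\778", r"\877", r"\797"]:
    print(src, repr(decode_octal_escapes(src)))
```

Applied to the four literals discussed in this message, it reproduces the results from the interactive sessions.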

The "valid scenario":

In [42]: "\777"
Out[42]: 'ǿ'

The problem arises when there are only two valid octal digits:

In [40]: "\778"
Out[40]: '?8'

This is ambiguous, at the very least: why is it not "\x07" "78", for
example? (Octal 77 corresponds to the "?" character, codepoint 63.)
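
A quick check of both readings, assuming nothing beyond plain string literals: Python takes the longest valid octal prefix, so "\778" splits as "\77" plus "8", while the one-digit reading the question mentions would be "\7" plus "78".

```python
# Python reads the longest valid octal prefix (up to three digits), so
# "\778" is the escape "\77" followed by a literal "8".
assert "\778" == "\77" + "8"
assert chr(0o77) == "?" and ord("?") == 63

# The one-digit reading would instead be "\7" followed by "78".
one_digit_reading = "\7" + "78"
print(repr("\778"), repr(one_digit_reading))  # '?8' '\x0778'
```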

Or when the first digit is not a valid octal digit:

In [41]: "\877"
Out[41]: '\\877'
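
"\8" is not a recognized escape, so the backslash is kept as a literal character. Since Python 3.6 this also emits a DeprecationWarning at compile time (a SyntaxWarning from 3.12 on). The sketch below compiles the literal from source text so the warning can be captured; the `<demo>` filename is arbitrary.

```python
import warnings

source = r'"\877"'  # the literal from the session above, as source text

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(compile(source, "<demo>", "eval"))

print(repr(value), len(value))  # '\\877' 4
print([w.category.__name__ for w in caught])  # category depends on the Python version
```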

And when the second digit is not a valid octal digit:

In [43]: "\797"
Out[43]: '\x0797'
WAT?
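
The repr here is easy to misread: \x escapes take exactly two hex digits, so '\x0797' is BEL followed by "97", a three-character string. A genuine four-hex-digit code point needs the \u escape and is a single character.

```python
s = "\797"

# BEL (U+0007) followed by the two characters "9" and "7".
print(len(s), list(s))  # 3 ['\x07', '9', '7']
assert s == "\x07" + "97"

# A real four-hex-digit code point is written with \u and is one character.
assert len("\u0797") == 1 and "\u0797" != s
```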

So, given the possibly ambiguous scenario of two octal digits
followed by a non-octal digit, and the completely unexpected result in
the last case (its repr reads like a 4-hexadecimal-digit codepoint,
although it is really "\x07" followed by "97"), what do you say about
deprecating any r"\\[0-9]{1,3}" sequence that does not consist of
exactly 3 octal digits, and making it a SyntaxError from Python 3.9
(or 3.10) on?
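
A minimal sketch of what the proposed check could look like, assuming it runs on the source text between the quotes of a non-raw string literal. The function name `flag_non_three_digit_octal` is hypothetical, and locating string literals in a file (for example with the `tokenize` module) is left out.

```python
import re

# A backslash followed either by another backslash (an escaped backslash)
# or by 1 to 3 decimal digits, the shape the proposal targets.
_ESCAPE = re.compile(r"\\(\\|[0-9]{1,3})")

def flag_non_three_digit_octal(body: str) -> list[str]:
    """Hypothetical lint check: return every backslash-digit escape in
    `body` that is NOT exactly three octal digits."""
    flagged = []
    for match in _ESCAPE.finditer(body):
        digits = match.group(1)
        if digits == "\\":
            continue  # escaped backslash, not an octal escape
        if not (len(digits) == 3 and all(d in "01234567" for d in digits)):
            flagged.append("\\" + digits)
    return flagged

for body in [r"\777", r"\778", r"\877", r"\797", r"\\777"]:
    print(body, flag_non_three_digit_octal(body))
```

Matching escaped backslashes first keeps text such as r"\\777" (a literal backslash followed by "777") from being flagged. Note that, as written, the check also flags one- and two-digit escapes such as "\0".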

Best regards,

    js
  -><-

