On Fri, 9 Nov 2018 at 23:56, Chris Angelico <rosuav@gmail.com> wrote:
>>> list("\797")
['\x07', '9', '7']
The octal escape grabs as many digits as it can, and when it finds a character in the literal that isn't a valid octal digit (same whether it's a '9' or a 'q'), it stops. The remaining characters have no special meaning; this does not become four hex digits. A "\xNN" escape in Python must be exactly two digits, no more and no less.
Yes, I had just figured this out before going to sleep, and was coming back to say that, although strange, this was no reason to break things. Thank you for the lengthy reply!!
On Sat, Nov 10, 2018 at 12:42 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
I just saw a document that reminded me of how strings handle a backslash followed by octal digits. When a backslash is followed by 3 octal digits, that means a character with the corresponding code point, and all is well.
The "valid scenario":
In [42]: "\777"
Out[42]: 'ǿ'
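A quick check of what that three-digit escape denotes, using only the builtins `int`, `chr` and `ascii`. The character is built with `chr` instead of writing the `"\777"` literal, which also sidesteps the warning that Python 3.11 and later emit for octal escapes above `\377`:

```python
# Octal 777 is decimal 511, which is code point U+01FF.
value = int("777", 8)
print(value)               # 511
print(value == 0o777)      # True

# Build the character with chr() instead of writing the "\777" literal.
ch = chr(value)
print(ascii(ch))           # '\u01ff'
print(ch == "\u01ff")      # True
```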
The problem is when you have just two valid octal digits:
In [40]: "\778"
Out[40]: '?8'
Which is ambiguous at the very least: why is this not "\x07" "77", for example? (Octal 77, i.e. 0o77, is 63, the "?" character.)
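For reference, the interpreter reads `"\778"` as the two-digit escape `\77` followed by a plain `8`. The snippet below (builtins only) confirms that, and contrasts it with the one-digit reading, which would produce a different, three-character string:

```python
s = "\778"  # octal escape \77 (two digits, 0o77 == 63), then a plain "8"
print(list(s))                 # ['?', '8']
print(s == chr(0o77) + "8")    # True
print(ord("?"))                # 63

# Reading only one digit ("\7" then "78") would give a different string.
alt = "\7" "78"
print(list(alt))               # ['\x07', '7', '8']
print(s == alt)                # False
```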
Not ambiguous. It takes as many valid octal digits as it can.
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-l...
\ooo ==> Character with octal value ooo
Note 1: As in Standard C, up to three octal digits are accepted.
"Up to" means that one or two digits can also define a character. For obvious reasons, it has to take digits greedily (otherwise "\777" would be "\x07" followed by "77"), and it's not an error to have fewer digits. Permitting a single digit means that "\0" means the NUL character, which is often convenient.
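The greedy "up to three digits" rule is easy to model. Below is a minimal sketch of a hypothetical decoder (the names `read_octal_escape` and `decode_octal` are illustrative; this is not CPython's tokenizer, and it only handles octal escapes and doubled backslashes). It works on the raw source text of a literal and reproduces each result discussed in this thread:

```python
from __future__ import annotations

OCTAL_DIGITS = "01234567"


def read_octal_escape(body: str, start: int) -> tuple[str, int]:
    """Read an octal escape whose first digit is at body[start].

    Consumes up to three octal digits greedily and stops early at the
    first non-octal character. Returns (decoded char, index after escape).
    """
    end = start
    while end < len(body) and end - start < 3 and body[end] in OCTAL_DIGITS:
        end += 1
    if end == start:
        raise ValueError("no octal digit after backslash")
    return chr(int(body[start:end], 8)), end


def decode_octal(body: str) -> str:
    """Decode octal escapes in literal source text (hypothetical helper).

    Only a backslash followed by octal digits, or a doubled backslash,
    is treated specially; every other character is copied unchanged.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in OCTAL_DIGITS:
                ch, i = read_octal_escape(body, i + 1)
                out.append(ch)
                continue
            if nxt == "\\":
                out.append("\\")
                i += 2
                continue
        out.append(body[i])
        i += 1
    return "".join(out)


for src in [r"\777", r"\778", r"\797", r"\0"]:
    print(src, ascii(decode_octal(src)))
```

The scanner never needs lookahead beyond the current character, which is why the greedy rule is unambiguous: it simply stops at the third digit or at the first non-octal character, whichever comes first.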
And then when the second digit is not valid octal:
In [43]: "\797"
Out[43]: '\x0797'
WAT?
So, between the possibly ambiguous scenario with two octal digits followed by a non-octal digit, and the completely unexpected expansion to a 4-hexadecimal-digit code point in the last case
You may possibly be misinterpreting the last result. It's exactly the same as the previous ones.
>>> list("\797")
['\x07', '9', '7']
The octal escape grabs as many digits as it can, and when it finds a character in the literal that isn't a valid octal digit (same whether it's a '9' or a 'q'), it stops. The remaining characters have no special meaning; this does not become four hex digits. A "\xNN" escape in Python must be exactly two digits, no more and no less.
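The same point can be checked directly with builtins: `\x` consumes exactly two hex digits, so the repr `'\x0797'` is just `'\x07'` followed by `'9'` and `'7'`. The `compile` call shows that a one-digit `\x7` is rejected; the exact error message differs between Python versions, so only the exception type is relied on here:

```python
s = "\797"
print(list(s))                  # ['\x07', '9', '7']
print(s == "\x07" + "97")       # True
print(len(s))                   # 3

# \x always takes exactly two hex digits; anything after them is ordinary text.
print("\x0797" == "\x07" "97")  # True

# A single hex digit after \x is rejected when the literal is compiled.
try:
    compile(r'"\x7"', "<demo>", "eval")
    rejected = False
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
```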
what do you say to deprecating any r"\[0-9]{1,3}" sequence that doesn't match a full 3 octal digits, and yielding a syntax error for it from Python 3.9 (or 3.10) on?
Nope. Would break code for no good reason.
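To make the breakage concrete, here is a hypothetical lint check (the name `short_octal_escapes` and its regex are illustrative, not part of any existing tool) that flags what the proposal would reject. As an assumption, it only treats octal digits as starting an escape, since `\8` and `\9` are not octal escapes at all. Note that the common `"\0"` for NUL is among the casualties:

```python
from __future__ import annotations

import re

# Match a backslash followed by either another backslash or 1-3 octal digits.
# Doubled backslashes are matched (and skipped) first, so r"\\0" is not
# mistaken for an escape.
ESCAPE = re.compile(r"\\(\\|[0-7]{1,3})")


def short_octal_escapes(body: str) -> list[str]:
    """Return octal escapes in literal source text that have fewer than 3 digits."""
    found = []
    for m in ESCAPE.finditer(body):
        digits = m.group(1)
        if digits != "\\" and len(digits) < 3:
            found.append(m.group(0))
    return found


for body in [r"\777", r"\778", r"\797", r"\0", r"a\\0b", r"\000"]:
    print(body, short_octal_escapes(body))
```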
ChrisA
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/