[Python-ideas] Python octal escape character encoding "wats"

Joao S. O. Bueno jsbueno at python.org.br
Fri Nov 9 21:04:22 EST 2018


On Fri, 9 Nov 2018 at 23:56, Chris Angelico <rosuav at gmail.com> wrote:
> >>> list("\797")
> ['\x07', '9', '7']

> The octal escape grabs as many digits as it can, and when it finds a
> character in the literal that isn't a valid octal digit (same whether
> it's a '9' or a 'q'), it stops. The remaining characters have no
> special meaning; this does not become four hex digits. A "\xNN" escape
> in Python must be exactly two digits, no more and no less.
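
A quick interpreter check (plain Python 3, built-ins only) illustrates both rules described above. An octal escape takes at most three octal digits and stops at the first character that is not one. A "\x" escape needs exactly two hex digits.

```python
# An octal escape takes up to three octal digits (0-7), greedily,
# and stops at the first character that is not an octal digit.
print(list("\797"))   # \7 stops at "9": ['\x07', '9', '7']
print(list("\778"))   # \77 is 63, i.e. "?": ['?', '8']
print(repr("\0"))     # a single digit is allowed: '\x00'

# repr() shows "\x07" followed by "97" as '\x0797', but the string is
# still three characters, not one four-hex-digit code point.
print(len("\797"))    # 3

# A \x escape needs exactly two hex digits; one digit is a compile error.
try:
    compile(r'"\x7"', "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```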

Yes, I had just figured this out before going to sleep, and was
coming back to say that,
although strange, this is no reason to break anything.

Thank you for the lengthy reply!!

>
> On Sat, Nov 10, 2018 at 12:42 PM Joao S. O. Bueno <jsbueno at python.org.br> wrote:
> >
> > I just saw a document that reminded me of strings with a
> > backslash followed by 3 octal digits. When a backslash is followed by
> > 3 octal digits, that means the character with the corresponding
> > codepoint, and all is well.
> >
> > The "valid scenario":
> >
> > In [42]: "\777"
> > Out[42]: 'ǿ'
> >
> > The problem is when you have just two valid octal digits
> >
> > In [40]: "\778"
> > Out[40]: '?8'
> >
> > Which is ambiguous, at the very least -- why is this not "\x07" "78", for
> > example? (Octal 77, i.e. 0o77, actually corresponds to the "?" (63) character.)
>
> Not ambiguous. It takes as many valid octal digits as it can.
>
> https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
>
> \ooo ==> Character with octal value ooo
> Note 1: As in Standard C, up to three octal digits are accepted.
>
> "Up to" means that one or two digits can also define a character. For
> obvious reasons, it has to take digits greedily (otherwise "\777"
> would be "\x07" followed by "77"), and it's not an error to have fewer
> digits. Permitting a single digit means that "\0" means the NUL
> character, which is often convenient.
>
> > And then when the second digit is not valid octal:
> > In [43]: "\797"
> > Out[43]: '\x0797'
> > WAT?
> >
> > So, between the possibly ambiguous scenario with two octal digits
> > followed by a non-octal digit, and the completely unexpected expansion
> > to a 4-hexadecimal-digit codepoint in the last case,
>
> You may possibly be misinterpreting the last result. It's exactly the
> same as the previous ones.
>
> >>> list("\797")
> ['\x07', '9', '7']
>
> The octal escape grabs as many digits as it can, and when it finds a
> character in the literal that isn't a valid octal digit (same whether
> it's a '9' or a 'q'), it stops. The remaining characters have no
> special meaning; this does not become four hex digits. A "\xNN" escape
> in Python must be exactly two digits, no more and no less.
>
> > what do you say
> > to deprecating any r"\[0-9]{1,3}" sequence that doesn't match a full 3
> > octal digits, and making it a syntax error from Python 3.9 (or
> > 3.10) on?
>
> Nope. Would break code for no good reason.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
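
The "up to three digits, taken greedily" rule from the documentation quoted in the thread can be sketched in a few lines. This is an illustration under stated assumptions, not CPython's tokenizer: decode_octal_escapes is a hypothetical helper that only understands octal escapes and does not handle escaped backslashes or any other escape sequence.

```python
import re

# Hypothetical helper: mimics ONLY the octal-escape rule, not the full
# Python lexer (no \n, \x or \\ handling). "{1,3}" is greedy, so the regex
# grabs as many octal digits as it can, up to three, then stops.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def decode_octal_escapes(text: str) -> str:
    """Replace each backslash + 1-3 octal digits in `text` with that character."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


# Raw strings keep the backslash, so the helper sees the same characters
# the lexer sees inside an ordinary literal. ascii() keeps output ASCII-only.
for raw in [r"\777", r"\778", r"\797", r"\0", r"\00778"]:
    print(raw, "->", ascii(list(decode_octal_escapes(raw))))

# Agrees with the real lexer on these inputs.
assert decode_octal_escapes(r"\797") == "\797"
assert decode_octal_escapes(r"\00778") == "\00778"
```

Because at most three digits are consumed, padding the escape to three digits removes any doubt: "\00778" is "\x07" followed by "78", whereas "\778" is "?" followed by "8".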

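To see which existing literals the proposed r"\[0-9]{1,3}" rule would reject, one could scan the text of a string literal for backslash-digit runs that are not exactly three octal digits. find_short_octal_escapes is a hypothetical, regex-based sketch, not a real linter. It takes the literal's body as it appears in the source (not a whole file), skips escaped backslashes, and ignores string prefixes such as r or b.

```python
from __future__ import annotations

import re

# Hypothetical checker: flags backslash-digit escapes that are NOT exactly
# three octal digits, i.e. what the proposed r"\[0-9]{1,3}" rule would reject.
# The first alternative consumes escaped backslashes ("\\") so that digits
# after them are not mistaken for an escape.
_ESCAPE = re.compile(r"\\\\|\\([0-9]{1,3})")


def find_short_octal_escapes(literal_body: str) -> list[str]:
    """Return the escapes in `literal_body` (raw source text) the proposal would flag."""
    flagged = []
    for match in _ESCAPE.finditer(literal_body):
        digits = match.group(1)
        if digits is None:  # it was an escaped backslash; skip it
            continue
        if not re.fullmatch(r"[0-7]{3}", digits):
            flagged.append(match.group(0))
    return flagged


print(find_short_octal_escapes(r"\777"))           # []
print(find_short_octal_escapes(r"\778 and \797"))  # ['\\778', '\\797']
print(find_short_octal_escapes(r"C-style \0"))     # ['\\0']
print(find_short_octal_escapes(r"\\797"))          # []
```

Read literally, the rule also flags a lone "\0", which the thread points out is a convenient way to write NUL. That is one concrete instance of existing code the change would break.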
