[Python-ideas] Python octal escape character encoding "wats"
Steven D'Aprano
steve at pearwood.info
Fri Nov 9 23:19:09 EST 2018
On Sat, Nov 10, 2018 at 12:56:07PM +1100, Chris Angelico wrote:
> Not ambiguous. It takes as many valid octal digits as it can.
What is the rationale for that? Hex escapes don't.
My guess is, "Because that's what C does". And C probably does it
because "Dennis Ritchie wanted to minimize the number of keypresses when
he was typing" :-)
> "Up to" means that one or two digits can also define a character. For
> obvious reasons, it has to take digits greedily (otherwise "\777"
> would be "\x07" followed by "77"), and it's not an error to have fewer
> digits.
In hindsight, I think we should have insisted that octal escapes must
always be three digits, just as hex escapes are always two. The status
quo has too much magical "Do What I Mean" in it for my liking:
py> '\509\51' # pair of brackets surrounding a nine
'(9)'
py> '\507\51' # pair of brackets surrounding a seven
'G)'
Dammit Python, that's not what I meant!
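To spell out why the two examples above differ, here is a minimal sketch of the greedy rule: after the backslash, take up to three characters as long as each is an octal digit. The helper name `expand_octal` and its `byte_mode` flag are made up for illustration, and it handles only octal escapes (no `\\`, `\x`, and so on). The `byte_mode` branch assumes a byte string keeps only the low eight bits of the value, which is where a result like 'G' comes from; a Python 3 text string keeps the full code point instead.

```python
import re

# Hypothetical helper, not part of Python: a backslash followed by
# one to three octal digits, matched greedily.
_OCTAL = re.compile(r"\\([0-7]{1,3})")


def expand_octal(raw, byte_mode=False):
    """Expand octal escapes in `raw`, mimicking the greedy rule."""
    def repl(match):
        value = int(match.group(1), 8)
        if byte_mode:
            # Assumption: byte strings keep only the low 8 bits.
            value &= 0xFF
        return chr(value)
    return _OCTAL.sub(repl, raw)


# '9' is not an octal digit, so the first escape stops at \50.
print(ascii(expand_octal(r"\509\51")))                  # '(9)'
# '7' is an octal digit, so the first escape swallows it: \507.
print(ascii(expand_octal(r"\507\51")))                  # '\u0147)'
print(ascii(expand_octal(r"\507\51", byte_mode=True)))  # 'G)'
```

The only difference between the two inputs is whether the third character happens to be a valid octal digit, which is exactly the "Do What I Mean" behaviour complained about above.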
> > what do you say
> > of deprecating any r"\[0-9]{1,3}" sequence that don't match full 3
> > octal digits, and yield a syntax error for that from Python 3.9 (or
> > 3.10) on?
>
> Nope. Would break code for no good reason.
There's a good reason: to make the behaviour more sensible and less
confusing and have fewer "oops, that's not what I wanted" bugs. But we
should have made that change for 3.0. Now, I agree: the breakage would
cost more than the benefit is worth.
Maybe in Python 5000.
In the meantime, one or two digit octal escapes ought to be a linter
warning.
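As a rough idea of what such a linter check could look like, here is a sketch built on the standard `tokenize` module. The function name `short_octal_escapes` is invented for this example and no existing linter is being described. It skips raw strings, and on Python versions that tokenize f-strings into separate pieces it will not look inside them.

```python
import io
import re
import tokenize

# Match any backslash escape; group 1 is set only for octal escapes.
# Consuming "\\" as a unit keeps an escaped backslash from being
# mistaken for the start of an octal escape.
_ESCAPE = re.compile(r"\\(?:([0-7]{1,3})|.)", re.DOTALL)


def short_octal_escapes(source):
    """Yield (line, escape) for octal escapes shorter than 3 digits."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        prefix = re.match(r"[A-Za-z]*", tok.string).group(0).lower()
        if "r" in prefix:
            continue  # raw strings contain no escapes
        for match in _ESCAPE.finditer(tok.string):
            digits = match.group(1)
            if digits is not None and len(digits) < 3:
                # Line number is where the string token starts.
                yield tok.start[0], match.group(0)


sample = 'x = "\\509\\51"\ny = "\\050"\nz = r"\\7"\n'
for line, escape in short_octal_escapes(sample):
    print(f"line {line}: short octal escape {escape}")
```

Note that this flags `\0` as well, which is very common in real code, so a practical check would probably want an exemption for it.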
--
Steve