[Python-ideas] Python octal escape character encoding "wats"

Steven D'Aprano steve at pearwood.info
Fri Nov 9 23:19:09 EST 2018


On Sat, Nov 10, 2018 at 12:56:07PM +1100, Chris Angelico wrote:

> Not ambiguous. It takes as many valid octal digits as it can.

What is the rationale for that? Hex escapes don't.

My guess is, "Because that's what C does". And C probably does it 
because "Dennis Ritchie wanted to minimize the number of keypresses when 
he was typing" :-)
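
To make the difference concrete, here is a small sketch of how the two kinds of escape behave in a Python 3 str literal. The variable names are mine, purely for illustration.

```python
# Octal escapes: one, two, or three octal digits, taken greedily.
one = "\7"       # BEL, the same character as "\x07"
two = "\50"      # "("
three = "\050"   # also "("

# The first non-octal digit ends the escape: this is "\50" then "9".
mixed = "\509"

# Hex escapes: exactly two hex digits, so this is "\x28" then "9".
hex_then_digit = "\x289"

# A one-digit hex escape is rejected outright rather than guessed at.
try:
    compile(r'"\x7"', "<example>", "eval")
    short_hex_is_error = False
except SyntaxError:
    short_hex_is_error = True

print(one == "\x07", two == three == "(", mixed, hex_then_digit,
      short_hex_is_error)
```

So a short hex escape fails loudly, while a short octal escape silently means something.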


> "Up to" means that one or two digits can also define a character. For
> obvious reasons, it has to take digits greedily (otherwise "\777"
> would be "\x07" followed by "77"), and it's not an error to have fewer
> digits.

In hindsight, I think we should have insisted that octal escapes must 
always be three digits, just as hex escapes are always two. The status 
quo has too much magical "Do What I Mean" in it for my liking:

py> '\509\51'  # pair of brackets surrounding a nine
'(9)'
py> '\507\51'  # pair of brackets surrounding a seven
'Ň)'

Dammit Python, that's not what I meant!
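
For anyone puzzled by where those two literals get split, here is a rough sketch of the greedy rule applied to the raw source text. The helper names (octal_runs, expand_octal) are hypothetical, and the regex models octal escapes only, not the rest of Python's string-literal grammar.

```python
import re

# Illustration only: this pattern covers octal escapes and nothing else,
# so it is not a full model of Python's string-literal rules.
OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")

def octal_runs(raw):
    """Return the digit runs that each octal escape in `raw` consumes."""
    return OCTAL_ESCAPE.findall(raw)

def expand_octal(raw):
    """Replace each octal escape in `raw` with the code point it names."""
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), raw)

for raw in (r"\509\51", r"\507\51"):
    print(raw, octal_runs(raw), ascii(expand_octal(raw)))

# 0o507 is 327, which does not fit in a byte; keeping only the low
# eight bits gives 0x47, the letter G.
print(chr(0o507 & 0xFF))
```

In the first literal the 9 stops the escape after two digits, so the runs are 50 and 51. In the second, the 7 is a valid octal digit and gets swallowed, so the first escape is 507: code point U+0147 in a str, or G if only the low eight bits are kept.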


> > what do you say
> > of deprecating any r"\[0-9]{1,3}" sequence that don't match full 3
> > octal digits, and yield a syntax error for that from Python 3.9 (or
> > 3.10) on?
> 
> Nope. Would break code for no good reason.

There's a good reason: to make the behaviour more sensible and less 
confusing and have fewer "oops, that's not what I wanted" bugs. But we 
should have made that change for 3.0. Now, I agree: it would be breakage 
where the benefit doesn't outweigh the cost.

Maybe in Python 5000.

In the meantime, one- or two-digit octal escapes ought to be a linter 
warning.
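
As a very rough sketch of what such a check might look like (this is not taken from any existing linter, and find_short_octal_escapes is a made-up name), one could walk the string tokens with the standard tokenize module and flag octal escapes shorter than three digits:

```python
import io
import re
import tokenize

# Matches any backslash escape; the "octal" group is set only when the
# backslash is followed by one to three octal digits (taken greedily).
# Matching every escape keeps an escaped backslash such as "\\7" from
# being misread as an octal escape.
_ESCAPE = re.compile(r"\\(?:(?P<octal>[0-7]{1,3})|.)", re.DOTALL)

def find_short_octal_escapes(source):
    """Yield (line, escape) for each one- or two-digit octal escape.

    `line` is the line on which the string literal starts.
    """
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        literal = tok.string
        # The prefix (r, b, u, ...) is whatever precedes the first quote.
        quote_at = min(i for i in (literal.find("'"), literal.find('"'))
                       if i != -1)
        if "r" in literal[:quote_at].lower():
            continue  # raw strings have no escapes to check
        for m in _ESCAPE.finditer(literal):
            digits = m.group("octal")
            if digits is not None and len(digits) < 3:
                yield tok.start[0], m.group(0)

sample = r'''
a = "\509\51"
b = "\050\051"
c = r"\7"
d = "\\7"
'''

for line, escape in find_short_octal_escapes(sample):
    print(f"line {line}: short octal escape {escape}")
```

Only the first assignment in the sample is flagged. A real linter would probably want to exempt "\0", which is common and harmless when no digit follows it, and on Python versions that tokenize f-strings into separate pieces this sketch does not look inside them.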



-- 
Steve

