[Python-ideas] Make non-meaningful backslashes illegal in string literals
Steven D'Aprano
steve at pearwood.info
Fri Aug 7 07:12:01 CEST 2015
On Thu, Aug 06, 2015 at 12:26:14PM -0400, random832 at fastmail.us wrote:
> On Wed, Aug 5, 2015, at 14:56, Eric V. Smith wrote:
> > Because strings containing \{ are currently valid
>
> Which raises the question of why.
Because \C is currently valid, for all values of C. The idea is that if
you typo an escape, say \d instead of \f, you get an obvious literal
backslash in your string that is easy to spot.
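A quick illustration of that behaviour (as of Python 3 at the time of
writing; newer releases additionally emit a warning for invalid escapes):

```python
# \f is a valid escape (form feed); \d is not, so the backslash survives.
assert '\f' == '\x0c'        # one character
assert '\d' == '\\' + 'd'    # two characters: literal backslash plus 'd'
print(len('\f'), len('\d'))  # 1 2
```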
Personally, I think that's a mistake. It leads to errors like this:
filename = 'C:\some\path\something.txt'
silently doing the wrong thing. If we're going to change the way escapes
work, it's time to deprecate the misfeature that \C is a literal
backslash followed by C. Outside of raw strings, a backslash should
*only* be allowed in an escape sequence.
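To make the failure mode concrete, here is a sketch using a hypothetical
path that happens to contain valid escapes (\n and \t), which is where the
silent corruption actually bites:

```python
# Invalid escapes like \s are kept as literal backslashes, but valid ones
# like \n and \t are silently translated, mangling the path.
bad = 'C:\new\test.txt'    # \n -> newline, \t -> tab
good = r'C:\new\test.txt'  # raw string keeps every backslash intact

print(len(bad))   # 13 -- two backslashes were swallowed into escapes
print(len(good))  # 15
```

If \C for invalid C were an error rather than a literal backslash, the
non-raw version above would fail loudly instead of producing a broken path.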
Deprecating invalid escape sequences would then open the door to adding
new, useful escapes.
> (and as long as we're talking about
> things to deprecate in string literals, how about \v?)
Why would you want to deprecate a useful and long-standing escape
sequence? Admittedly \v isn't as common as \t or \n, but it still has
its uses, and is a standard escape familiar to anyone who uses C, C++,
C#, Octave, Haskell, JavaScript, etc.
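For reference, \v is the vertical-tab control character, with the same
meaning it has in C and its descendants:

```python
# \v is U+000B (vertical tab), code point 11.
assert '\v' == '\x0b'
assert ord('\v') == 11
print(repr('\v'))  # '\x0b'
```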
If we're going to make major changes to the way escapes work, I'd rather
add new escapes, not take them away:
- \e for escape (\x1B), as supported by gcc and clang;
- the escaping rules from Haskell:
  http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html
- \P for a platform-specific newline (e.g. \r\n on Windows, \n on POSIX);
- \U+xxxx for the Unicode code point U+xxxx (with four to six hex digits).
It's much nicer to be able to write Unicode code points that (apart from
the backslash) look like the standard Unicode notation U+0000 to
U+10FFFF, rather than needing to pad to a full eight digits as the
\U00xxxxxx syntax requires.
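For comparison, here is what the current syntax demands (the proposed
\U+xxxx form does not exist; \N{...} is today's alternative to counting
zeros):

```python
# Today \U must be followed by exactly eight hex digits, even though
# no code point needs more than six.
smiley = '\U0001F600'  # three leading zeros of mandatory padding
assert smiley == chr(0x1F600)

# Escaping by Unicode name avoids the padding, at the cost of verbosity.
assert '\N{GRINNING FACE}' == smiley
```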
--
Steve