[Python-ideas] Make non-meaningful backslashes illegal in string literals
Steven D'Aprano
steve at pearwood.info
Fri Aug 7 07:12:01 CEST 2015
On Thu, Aug 06, 2015 at 12:26:14PM -0400, random832 at fastmail.us wrote:
> On Wed, Aug 5, 2015, at 14:56, Eric V. Smith wrote:
> > Because strings containing \{ are currently valid
>
> Which raises the question of why.
Because \C is currently valid, for all values of C. The idea is that if
you typo an escape, say \d instead of \f, you get an obvious literal
backslash in your string that is easy to spot.
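A quick illustration of that behaviour (as of Python 3 at the time of
writing; newer releases additionally emit a warning for invalid escapes):

```python
# \f is a valid escape (form feed); \d is not, so the backslash survives.
assert '\f' == '\x0c'        # one character
assert '\d' == '\\' + 'd'    # two characters: literal backslash plus 'd'
print(len('\f'), len('\d'))  # 1 2
```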
Personally, I think that's a mistake. It leads to errors like this:
filename = 'C:\some\path\something.txt'
silently doing the wrong thing. If we're going to change the way escapes
work, it's time to deprecate the misfeature that \C is a literal
backslash followed by C. Outside of raw strings, a backslash should
*only* be allowed in an escape sequence.
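To make the failure mode concrete, here is a sketch using a hypothetical
path that happens to contain valid escapes (\n and \t), which is where the
silent corruption actually bites:

```python
# Invalid escapes like \s are kept as literal backslashes, but valid ones
# like \n and \t are silently translated, mangling the path.
bad = 'C:\new\test.txt'    # \n -> newline, \t -> tab
good = r'C:\new\test.txt'  # raw string keeps every backslash intact

print(len(bad))   # 13 -- two backslashes were swallowed into escapes
print(len(good))  # 15
```

If \C for invalid C were an error rather than a literal backslash, the
non-raw version above would fail loudly instead of producing a broken path.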
Deprecating invalid escape sequences would then open the door to adding
new, useful escapes.
> (and as long as we're talking about
> things to deprecate in string literals, how about \v?)
Why would you want to deprecate a useful and long-standing escape
sequence? Admittedly \v isn't as common as \t or \n, but it still has
its uses, and is a standard escape familiar to anyone who uses C, C++,
C#, Octave, Haskell, JavaScript, etc.
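For reference, \v is the vertical-tab control character, with the same
meaning it has in C and its descendants:

```python
# \v is U+000B (vertical tab), code point 11.
assert '\v' == '\x0b'
assert ord('\v') == 11
print(repr('\v'))  # '\x0b'
```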
If we're going to make major changes to the way escapes work, I'd rather
add new escapes, not take them away:
- \e for escape (\x1B), as supported by gcc and clang;
- the escaping rules from Haskell:
  http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html
- \P for a platform-specific newline (e.g. \r\n on Windows, \n on POSIX);
- \U+xxxx for the Unicode code point U+xxxx (with four to six hex digits).
It's much nicer to be able to write Unicode code points that (apart from
the backslash) look like the standard Unicode notation U+0000 to
U+10FFFF, rather than needing to pad to a full eight digits as the
\U00xxxxxx syntax requires.
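For comparison, here is what the current syntax demands (the proposed
\U+xxxx form does not exist; \N{...} is today's alternative to counting
zeros):

```python
# Today \U must be followed by exactly eight hex digits, even though
# no code point needs more than six.
smiley = '\U0001F600'  # three leading zeros of mandatory padding
assert smiley == chr(0x1F600)

# Escaping by Unicode name avoids the padding, at the cost of verbosity.
assert '\N{GRINNING FACE}' == smiley
```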
--
Steve