Unrecognized escape sequences in string literals

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Aug 10 02:03:05 EDT 2009


On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:

> Steven D'Aprano wrote:
> 
>> Why should a backslash in a string literal be an error?
> 
> Because in Python, if my friend sees the string "foo\xbar\n", he has no
> idea whether the "\x" is an escape sequence, or if it is just the
> characters "\x", unless he looks it up in the manual, or tries it out in
> the REPL, or what have you. 

Fair enough, but isn't that just another way of saying that if you look 
at a piece of code and don't know what it does, you don't know what it 
does unless you look it up or try it out?


> My friend is adamant that it would be better
> if he could just look at the string literal and know. He doesn't want to
> be bothered to have to store stuff like that in his head. He wants to be
> able to figure out programs just by looking at them, to the maximum
> degree that that is feasible.

I actually sympathize strongly with that attitude. But, honestly, your 
friend is a programmer (or at least pretends to be one *wink*). You can't 
be a programmer without memorizing stuff: syntax, function calls, modules 
to import, quoting rules, blah blah blah. Take C as an example -- there's 
absolutely nothing about () that says "group expressions or call a 
function" and {} that says "group a code block". You just have to 
memorize it. If you don't know what a backslash escape is going to do, 
why would you use it? I'm sure your friend isn't in the habit of randomly 
adding backslashes to strings just to see whether it will still compile.

This is especially important when reading (as opposed to writing) code. 
You read somebody else's code, and see "foo\xbar\n". Let's say you know 
it compiles without warning. Big deal -- you don't know what the escape 
codes do unless you've memorized them. What does \n resolve to? chr(13) 
or chr(97) or chr(0)? Who knows? 

Unless you know the rules, you have no idea what is in the string. 
Allowing \y to resolve to a literal backslash followed by y doesn't 
change that. All it means is that some \c combinations return a single 
character, and some return two.
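
To make that concrete, here is a minimal sketch that counts the characters 
each literal produces. The helper name literal_length is made up for 
illustration. It goes through ast.literal_eval with warnings silenced, on 
the assumption that you are running a newer Python that warns about 
unrecognized escapes such as \y (a future version may reject them outright).

```python
import ast
import warnings


def literal_length(source):
    """Evaluate the source text of a string literal and return its length.

    Hypothetical helper, for illustration only.
    """
    with warnings.catch_warnings():
        # Newer Pythons emit a warning for unrecognized escapes such as \y.
        warnings.simplefilter("ignore")
        return len(ast.literal_eval(source))


print(literal_length(r'"\n"'))  # recognized escape: one character (newline)
print(literal_length(r'"\y"'))  # unrecognized escape: backslash plus y, two characters
```

Either way, you only know which case you are in if you know the rules.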



> In comparison to Python, in C++, he can just look at "foo\xbar\n" and know
> that "\x" is a special character. (As long as it compiles without
> warnings under g++.)

So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT 
USING g++, and know whether or not \x is a special character.

[sarcasm] Gosh. That's an enormous difference from Python, where you have 
to print the string at the REPL to know what it does. [/sarcasm]

Aside:
\x on its own isn't a special character. It is an incomplete escape, and 
Python rejects it:

>>> "\x"
ValueError: invalid \x escape

However, \xba is:

>>> "\xba"
'\xba'
>>> len("\xba")
1
>>> ord("\xba")
186
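
Applied to the whole string under discussion, the same check looks roughly 
like this. It is only a sketch: the variable name s is arbitrary, and the 
values in the comments follow from the rule that \x consumes exactly two 
hex digits.

```python
# "\xba" takes exactly two hex digits, so the trailing "r" is an ordinary character.
s = "foo\xbar\n"

print(len(s))       # 6: f, o, o, \xba, r, newline
print(ord(s[3]))    # 186, the single character produced by \xba
print(s[4] == "r")  # True
```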



> He's particularly annoyed too, that if he types "foo\xbar" at the REPL,
> it echoes back as "foo\\xbar". He finds that to be some sort of annoying
> DWIM feature, and if Python is going to have DWIM features, then it
> should, for example, figure out what he means by "\" and not bother him
> with a syntax error in that case.

Now your friend is confused. The doubled backslash is a good thing: any 
backslash you see in Python's default string output (the repr) is *always* 
part of an escape:

>>> "a string with a 'proper' escape \t (tab)"
"a string with a 'proper' escape \t (tab)"
>>> "a string with an 'improper' escape \y (backslash-y)"
"a string with an 'improper' escape \\y (backslash-y)"

The REPL is actually doing him a favour. It always escapes backslashes, 
so there is no ambiguity: a literal backslash is displayed as \\, and any 
other \c is a special character.
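
The difference between what the REPL echoes and what the string actually 
contains can be sketched as follows. The REPL echoes repr(s), while print 
shows the characters themselves. The variable name s is just for 
illustration.

```python
s = "foo\\xbar"  # source spelling of the eight characters f o o \ x b a r

print(repr(s))   # what the REPL echoes: the backslash is shown escaped
print(s)         # the actual characters: a single backslash
print(len(s))    # 8
```

So the doubled backslash lives only in the display, not in the string.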


> Of course I think that he's overreacting a bit. 

:)


> My point of view is that
> every language has *some* warts; Python just has a bit fewer than most.
> It would have been nice, I should think, if this wart had been "fixed"
> in Python 3, as I do consider it to be a minor wart.

And if anyone had cared enough to raise it a couple of years back, it 
might well have been.


-- 
Steven


