Unrecognized escape sequences in string literals

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Aug 10 04:37:48 EDT 2009


On Mon, 10 Aug 2009 00:37:33 -0700, Carl Banks wrote:

> On Aug 9, 11:10 pm, Steven D'Aprano
> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
>> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
>> >> Why should a backslash in a string literal be an error?
>>
>> > Because the behavior of \ in a string is context-dependent, which
>> > means a reader can't know if \ is a literal character or escape
>> > character without knowing the context, and it means an innocuous
>> > change in context can cause a rather significant change in \.
>>
>> *Any* change in context is significant with escapes.
>>
>> "this \nhas two lines"
>>
>> If you change the \n to a \t you get a significant difference. If you
>> change the \n to a \y you get a significant difference. Why is the
>> first one acceptable but the second not?
> 
> Because when you change \n to \t, you haven't changed the meaning of
> the \ character; 

I assume you mean the \ character in the literal, not the (non-existent) 
\ character in the string.


> but when you change \n to \y, you have, and you did so
> without even touching the backslash.

Not at all.

'\n' maps to the string chr(10).
'\y' maps to the string chr(92) + chr(121).

In both cases the backslash in the literal has the same meaning: grab 
the next token (usually a single character, but not always), look it up 
in a mapping somewhere, and insert the result in the string object being 
built.

(I don't know if the *implementation* is precisely as described, but 
that's irrelevant. It's still functionally a mapping.) 
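You can check the two mappings above against a modern Python 3 interpreter by evaluating the literals from their source text. The helper `eval_literal` is a hypothetical name used only for this sketch. It silences the warning that Python 3.6 and later emit for an unrecognised escape such as \y (a DeprecationWarning, which became a SyntaxWarning in 3.12), so the pass-through behaviour is still visible.

```python
import ast
import warnings


def eval_literal(source_text: str):
    """Hypothetical helper: evaluate source_text as a Python literal,
    ignoring the warning Python 3.6+ emits for unrecognised escapes."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source_text)


# Raw strings hold the literals exactly as they would be typed in a .py file.
newline = eval_literal(r"'\n'")
unknown = eval_literal(r"'\y'")

print(newline == chr(10))                 # True
print(unknown == chr(92) + chr(121))      # True
print(len(newline), len(unknown))         # 1 2
```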



>> > IOW it's an error-prone mess.
>>
>> I've never had any errors caused by this.
> 
> Thank you for your anecdotal evidence.  Here's mine: This has gotten me
> at least twice, and a compiler complaint would have reduced my bug-
> hunting time from tens of minutes to ones of seconds.  [Aside: it was
> when I was using Python on Windows for the first time]

Okay, that's twice in, how many years have you been programming?

I've mistyped "xrange" as "xrnage" two or three times. Does that make 
xrange() "an error-prone mess" too? Probably not. Why is my mistake my 
mistake, but your mistake the language's fault?
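Current Python 3 releases go part of the way towards the compiler complaint Carl is asking for. Since 3.6, compiling a literal that contains an unrecognised escape emits a warning: a DeprecationWarning at first, and a SyntaxWarning from 3.12 onwards. The sketch below uses a hypothetical helper named `escape_warnings`. It compiles a snippet without running it and collects those warnings. The exact message text varies between versions, so the helper only matches the common prefix. The sample paths are made up.

```python
import warnings


def escape_warnings(source: str) -> list:
    """Hypothetical helper: compile source (without running it) and return
    the text of any invalid-escape warnings the compiler emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" makes sure no warning is suppressed as a repeat.
        warnings.simplefilter("always")
        compile(source, "<snippet>", "exec")
    return [str(w.message) for w in caught
            if "invalid escape sequence" in str(w.message)]


# \d is not an escape: the compiler warns about it.
print(escape_warnings(r'path = "C:\data\stuff.txt"'))
# \t and \n ARE escapes, so this equally broken path compiles silently.
print(escape_warnings(r'path = "C:\temp\new_file.txt"'))   # []
# A raw string contains no escapes at all.
print(escape_warnings(r'path = r"C:\data\stuff.txt"'))     # []
```

The warning is raised at compile time. A module loaded from an up-to-date cached .pyc file will therefore not repeat it.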


[...]

Oh, wait, no, I tell a lie -- I *have* seen people reporting "bugs" here 
caused by backslashes. They're invariably Windows programmers writing 
pathnames using backslashes, so I'll give you that one: if you don't know 
that Python treats backslashes as special in string literals, you will 
screw up your Windows pathnames.

Interestingly, the problem there is not that \y resolves to literal 
backslash followed by y, but that \t DOESN'T resolve to the expected 
backslash-t. So it seems to me that the problem for Windows coders is not 
that \y doesn't raise an error, but the mere existence of backslash 
escapes.
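The Windows pathname trap is easy to reproduce. The naive literal below contains no unrecognised escapes, so nothing warns, yet \t and \n silently become a tab and a newline. The remedies shown are common practice rather than anything proposed in this thread: a raw string, doubled backslashes, or forward slashes normalised by `pathlib.PureWindowsPath`. The path itself is invented for the example.

```python
from pathlib import PureWindowsPath

# Typed naively: \t and \n are genuine escapes, so this value holds a TAB
# and a NEWLINE instead of two backslashes.
naive = "C:\temp\new_file.txt"

raw = r"C:\temp\new_file.txt"        # raw string: backslashes are literal
doubled = "C:\\temp\\new_file.txt"   # every backslash escaped
forward = "C:/temp/new_file.txt"     # PureWindowsPath accepts / as a separator

print("\t" in naive, "\n" in naive)  # True True
print(raw == doubled)                # True
print(str(PureWindowsPath(forward))) # C:\temp\new_file.txt
```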



> Someone (obviously not you, because you have perfect knowledge of the
> language and 100% situation awareness at all times) might have a string
> like "abcd\stuv"  and change it to "abcd\tuvw" without even thinking
> about the fact that the s comes after the backslash.

Deary me. And they might type "4+15" instead of "4*51", and now 
arithmetic is an "error-prone mess" too. If you know of a programming 
language which can prevent you making semantic errors, please let us all 
know what it is.

If you edit code without thinking, you will be burnt, and you get *zero* 
sympathy from me.


> Worst of all: they might not even notice the error, because the repr of
> this string is:
> 
> 'abcd\tuvw'
> 
> They might not notice that the backslash is single, because (unlike you)
> mortal fallible human beings don't always register tiny details like a
> backslash being single when it should be double.

"Help help, 123145 looks too similar to 1231145, and now I calculated my 
taxes wrong and will go to jail!!!"
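The sketch below makes the "abcd\stuv" scenario concrete by comparing the value before and after the edit. The original literal "abcd\stuv" is spelled "abcd\\stuv" here so the example compiles without a warning on Python 3.6+. Both spellings produce the same nine-character value. Note that repr() shows a real backslash doubled but shows a tab as a single \t, which is the single-versus-double detail at issue.

```python
# "abcd\stuv" as originally typed: \s is not an escape, so the backslash
# survives. Written with a doubled backslash to avoid the 3.6+ warning.
before = "abcd\\stuv"

# The same literal after the edit to "abcd\tuvw": \t is now a TAB.
after = "abcd\tuvw"

print(len(before), len(after))   # 9 8
print(repr(before))              # 'abcd\\stuv'  (real backslash shown doubled)
print(repr(after))               # 'abcd\tuvw'   (TAB shown as a single \t)
print("\\" in after)             # False: no backslash left in the value
```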


> Point is, this is a very bad inconsistency.  It makes the behavior of \
> impossible to learn by analogy; now you have to memorize a list of
> situations where it behaves one way or another.

No, you don't "have" to memorize anything, you can go right ahead and 
escape every backslash, as I did for years. Your code will still work 
fine.

You already have to memorize what escape codes return special characters. 
The only difference is whether you learn "...and everything else raises 
an exception" or "...and everything else is returned unchanged". 
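The two alternatives above differ only in what happens on a lookup miss. The sketch below models that difference. `decode` and `SIMPLE_ESCAPES` are hypothetical names, and only single-character escapes are handled (\x, \u, \N{...} and octal forms are left out). It is a toy model of the mapping, not how CPython actually parses literals.

```python
# Hypothetical table of recognised single-character escapes. Multi-character
# forms (\x41, \u00e9, \N{...}, octal \101) are omitted from this sketch.
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}


def decode(body: str, strict: bool = False) -> str:
    """Hypothetical helper: resolve backslash escapes in the text between a
    literal's quotes. strict=False keeps unknown escapes as backslash plus
    character; strict=True raises ValueError instead. A lone trailing
    backslash is simply kept."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])           # known escape
        elif strict:
            # "...and everything else raises an exception"
            raise ValueError("unrecognised escape sequence \\" + nxt)
        else:
            # "...and everything else is returned unchanged"
            out.append("\\" + nxt)
        i += 2
    return "".join(out)


print(repr(decode(r"this \nhas two lines")))   # 'this \nhas two lines'
print(repr(decode(r"abcd\stuv")))              # 'abcd\\stuv'
try:
    decode(r"abcd\stuv", strict=True)
except ValueError as exc:
    print(exc)                                 # unrecognised escape sequence \s
```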

There is at least one good reason for preferring an error, namely that it 
allows Python to introduce new escape codes without going through a long, 
slow process. But the rest of these complaints are terribly unconvincing.
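A concrete instance of the compatibility problem: \u is recognised in Python 3 str literals but not in bytes literals. In bytes literals it passes through unchanged, just as it did in Python 2 byte strings. Any code that relied on the pass-through changes meaning once the escape is recognised. The helper `eval_literal` is hypothetical and only serves to silence the 3.6+ warning for the bytes case.

```python
import ast
import warnings


def eval_literal(source_text: str):
    """Hypothetical helper: evaluate source_text as a Python literal,
    ignoring the warning Python 3.6+ emits for unrecognised escapes."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source_text)


# The same characters after the backslash, in two kinds of literal.
as_str = eval_literal(r"'\u0041'")     # \u is a recognised escape in str
as_bytes = eval_literal(r"b'\u0041'")  # ...but not in bytes: passed through

print(as_str)          # A
print(as_bytes)        # b'\\u0041'
print(len(as_bytes))   # 6
```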



-- 
Steven



More information about the Python-list mailing list