Unrecognized escape sequences in string literals
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Mon Aug 10 04:37:48 EDT 2009
On Mon, 10 Aug 2009 00:37:33 -0700, Carl Banks wrote:
> On Aug 9, 11:10 pm, Steven D'Aprano
> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
>> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
>> >> Why should a backslash in a string literal be an error?
>>
>> > Because the behavior of \ in a string is context-dependent, which
>> > means a reader can't know if \ is a literal character or escape
>> > character without knowing the context, and it means an innocuous
>> > change in context can cause a rather significant change in \.
>>
>> *Any* change in context is significant with escapes.
>>
>> "this \nhas two lines"
>>
>> If you change the \n to a \t you get a significant difference. If you
>> change the \n to a \y you get a significant difference. Why is the
>> first one acceptable but the second not?
>
> Because when you change \n to \t, you haven't changed the meaning of
> the \ character;
I assume you mean the \ character in the literal, not the (non-existent)
\ character in the string.
> but when you change \n to \y, you have, and you did so
> without even touching the backslash.
Not at all.
'\n' maps to the string chr(10).
'\y' maps to the string chr(92) + chr(121).
In both cases the backslash in the literal has the same meaning: grab
the next token (usually a single character, but not always), look it up
in a mapping somewhere, and insert the result in the string object being
built.
(I don't know if the *implementation* is precisely as described, but
that's irrelevant. It's still functionally a mapping.)
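The two mappings above can be checked against a running interpreter. This is a sketch that assumes a Python 3 CPython. Since Python 3.6, compiling an unrecognized escape such as \y emits a warning (DeprecationWarning, and SyntaxWarning from 3.12), so the sketch evaluates the literals through ast.literal_eval with warnings silenced. The helper name eval_literal is hypothetical.

```python
import ast
import warnings


def eval_literal(source: str) -> str:
    """Evaluate the source text of a Python string literal.

    Hypothetical helper: warnings are silenced because newer Pythons
    warn about unrecognized escapes such as \\y when compiling them.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


newline = eval_literal(r"'\n'")   # source text: quote, backslash, n, quote
unknown = eval_literal(r"'\y'")   # source text: quote, backslash, y, quote

print(newline == chr(10))             # True: one newline character
print(unknown == chr(92) + chr(121))  # True: backslash followed by 'y'
print(len(newline), len(unknown))     # 1 2
```

In both cases the backslash is consumed together with the next character. The only thing that differs is what the lookup returns.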
>> > IOW it's an error-prone mess.
>>
>> I've never had any errors caused by this.
>
> Thank you for your anecdotal evidence. Here's mine: This has gotten me
> at least twice, and a compiler complaint would have reduced my bug-
> hunting time from tens of minutes to ones of seconds. [Aside: it was
> when I was using Python on Windows for the first time]
Okay, that's twice in, how many years have you been programming?
I've mistyped "xrange" as "xrnage" two or three times. Does that make
xrange() "an error-prone mess" too? Probably not. Why is my mistake my
mistake, but your mistake the language's fault?
[...]
Oh, wait, no, I tell a lie -- I *have* seen people reporting "bugs" here
caused by backslashes. They're invariably Windows programmers writing
pathnames using backslashes, so I'll give you that one: if you don't know
that Python treats backslashes as special in string literals, you will
screw up your Windows pathnames.
Interestingly, the problem there is not that \y resolves to a literal
backslash followed by y, but that \t DOESN'T resolve to the expected
backslash-t. So it seems to me that the problem for Windows coders is not
that \y doesn't raise an error, but the mere existence of backslash
escapes.
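Here is a short illustration of the Windows pathname trap described above, using a made-up path C:\temp\new. In an ordinary literal, \t and \n are recognized escapes, so the path silently gains a tab and a newline. A raw string, doubled backslashes, or forward slashes all keep the path intact.

```python
from pathlib import PureWindowsPath

broken = "C:\temp\new"        # \t becomes TAB, \n becomes NEWLINE
raw = r"C:\temp\new"          # raw string: backslashes kept as written
doubled = "C:\\temp\\new"     # every backslash escaped explicitly

print(len(broken), len(raw))  # 9 11
print("\t" in broken)         # True: a tab character crept in
print(raw == doubled)         # True: both spell the same path

# Forward slashes sidestep escapes entirely; PureWindowsPath renders
# the path with backslash separators when converted to str.
print(str(PureWindowsPath("C:/temp/new")) == raw)  # True
```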
> Someone (obviously not you, because you have perfect knowledge of the
> language and 100% situation awareness at all times) might have a string
> like "abcd\stuv" and change it to "abcd\tuvw" without even thinking
> about the fact that the s comes after the backslash.
Deary me. And they might type "4+15" instead of "4*51", and now
arithmetic is an "error-prone mess" too. If you know of a programming
language which can prevent you making semantic errors, please let us all
know what it is.
If you edit code without thinking, you will be burnt, and you get *zero*
sympathy from me.
> Worst of all: they might not even notice the error, because the repr of
> this string is:
>
> 'abcd\tuvw'
>
> They might not notice that the backslash is single, because (unlike you)
> mortal fallible human beings don't always register tiny details like a
> backslash being single when it should be double.
"Help help, 123145 looks too similar to 1231145, and now I calculated my
taxes wrong and will go to jail!!!"
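The repr scenario from the exchange above can be reproduced directly. In this sketch "abcd\stuv" is written as "abcd\\stuv". The value is the same, but this form avoids the invalid-escape warning that newer Pythons emit. A real backslash shows up doubled in the repr, while a tab shows up as a single-backslash escape, which is the detail that is easy to miss.

```python
before = "abcd\\stuv"   # same value as "abcd\stuv": backslash + 's'
after = "abcd\tuvw"     # after the edit, \t is a TAB, not a backslash

print(repr(before))     # 'abcd\\stuv'  (real backslash shown doubled)
print(repr(after))      # 'abcd\tuvw'   (TAB shown as a one-backslash escape)
print(len(before), len(after))  # 9 8
print("\\" in after)    # False: the backslash is gone
```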
> Point is, this is a very bad inconsistency. It makes the behavior of \
> impossible to learn by analogy; now you have to memorize a list of
> situations where it behaves one way or another.
No, you don't "have" to memorize anything, you can go right ahead and
escape every backslash, as I did for years. Your code will still work
fine.
You already have to memorize what escape codes return special characters.
The only difference is whether you learn "...and everything else raises
an exception" or "...and everything else is returned unchanged".
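The two rules contrasted above can be sketched as one toy decoder with a hypothetical strict flag. The decoder handles only single-character escapes. It does not handle \x, octal, \N, \u or \U. It therefore models the "look it up in a mapping" view rather than CPython's actual implementation. The names SIMPLE_ESCAPES and decode_escapes are illustrative.

```python
# Hypothetical toy decoder: single-character escapes only.
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}


def decode_escapes(body: str, strict: bool = False) -> str:
    """Decode backslash escapes in the body of a string literal.

    strict=False: an unrecognized escape is kept as backslash + char.
    strict=True:  an unrecognized escape raises ValueError.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # Ordinary character, or a lone trailing backslash: copy it as-is.
        if ch != "\\" or i + 1 == len(body):
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
        elif strict:
            raise ValueError(f"unrecognized escape sequence \\{nxt}")
        else:
            out.append("\\" + nxt)
        i += 2
    return "".join(out)


print(repr(decode_escapes(r"a\nb")))   # 'a\nb'
print(repr(decode_escapes(r"a\yb")))   # 'a\\yb'
try:
    decode_escapes(r"a\yb", strict=True)
except ValueError as exc:
    print(exc)                          # unrecognized escape sequence \y
```

Either way, the mapping for recognized escapes is the same. The flag only decides what happens on a lookup miss.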
There is at least one good reason for preferring an error, namely that it
allows Python to introduce new escape codes without going through a long,
slow process. But the rest of these complaints are terribly unconvincing.
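One way to see why an error leaves room for new escapes is to compare bytes and str literals. Bytes literals do not recognize \u, so the same source text means different things in each. If unrecognized escapes pass through unchanged, then teaching the language a new escape silently changes the value of existing literals. If they were errors, no working code could contain them in the first place. As before, this sketch silences the invalid-escape warning that newer Pythons emit.

```python
import ast
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_bytes = ast.literal_eval(r"b'\u00e9'")  # \u not recognized in bytes
    as_str = ast.literal_eval(r"'\u00e9'")     # \u recognized in str

print(as_bytes, len(as_bytes))       # b'\\u00e9' 6
print(len(as_str), hex(ord(as_str)))  # 1 0xe9
```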
--
Steven