Mailman 3 Make undefined escape sequences have SyntaxWarnings - Python-ideas

Make undefined escape sequences have SyntaxWarnings

Mike Graham

Oct. 10, 2012

7:36 p.m.

The literal "\c" should be an error but in practice means "\\c". It's probably too late to make this invalid syntax as it ought to be, but I wonder if a warning isn't in order, especially with the theoretical potential of adding new string escapes in the future.
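As a rough way to see what a given interpreter does with such a literal, the sketch below compiles the source text and records any warnings. The helper name `compile_literal` is made up for illustration, and what (if anything) gets reported depends on the Python version running it, so no particular output is claimed here.

```python
import warnings


def compile_literal(source):
    """Compile a string literal given as source text.

    Returns (value, names of warning categories raised while compiling).
    The literal is passed as text so that this file itself contains no
    unrecognized escape sequences.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            code = compile(source, "<example>", "eval")
        except SyntaxError:
            # An interpreter could reject the literal outright.
            return None, ["SyntaxError"]
    return eval(code), [w.category.__name__ for w in caught]


# r'"\c"' is the four-character source text: quote, backslash, c, quote.
value, categories = compile_literal(r'"\c"')
print(repr(value), categories)

# A recognized escape, for comparison.
value_n, categories_n = compile_literal(r'"\n"')
print(repr(value_n), categories_n)
```

Passing the literal as raw source text keeps the example itself free of the construct under discussion.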

Antoine Pitrou

October 2012

7:46 p.m.

New subject: Make undefined escape sequences have SyntaxWarnings

On Wed, 10 Oct 2012 15:36:08 -0400 Mike Graham <mikegraham@gmail.com> wrote:

...

-1. This will make life more difficult with regular expressions (and produce lots of spurious warnings in existing code). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

Serhiy Storchaka

8:04 p.m.

On 10.10.12 22:46, Antoine Pitrou wrote:

...

-1. This will make life more difficult with regular expressions (and produce lots of spurious warnings in existing code).

Strings for regular expressions should always be raw. Regular expressions now support the \u and \U escapes, so there is no reason to use non-raw strings.

Antoine Pitrou

8:18 p.m.

On Wed, 10 Oct 2012 23:04:25 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...

That's a style issue, not a language rule. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

Serhiy Storchaka

9 a.m.

On 10.10.12 23:18, Antoine Pitrou wrote:

...

Yes, of course, that's style advice. Sorry if I used the wrong words. This will not make life more difficult with regular expressions because you can always use raw string literals.

Yuval Greenfield

10:31 a.m.

http://docs.python.org/release/3.3.0/reference/lexical_analysis.html#string-...

I'm not sure I understand what this line from the docs means:

\newline    Backslash and newline ignored

I understand that row as either "\n" won't appear in the resulting string or that I should get "\\newline".

Yuval Greenfield

Serhiy Storchaka

10:53 a.m.

On 11.10.12 13:31, Yuval Greenfield wrote:

...

Newline is newline in source code.

...

Type <Quote><Key "a"><Backslash><Enter><Key "b"><Quote>. Result is "ab".
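The keystrokes described above can be written out as a short sketch. It assumes an ordinary (non-raw) string literal, and the variable names are only illustrative.

```python
# A backslash immediately followed by a newline inside a non-raw string
# literal: both characters are dropped from the resulting string.
joined = "a\
b"

# For comparison, "\n" is the escape that produces a newline character.
with_newline = "a\nb"

print(repr(joined))        # 'ab'
print(repr(with_newline))  # 'a\nb'
```

So the "\newline" row in the table refers to a physical line break in the source, not to the two characters backslash and n.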

Steven D'Aprano

12:07 a.m.

On 11/10/12 07:04, Serhiy Storchaka wrote:

...

Why? The re module doesn't care how you construct the strings. It *can't* care how you construct the strings. Something like

re.search('\D*', 'abcd1234xyz')

works perfectly well and there is no need for a raw string. Any requirement to "always use raw strings" is a style issue, not a language issue.

-- Steven
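A small sketch of the point that re only ever sees the finished string: the same pattern is built three ways and compared. It deliberately avoids writing the non-raw '\D' spelling from the message, since what the interpreter reports for that spelling depends on the Python version. The variable names are illustrative.

```python
import re

doubled = "\\D*"          # backslash written as an escaped backslash
raw = r"\D*"              # raw string literal
built = chr(92) + "D*"    # assembled at run time; chr(92) is a backslash

# All three are the same three-character string, so re cannot tell
# which spelling was used in the source.
assert doubled == raw == built

match = re.search(raw, "abcd1234xyz")
print(match.group())  # 'abcd'
```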

Mike Graham

8:08 p.m.

On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Regular expressions are difficult if you're remembering which escape sequences exist and are easy if you're using raw string literals. Mike

Antoine Pitrou

8:16 p.m.

On Wed, 10 Oct 2012 16:08:22 -0400 Mike Graham <mikegraham@gmail.com> wrote:

...

That's a misconception, since as the re docs mention:

“Most of the standard escapes supported by Python string literals are also accepted by the regular expression parser: [snip]”

http://docs.python.org/dev/library/re.html

In other words, whether you put "\t" or "\\t" in a regexp doesn't matter: it means the same to the regexp engine.

Regards Antoine. -- Software development and contracting: http://pro.pitrou.net
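The claim about "\t" and "\\t" can be checked with a short sketch. The sample text and variable names are made up for illustration.

```python
import re

text = "name\tvalue"   # contains one real tab character at index 4

literal_tab = "\t"     # Python turns this into a tab before re sees it
escaped_tab = "\\t"    # re receives backslash + t and reads it as a tab
raw_tab = r"\t"        # the same two characters as escaped_tab

spans = [re.search(p, text).span() for p in (literal_tab, escaped_tab, raw_tab)]
print(spans)  # [(4, 5), (4, 5), (4, 5)]
```

The strings handed to re differ (one character versus two), but the regexp engine matches the same tab in each case.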

Mike Graham

8:45 p.m.

On Wed, Oct 10, 2012 at 4:16 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

I'm not sure what misconception you're saying I have. An example of when you have to remember what the escapes are is

...

Mike

Steven D'Aprano

12:08 a.m.

On 11/10/12 07:08, Mike Graham wrote:

...

On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou<solipsis@pitrou.net> wrote:

...
On Wed, 10 Oct 2012 15:36:08 -0400 Mike Graham<mikegraham@gmail.com> wrote:

...

...
...
The literal "\c" should be an error

Who says so? My bash shell disagrees with you:

[steve@ando ~]$ touch spam
[steve@ando ~]$ ls s\pa\m
spam

and so do I. There are three obvious behaviours for extraneous escapes:

1) backslash-c resolves to just c (what bash and VisualStudio do)
2) backslash-c resolves to backslash-c (what Python does)
3) raise an exception or compile-time error (what Java does)

It is undefined behaviour in C. It is a matter of opinion that Java got it right and the others got it wrong, one which I do not share.
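The three behaviours listed above can be sketched as a toy function. The name `resolve_unknown_escape` and the policy labels are hypothetical, chosen only for this illustration; this is not how any of the named tools implements escape handling.

```python
def resolve_unknown_escape(char, policy):
    """Return the text an unrecognized escape (backslash + char) stands for.

    policy is one of three hypothetical labels:
      "drop_backslash"  -> behaviour 1 (the result is just the character)
      "keep_backslash"  -> behaviour 2 (the result is backslash + character)
      "error"           -> behaviour 3 (the escape is rejected)
    """
    if policy == "drop_backslash":
        return char
    if policy == "keep_backslash":
        return "\\" + char
    if policy == "error":
        raise ValueError("unknown escape sequence: \\" + char)
    raise ValueError("unknown policy: " + repr(policy))


print(repr(resolve_unknown_escape("c", "drop_backslash")))  # 'c'
print(repr(resolve_unknown_escape("c", "keep_backslash")))  # '\\c'
```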

...

...
...
but in practice means "\\c". It's probably too late to make this invalid syntax as it ought to be, but I wonder if a warning isn't in order, especially with the theoretical potential of adding new string escapes in the future.

-1. This will make life more difficult with regular expressions (and produce lots of spurious warnings in existing code).

I agree with Antoine here. If and when there is a serious, concrete proposal to add a new string escape, and not just a "theoretical potential", then we should consider adding warnings.

...

Regular expressions are difficult if you're remembering which escape sequences exist and are easy if you're using raw string literals.

Just because some people find it hard to remember doesn't mean that it should be an error *not* to use raw strings. -- Steven

Mike Graham

2:24 a.m.

On Wed, Oct 10, 2012 at 8:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Frankly, I don't look to bash for sensible language design advice. I think concepts like "In the face of ambiguity, refuse the temptation to guess" guide how we should see the decision here. "Backslash is for escape sequences except when it's not" seemed like an obviously-misfortunate thing to me. I'm truly perplexed people see it as a feature they're eager to use, but I guess I should learn something from that.

...

I didn't say that it should be an error not to use raw strings. I was saying that the implication that this suggestion makes constructing regex strings hard is silly and mentioning the thing that makes them easy. I'm not suggesting that you shouldn't be able to use normal string literals.

Antoine went on to point out that things like "\t" worked in regex strings. This is an unrelated feature that I never suggested altering. In that case, a tab character in your string is regarded like \t. This behavior would remain.

I think four string escapes have been added since versions of Python I was aware of. Writing code like "ab\c" seems seedy in light of that.

Mike

Steven D'Aprano

2:58 a.m.

On 11/10/12 13:24, Mike Graham wrote:

...

On Wed, Oct 10, 2012 at 8:08 PM, Steven D'Aprano<steve@pearwood.info> wrote:

...
On 11/10/12 07:08, Mike Graham wrote:

...
On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou<solipsis@pitrou.net> wrote:

...
On Wed, 10 Oct 2012 15:36:08 -0400 Mike Graham<mikegraham@gmail.com> wrote:

...
...
...
The literal "\c" should be an error

Who says so? My bash shell disagrees with you:

Frankly, I don't look to bash for sensible language design advice.

Pity, because in this case I think bash is actually more sensible than either Python or Java. If you escape a character, you should get something. If it's a special character, you get the special meaning. If it's not, escaping should be transparent: escaping something that doesn't need escaping is a null op:

py> from urllib import quote_plus
py> quote_plus('abc')
'abc'

If we were designing Python from scratch, I'd prefer '\D' -> 'D'. But we're not, so I'm happy with the current behaviour, and don't agree that it should be an error or that it needs warning about.
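The session above uses the Python 2 import path. For readers on Python 3, where the function lives in urllib.parse, the same "null op" observation can be sketched as follows; the second input string is an added illustration, not taken from the message.

```python
from urllib.parse import quote_plus

plain = quote_plus("abc")        # nothing needs quoting, so nothing changes
special = quote_plus("a b&c")    # the space and the ampersand are quoted

print(plain)    # abc
print(special)  # a+b%26c
```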

...

I think concepts like "In the face of ambiguity, refuse the temptation to guess" guide how we should see the decision here.

Where is the ambiguity? Is there ever a context where \D could mean two different things and it isn't clear which one? "In the face of ambiguity..." does not mean "refuse to decide on language behaviour". Everything is ambiguous until you decide what something will mean. It's only when you have two possible meanings and no clear, obvious way to determine which one applies that the ambiguity koan applies.

...

"Backslash is for escape sequences except when it's not" seemed like an obviously-misfortunate thing to me.

No. In cooked strings, backslash-C is always an escape sequence, for any character (or hex/oct code) C. But some escape sequences resolve to a single char (\n -> newline) and some resolve to a pair of chars (\D -> backslash D). In Haskell, \& resolves to the empty string. It's still an escape sequence. [...]

...

I think four string escapes have been added since versions of Python I was aware of. Writing code like "ab\c" seems seedy in light of that

Adding a new escape sequence is almost as big a step as adding a new built-in or new syntax. I see that as a good thing: it discourages too many requests for new escape sequences.

-- Steven

Greg Ewing

5:34 a.m.

Steven D'Aprano wrote:

...

I think that calling "\n", "\t" etc. "escape sequences" is a misnomer that is causing confusion in this discussion. The term "escape" in this context means to prevent something from having a special meaning that it would otherwise have. But the backslash in these is being used to *give* a special meaning to the following character.

In Python string literals, the only true escape sequences associated with the backslash are '\\', "\'" and '\"'. So the backslash is a bit schizophrenic -- sometimes it's an escape character, sometimes it's a prefix that imparts a special meaning.

This means that "\c" where c is not special in any way is somewhat ambiguous. Are you redundantly escaping something that doesn't need it, are you asking for a special meaning that doesn't exist (which is probably a mistake), or do you just want a literal backslash? Python guesses that you want a literal backslash.

This seems to be motivated by the desire to minimise the need for backslash doubling. That sounds fine in theory, but I don't think it helps much in practice. I for one don't trust myself to keep the entire set of special characters in my head, including all the rarely-used ones, so I end up doubling every backslash anyway.

Given that, I wouldn't have minded at all if Python had refused to guess in this case, and raised a compile-time error. That would have left the way open for extending the set of special chars in the future.
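The habit of doubling every backslash can be compared with the raw-string spelling in a short sketch. The pattern and sample text are invented for illustration.

```python
import re

doubled = "\\d+\\.\\d+"   # every backslash doubled, as described above
raw = r"\d+\.\d+"         # the same pattern as a raw string literal

# Both spellings produce exactly the same string.
assert doubled == raw

found = re.findall(raw, "pi is 3.14, e is 2.718")
print(found)  # ['3.14', '2.718']
```

Neither spelling depends on remembering which characters are special after a backslash in a string literal.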

...

I don't see it makes much difference. We get plenty of requests for new syntax of all kinds, and we seem to have enough sense to reject them unless they're backed by extremely good arguments. There's no reason requests for new special chars should be treated any differently. -- Greg

MRAB

8:11 p.m.

On 2012-10-10 20:46, Antoine Pitrou wrote:

...

How would it make life more difficult with regular expressions?

I would've preferred:

1. Unknown escapes in string literals give a compile-time error
2. Raw string literals treat backslashes as pure literals
3. Unknown escapes in regex patterns give a run-time error

Unfortunately, changing them would break existing code. (I retain the behaviour of re in the regex module for this reason, not that I like it. :-()

It would've been nice if the 'fix' had been made in Python 3...
