
On 05/17/2013 04:41 PM, rurpy@yahoo.com wrote:
On Friday, May 17, 2013 8:14:39 AM UTC-6, Ron Adam wrote:
On 05/17/2013 06:41 AM, Steven D'Aprano wrote:
> They clearly should be in different threads. Line continuation is
> orthogonal to string continuation. You can have string concatenation on a
> single line:
>
>     s = "Label:\t" r"Data containing \ backslashes"
Can you think of, or find an example of two adjacent strings on the same line that can't be written as a single string?
s = "Label:\t Data containing \ backslashes"
I'm curious about how much of a problem not having implicit string concatenations really is.
"Can't" is an unrealistically high bar, but I posted a real example at http://mail.python.org/pipermail/python-ideas/2013-May/020847.html that is *better* written, IMO, as adjacently-concatenated string literals.
If we didn't have implicit string concatenation, I'd probably write it with each part on a separate line to make it easier to read.

    pattern = '[^\uFF1B\u30FB\u3001' \
            + r'+:=.,\/\[\]\t\r\n]+' \
            + '[\#\uFF03]+'

I think in this case the strings are joined at compile time, as Guido suggested in his post. You could also write it as...

    pattern = ('[^\uFF1B\u30FB\u3001'
             + r'+:=.,\/\[\]\t\r\n]+'
             + '[\#\uFF03]+')

If implicit string concatenation is removed, it would be nice if there was an explicit replacement for it. There is a strong consensus for doing it, but there isn't strong consensus on how to do it.


About line continuations:

Line continuations are a related issue to string concatenations because they are used together fairly often.

The line continuation behaviour is a bit quirky, but not in any critical way. There has even been a PEP to remove it in Python 3, but it was rejected for not having enough support. People do use it, so it would be better if it was improved rather than removed.

As noted in other messages, the line continuation is copied from C, which I think originally came from the 'Make' utility. (I'm not positive on that.) In Make, the \+newline pair is replaced with a space; C simply deletes the pair. Python also just removes the \+newline, and keeps track of whether or not it's in a string. Look in Parser/tokenizer.c for this.

As for the *not too important* quirkiness:

    >>> 'abc' \ 'efg'
      File "<stdin>", line 1
        'abc' \ 'efg'
                    ^
    SyntaxError: unexpected character after line continuation character

This error implies that the '\' by itself is a line continuation token even though it's not followed by a newline. Otherwise you would get the same SyntaxError you get when you use any other symbol in an invalid way. This was probably done either because it was easy to do, and/or because a better error message is more helpful.

Trailing white space results in the same error. This happens often enough to be annoying. It is confusing to some people why the compiler can recognise the line continuation *character*, but can't figure out that the white space after it is not important.
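
The trailing white space case can be reproduced without an interactive prompt. This is only a minimal sketch, assuming Python 3; the helper name continuation_error is made up for illustration and is not part of any library. It compiles two nearly identical sources, one with the backslash directly before the newline and one with a single space in between.

```python
def continuation_error(source):
    """Return the SyntaxError message for source, or None if it compiles.

    Hypothetical helper, used here only to illustrate the behaviour.
    """
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Backslash immediately followed by a newline: accepted.
clean = "s = 'abc' \\\n    'efg'\n"

# One space between the backslash and the newline: rejected.
trailing = "s = 'abc' \\ \n    'efg'\n"

print(continuation_error(clean))
print(continuation_error(trailing))
```

The two sources differ only by one invisible character, which is why this mistake is easy to make and hard to spot in an editor that doesn't show trailing white space.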

    >>> # comment 1\
    ... comment 2
      File "<stdin>", line 2
        comment 2
                ^
    SyntaxError: invalid syntax

This just shows that comments are parsed before line continuations are considered. Or to put it another way, the '\' is part of the comment.

That isn't the case in C or Make. You can continue a comment on the next line with a line continuation. Nothing wrong with this, but it shows that line continuations in Python aren't exact copies of the line continuation in C.

There are perfectly good reasons why the compiler does what it does in each of these cases. I think little things like this together have contributed to the feeling that line continuations are bad and should be avoided.


The discussed (and implied) options:

There are a number of options that have been discussed, but they haven't really been clearly spelled out, so the discussion has been kind of out of focus. This seems like an overly detailed list, but the discussion has touched on pretty much all of these things. I think the goal should be to find the most cohesive combination for Python 4 and/or just go with B alone.

A. Do nothing.

B. Remove implicit concatenation.
   (We could stop here; anything after this can be done later.)

C. Remove explicit line continuations. (See options below.)

D. Add a new explicit string concatenation token.

E. Reuse the \ as an explicit string concatenation. (With C)

F. Make an exception for implicit string concatenations only after a line continuation. (With B)

G. Make an exception for line continuations if a line ends with an explicit string concatenation. (With C and (D or E))

H. Change the line continuation character from \+newline to just \.

I. Allow implicit line continuations if a line ends with an operator that expects to be continued, like a comma inside parentheses already does. (With C)


Option H has some interesting possibilities. It pretty much is a complete replacement for the current escaped newline continuation, so how it works, and what constraints it has, would need to be discussed.
It's the option that would allow white space and comments after a line continuation character.

Option I is interesting because it's already there inside of parentheses and other containers. I just haven't seen it described as an implicit line continuation before.

It is my feeling that we can't change the escaped newline within strings. That needs to be how it is, and it should be documented as a string feature, rather than a general line continuation token. So if line continuations outside of strings are removed, escaped newlines inside of strings will still work.

There are so many possibilities here that the only thing I'm sure of right now is to go ahead and start the process of removing implicit string concatenations (Option B), and then consider everything else as separate issues in that context.

Cheers,
   Ron