Implicit string literal concatenation considered harmful?
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').

This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).

Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)

Would it be reasonable to start deprecating this and eventually remove it from the language?

-- --Guido van Rossum (python.org/~guido)
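A minimal reconstruction of the failure mode (the exact error wording depends on the Python version):

    def foo(a, b):
        return a, b

    foo('a' 'b')    # missing comma: only one argument, 'ab', is passed
    # Python 2: TypeError: foo() takes exactly 2 arguments (1 given)
    # Python 3: TypeError: foo() missing 1 required positional argument: 'b'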
On 5/10/2013 2:48 PM, Guido van Rossum wrote:
Would it be reasonable to start deprecating this and eventually remove it from the language?
Yes please! I've been bitten by the same issue more than once. Matt
On Fri, May 10, 2013 at 11:50 AM, Matt Chaput <matt@whoosh.ca> wrote:
On 5/10/2013 2:48 PM, Guido van Rossum wrote:
Would it be reasonable to start deprecating this and eventually remove it from the language?
Yes please! I've been bitten by the same issue more than once.
+1 -m
2013/5/10 Guido van Rossum <guido@python.org>
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
From my perspective as a Python user (not knowing anything about the ramifications for the required changes to the parser, etc.) it is very reasonable. This bug is very hard to spot when it happens, and an argument count error is really one of the more benign forms it can take.
-- --Dave Peticolas
On 05/10/2013 11:48 AM, Guido van Rossum wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
Sounds good to me. -- ~Ethan~
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.

However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.

Regards

Antoine.
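A sketch of what that same-line check could look like as an external lint pass rather than a compiler change (a hypothetical helper built on Python 3's stdlib tokenize module; not an actual CPython patch):

    import tokenize

    def warn_same_line_concat(path):
        with open(path, 'rb') as f:
            prev = None
            for tok in tokenize.tokenize(f.readline):
                if (prev is not None
                        and prev.type == tokenize.STRING
                        and tok.type == tokenize.STRING
                        and prev.end[0] == tok.start[0]):
                    # two adjacent string literals ending and starting on the same row
                    print('%s:%d: implicit string concatenation on one line'
                          % (path, tok.start[0]))
                prev = tok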
On Fri, May 10, 2013 at 10:16 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I was going to say the exact same thing -- you just read my mind :)
Regards
Antoine.
On 10.05.2013 21:16, Antoine Pitrou wrote:
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
Nice idea.

I mostly use this feature when writing multi-line or too-long-to-fit-on-one-editor-line string literals:

s = ('abc\n'
     'def\n'
     'ghi\n')

t = ('some long paragraph spanning multiple lines in an editor, '
     'without newlines')

This looks and works much better than triple-quoted string literals, esp. when defining such string literals in indented code.

-- Marc-Andre Lemburg
On May 10, 2013, at 09:28 PM, M.-A. Lemburg wrote:
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm pretty mixed. OT1H, you're right, it's a common mistake and often *very* hard to spot. A SyntaxWarning when it appears on a single line doesn't help because I'm much more likely to forget a trailing comma in situations like:

files = [
    '/tmp/foo',
    '/etc/passwd'
    '/etc/group',
    '/var/cache',
    ]

(g'wan, spot the missing comma ;). OTOH, doing things like:
s = ('abc\n'
     'def\n'
     'ghi\n')

t = ('some long paragraph spanning multiple lines in an editor, '
     'without newlines')
Is pretty common in code I see all the time. I'm not sure why; I use it occasionally, but only very rarely. A lot of folks like this style a lot though from what I can tell.
This looks and works much better than triple-quoted string literals, esp. when defining such string literals in indented code.
I also see this code a lot:

from textwrap import dedent

s = dedent("""\
    abc
    def
    ghi
    """)

I think having to deal with indentation could be a common reason why people use implicit concatenation instead of TQS. All things considered, I think the difficult-to-spot bugginess of implicit concatenation outweighs the occasional convenience of it.

-Barry
On Fri, May 10, 2013 at 12:16 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
Fixing this is an easy task for lib2to3 though. I think the "convenience" argument doesn't cut it -- if Python didn't have it, can you imagine it being added? It would never make it past all the examples of code broken by missing commas.
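A rough sketch of such a mechanical rewrite, done here with the stdlib tokenize module rather than as an actual lib2to3 fixer (the helper name is made up, and untokenize's output whitespace differs from the input):

    import io
    import tokenize

    def add_explicit_plus(source):
        skip = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT)
        out = []
        prev_type = None
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.STRING and prev_type == tokenize.STRING:
                out.append((tokenize.OP, '+'))   # make the concatenation explicit
            out.append((tok.type, tok.string))
            if tok.type not in skip:
                prev_type = tok.type
        return tokenize.untokenize(out)

    # e.g. add_explicit_plus("x = ('a'\n     'b')\n") returns source in which the
    # two literals are joined with '+'; exact spacing differs from the input.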
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
There are plenty of examples where the continuation isn't on the same line (some were already posted here). -- --Guido van Rossum (python.org/~guido)
On Fri, 10 May 2013 12:30:15 -0700 Guido van Rossum <guido@python.org> wrote:
On Fri, May 10, 2013 at 12:16 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
Fixing this is an easy task for lib2to3 though.
Assuming someone does it :-)

You may also have to "fix" other software. For example, I don't know if gettext supports fetching literals from triple-quoted Python strings, while it works with string continuations.

As for "+", saying it is a replacement is a bit simplified, because the syntax definition (for method calls) or operator precedence (for e.g. %-formatting) may force you to add parentheses.

Regards

Antoine.
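A small illustration of the precedence point (values made up): % binds more tightly than +, so mechanically replacing the implicit concatenation with '+' changes what gets formatted unless extra parentheses are added:

    >>> "%s to " "%s" % ('a', 'b')     # implicit concatenation happens first
    'a to b'
    >>> "%s to " + "%s" % ('a', 'b')   # % binds tighter than +
    Traceback (most recent call last):
      ...
    TypeError: not all arguments converted during string formatting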
Antoine Pitrou wrote:
As for "+", saying it is a replacement is a bit simplified, because the syntax definition (for method calls) or operator precedence (for e.g. %-formatting) may force you to add parentheses.
Maybe we could turn ... into a "string continuation operator":

print("This is example %d of a line that is "...
      "too long" % example_number)

-- Greg
On May 10, 2013, at 20:50, Mark Janssen <dreamingforward@gmail.com> wrote:
Maybe we could turn ... into a "string continuation operator":
print("This is example %d of a line that is "...
      "too long" % example_number)
I think that is an awesome idea.
How is this any better than + in the same position? It's harder to notice, and longer (remember that the only reason you're doing this is that you can't fit your strings into 80 cols). Also, this gives two ways to do it, that have the exact same effect when they're both legal. The only difference is that the new way is only legal in a restricted set of cases. By the way, is it just a coincidence that almost all of the people sticking up for keeping or replacing implicit concatenation instead of just scrapping it are using % formatting in their examples?
On 05/11/2013 01:05 AM, Andrew Barnert wrote:
How is this any better than + in the same position? It's harder to notice, and longer (remember that the only reason you're doing this is that you can't fit your strings into 80 cols).
By the way, is it just a coincidence that almost all of the people sticking up for keeping or replacing implicit concatenation instead of just scrapping it are using % formatting in their examples?

You just answered your own question. The reason it's better than + in the same position, for those people, is that it would have higher precedence than %.
On May 10, 2013, at 22:11, Random832 <random832@fastmail.us> wrote:
On 05/11/2013 01:05 AM, Andrew Barnert wrote:
How is this any better than + in the same position? It's harder to notice, and longer (remember that the only reason you're doing this is that you can't fit your strings into 80 cols).
By the way, is it just a coincidence that almost all of the people sticking up for keeping or replacing implicit concatenation instead of just scrapping it are using % formatting in their examples?

You just answered your own question. The reason it's better than + in the same position, for those people, is that it would have higher precedence than %.
Ah, that makes sense. Except that % formatting is supposed to be one of those "we haven't deprecated it, but we will, so stop using it" features, so it seems a little odd to add new syntax to make % formatting easier. Also, doesn't this imply that ... is now an operator in some contexts, but a literal in others?
On 05/11/2013 01:15 AM, Andrew Barnert wrote:
Ah, that makes sense.
Except that % formatting is supposed to be one of those "we haven't deprecated it, but we will, so stop using it" features, so it seems a little odd to add new syntax to make % formatting easier.
Well, technically the same would apply to .format(), I guess.
Andrew Barnert wrote:
Except that % formatting is supposed to be one of those "we haven't deprecated it, but we will, so stop using it" features, so it seems a little odd to add new syntax to make % formatting easier.
Except that the same problem also occurs with .format() formatting.

>>> "a{}b" "c{}d" .format(1,2)
'a1bc2d'
but

>>> "a{}b" + "c{}d" .format(1,2)
'a{}bc1d'
so you need

>>> ("a{}b" + "c{}d") .format(1,2)
'a1bc2d'
Also, doesn't this imply that ... is now an operator in some contexts, but a literal in others?
It would have different meanings in different contexts, yes. But I wouldn't think of it as an operator, more as a token indicating string continuation, in the same way that the backslash indicates line continuation. -- Greg
Greg, I meant to send my reply earlier to the list. On 05/11/2013 12:39 AM, Greg Ewing wrote:
Also, doesn't this imply that ... is now an operator in some contexts, but a literal in others?
Could its use as a literal be deprecated? I haven't seen it used that way except in examples.
It would have different meanings in different contexts, yes.
But I wouldn't think of it as an operator, more as a token indicating string continuation, in the same way that the backslash indicates line continuation.
Yep, it would be a token that the tokenizer would handle. So it would be handled before anything else, just as the line continuation '\' is. After the file is tokenized, it is removed and won't interfere with anything else.

It could be limited to strings, or expanded to include numbers and possibly other literals.

a = "a long text line "...
    "that is continued "...
    "on several lines."

pi = 3.1415926535...
     8979323846...
     2643383279

You can't do this with a line continuation '\'.

Another option would be to have dedented multi-line string tokens |""" and |'''. Not too different than r""" or b""".

s = |"""Multi line string
    |
    |paragraph 1
    |
    |paragraph 2
    |"""

a = |"""\
    |a long text line \
    |that is continued \
    |on several lines.\
    |"""

The rule for this is: for strings that start with |""" or |''', each following line needs to be preceded by whitespace + '|', until the closing quote is reached. The tokenizer would just find and remove them as it comes across them. Any '|' on a line after the first '|' would be unaffected, so they don't need to be escaped.

It's a very explicit syntax. It's very obvious what is part of the string and what isn't. Something like this would end the endless debate on dedents. That alone might be worth it. ;-)

I know the | is also a binary 'or' operator, but its use for that is in a different context, so I don't think it would be a problem.

Both of these options would be implemented in the tokenizer and are really just tools for formatting source code rather than actual additions or changes to the language.

Cheers,
Ron
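The '|' margin is only a proposal, but the intended semantics can be emulated at run time with a small hypothetical helper (strip_margin is not a stdlib function), just to make the behaviour concrete:

    def strip_margin(text, margin='|'):
        out = []
        for line in text.splitlines():
            head, sep, rest = line.partition(margin)
            # strip the margin only when nothing but whitespace precedes it
            out.append(rest if sep and not head.strip() else line)
        return '\n'.join(out)

    s = strip_margin("""\
        |Multi line string
        |
        |paragraph 1
        |
        |paragraph 2
        |""")
    # s == 'Multi line string\n\nparagraph 1\n\nparagraph 2\n'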
On Sun, May 12, 2013 at 8:19 AM, Ron Adam <ron3200@gmail.com> wrote:
Greg, I meant to send my reply earlier to the list.
On 05/11/2013 12:39 AM, Greg Ewing wrote:
Also, doesn't this imply that ... is now an operator in some contexts,
but a literal in others?
Could its use as a literal be deprecated? I haven't seen it used that way except in examples.
I take it you don't use Python for multi-dimensional array based programming, then. The ellipsis literal was added at the request of the numeric programming folks, so they had a notation for "all remaining columns" in an index tuple, and it is still used for that today. The only change related to this in Python 3 was to lift the syntactic restriction that limited the literal form to container subscripts. This change eliminated Python 2's discrepancy between defining index tuples directly in the subscript and in saving them to a variable first, or passing them as arguments to a function. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, May 11, 2013 at 3:19 PM, Ron Adam <ron3200@gmail.com> wrote:
Greg, I meant to send my reply earlier to the list.
On 05/11/2013 12:39 AM, Greg Ewing wrote:
Also, doesn't this imply that ... is now an operator in some contexts,
but a literal in others?
Could its use as a literal be deprecated? I haven't seen it used that way except in examples.
It would have different meanings in different contexts, yes.
But I wouldn't think of it as an operator, more as a token indicating string continuation, in the same way that the backslash indicates line continuation.
Yep, it would be a token that the tokenizer would handle. So it would be handled before anything else just as the line continuation '\' is. After the file is tokenized, it is removed and won't interfere with anything else.
It could be limited to strings, or expanded to include numbers and possibly other literals.
a = "a long text line "...
    "that is continued "...
    "on several lines."
pi = 3.1415926535...
     8979323846...
     2643383279
You can't do this with a line continuation '\'.
Another option would be to have dedented multi-line string tokens |""" and |'''. Not too different than r""" or b""".
s = |"""Multi line string
    |
    |paragraph 1
    |
    |paragraph 2
    |"""
a = |"""\
    |a long text line \
    |that is continued \
    |on several lines.\
    |"""
The rule for this is: for strings that start with |""" or |''', each following line needs to be preceded by whitespace + '|', until the closing quote is reached. The tokenizer would just find and remove them as it comes across them. Any '|' on a line after the first '|' would be unaffected, so they don't need to be escaped.
+1 to adding something like that. I loathe code that uses textwrap.dedent on constants: needless memory and runtime overhead. I was just writing up a response to suggest adding auto-dedented multi-line strings to take care of one of the major use cases. I went with a naive d""" approach but I also like your | idea here, though it might cause too many people to want to line up the opening | and the following |s (which isn't necessary at all and is actively harmful for code style if it forces tedious reindentation when refactoring code that alters the length of the lhs before the opening |"""). -gps
It's a very explicit syntax. It's very obvious what is part of the string and what isn't. Something like this would end the endless debate on dedents. That alone might be worth it. ;-)
I know the | is also a binary 'or' operator, but its use for that is in a different context, so I don't think it would be a problem.
Both of these options would be implemented in the tokenizer and are really just tools for formatting source code rather than actual additions or changes to the language.
Cheers, Ron
On 11/05/2013 06:15, Andrew Barnert wrote:
Ah, that makes sense.
Except that % formatting is supposed to be one of those "we haven't deprecated it, but we will, so stop using it" features, so it seems a little odd to add new syntax to make % formatting easier.
I don't think so, see http://mail.python.org/pipermail/python-dev/2012-February/116790.html -- If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython. Mark Lawrence
Maybe we could turn ... into a "string continuation operator":
print("This is example %d of a line that is "...
      "too long" % example_number)
I think that is an awesome idea.
How is this any better than + in the same position? It's harder to notice, and longer (remember that the only reason you're doing this is that you can't fit your strings into 80 cols).
It partitions the conceptual space. "+" is a mathematical operator, but strings are not numbers. That's the negative argument for it. The positive, further, argument is that the ellipsis has a long history of being a continuation indicator in text.
By the way, is it just a coincidence that almost all of the people sticking up for keeping or replacing implicit concatenation instead of just scrapping it are using % formatting in their examples?
An interesting correlation indeed. -- MarkJ Tacoma, Washington
On Sat, May 11, 2013 at 2:52 PM, Mark Janssen <dreamingforward@gmail.com> wrote:
Maybe we could turn ... into a "string continuation operator":
print("This is example %d of a line that is "...
      "too long" % example_number)
I think that is an awesome idea.
How is this any better than + in the same position? It's harder to notice, and longer (remember that the only reason you're doing this is that you can't fit your strings into 80 cols).
It partitions the conceptual space. "+" is a mathematical operator, but strings are not numbers. That's the negative argument for it. The positive, further, argument is that the ellipsis has a long history of being a continuation indicator in text.
But + is already a supported operation on strings and has been since at least python 2. It is already there and it doesn't require a new dunder method for concatenating with the Ellipsis object. It's also relatively fast and already performed at compile time. If we're going to remove this implicit concatenation, why do we have to add a fancy new feature that's non-obvious and going to need extra implementation?
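For what it's worth, the compile-time claim can be checked with dis; the folding of '+' between constants is done by CPython's peephole optimizer, so it is an implementation detail rather than a language guarantee, and the exact disassembly varies by version:

    >>> import dis
    >>> dis.dis(lambda: 'a' 'b')    # concatenated by the compiler front end
    >>> dis.dis(lambda: 'a' + 'b')  # folded by the peephole optimizer
    # both disassemblies show a single LOAD_CONST of 'ab' (no BINARY_ADD at run time)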
By the way, is it just a coincidence that almost all of the people sticking up for keeping or replacing implicit concatenation instead of just scrapping it are using % formatting in their examples?
An interesting correlation indeed.
Albeit one that is probably unrelated. I use str.format everywhere (mostly because I don't support python 2.5 in most of my work) and I'm against it. I just haven't given examples against it because others have already presented examples that I would have provided.
Ian Cordasco wrote:
On Sat, May 11, 2013 at 2:52 PM, Mark Janssen <dreamingforward@gmail.com> wrote:
It partitions the conceptual space. "+" is a mathematical operator, but strings are not numbers.
But + is already a supported operation on strings
I still think about these two kinds of concatenation in different ways, though. When I use implicit concatenation, I don't think in terms of taking two strings and joining them together. I'm just writing a single string literal that happens to span two source lines. I believe that distinguishing them visually helps readability. Using + for both makes things look more complicated than they really are. -- Greg
On Sun, 12 May 2013 11:55:34 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Ian Cordasco wrote:
On Sat, May 11, 2013 at 2:52 PM, Mark Janssen <dreamingforward@gmail.com> wrote:
It partitions the conceptual space. "+" is a mathematical operator, but strings are not numbers.
But + is already a supported operation on strings
I still think about these two kinds of concatenation in different ways, though. When I use implicit concatenation, I don't think in terms of taking two strings and joining them together. I'm just writing a single string literal that happens to span two source lines.
I believe that distinguishing them visually helps readability. Using + for both makes things look more complicated than they really are.
Agreed. Regards Antoine.
Antoine Pitrou writes:
Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I believe that distinguishing them visually helps readability. Using + for both makes things look more complicated than they really are.
Agreed.
In principle, I'm with Guido on this one. TOOWTDI and EIBTI weigh heavily with me, and I have been bitten by the "sequence of strings ends with no comma" bug more than once (though never twice in one day ;-). Nor do I really care whether concatenation is a runtime or compile-time operation. But vox populi is deafening.... BTW, I see no reason not to optimize "'a' + 'b'", as you can always force runtime evaluation with "''.join(['a','b'])" (which looks insane here, but probably wouldn't in a case where forcing runtime evaluation was useful).
On 12.05.13 01:55, Greg Ewing wrote:
Ian Cordasco wrote:
On Sat, May 11, 2013 at 2:52 PM, Mark Janssen <dreamingforward@gmail.com> wrote:
It partitions the conceptual space. "+" is a mathematical operator, but strings are not numbers.
But + is already a supported operation on strings
I still think about these two kinds of concatenation in different ways, though. When I use implicit concatenation, I don't think in terms of taking two strings and joining them together. I'm just writing a single string literal that happens to span two source lines.
I believe that distinguishing them visually helps readability. Using + for both makes things look more complicated than they really are.
Thinking more about this, yes, I see that "+" is really different for various reasons, when you just want to write a long string. "+" involves precedence rules, which is actually too much. Writing continuation lines with '\' is much less convenient, because you cannot insert comments.

What I still don't like is the pure absence of anything that makes the concatenation more visible. So I'm searching for different ways to denote concatenation of subsequent strings. Or to put it the other way round: we can also see it as ways to denote the _interruption_ of a string.

Thinking out loud... A string is built, then we break its construction into pieces that are glued together by the parser. Hmm, this sounds again more like triple-quoted strings. Still searching...

-- Christian Tismer
Ian Cordasco wrote:
But + is already a supported operation on strings and has been since at least python 2. It is already there and it doesn't require a new dunder method for concatenating with the Ellipsis object.
There would be no dunder method, because it's not a run-time operation. It's a syntax for writing a string literal that spans more than one line. Using it between any two things that are not string literals would be a syntax error. -- Greg
Someone wrote:
By the way, is it just a coincidence that almost all of the people sticking up for keeping or replacing implicit concatenation instead of just scrapping it are using % formatting in their examples?
In my case this is because it's the context in which I use this feature most often. -- Greg
Mark Janssen writes:
Maybe we could turn ... into a "string continuation operator":
print("This is example %d of a line that is "...
      "too long" % example_number)
I think that is an awesome idea.
Violates TOOWTDI.

>>> print("This is an" +    # traditional explicit operator
...       " %s idea." % ("awesome" if False else "unimpressive"))
This is an unimpressive idea.
>>>

already works (and always has AFAIK -- modulo the ternary operator, of course).
I think that is an awesome idea.
Violates TOOWTDI.
>>> print("This is an" +    # traditional explicit operator
...       " %s idea." % ("awesome" if False else "unimpressive"))
This is an unimpressive idea.
>>>
But you see you just helped me demonstrate my point: the Python interpreter *itself* uses ... as a line-continuation operator! Also, it won't violate TOOWTDI if the "+" operator is deprecated for strings. Strings are different from numbers anyway, it's an old habit/wart to use "+" for them. *moving out of the way* :)) -- MarkJ Tacoma, Washington
On Sat, May 11, 2013 at 3:22 PM, Mark Janssen <dreamingforward@gmail.com> wrote:
I think that is an awesome idea.
Violates TOOWTDI.
>>> print("This is an" +    # traditional explicit operator
...       " %s idea." % ("awesome" if False else "unimpressive"))
This is an unimpressive idea.
>>>
But you see you just helped me demonstrate my point: the Python interpreter *itself* uses ... as a line-continuation operator!
It also uses it when you define a class or function, should those declarations use Ellipsis everywhere too? (For reference:
>>> class A:
...     a = 1
...     def __init__(self, **kwargs):
...         for k, v in kwargs.items():
...             if k != 'a':
...                 setattr(self, k, v)
...
>>> i = A()
But this is getting off-topic and the question is purely rhetorical.) -- Ian
Mark Janssen wrote:
Strings are different from numbers anyway, it's an old habit/wart to use "+" for them.
*moving out of the way* :))
/me throws a dictionary at Mark Janssen with a bookmark at the entry for "plus", showing that its usage in English is much wider than it is in mathematics. -- Greg
Mark Janssen writes:
I think that is an awesome idea.
Violates TOOWTDI.
>>> print("This is an" +    # traditional explicit operator
...       " %s idea." % ("awesome" if False else "unimpressive"))
This is an unimpressive idea.
>>>
But you see you just helped me demonstrate my point: the Python interpreter *itself* uses ... as a line-continuation operator!
No, it doesn't. It's a (physical) line *separator* there. This:

>>> "This is a syntax" +
  File "<stdin>", line 1
    "this is a syntax " +
                        ^
SyntaxError: invalid syntax
is a syntax error. If "... " were a line continuation, it would be a prompt for the rest of the line, but you never get there.
Also, it won't violate TOOWTDI if the "+" operator is deprecated for strings. Strings are different from numbers anyway, it's an old habit/wart to use "+" for them.
They're both just mathematical objects that have operations defined on them. Although in math we usually express multiplication by juxtaposition, I personally think EIBTI applies here. Ie, IMO using "+" makes a lot of sense although the precedence argument is a good one (but not good enough for introducing another operator, especially using a symbol that already has a different syntactic meaning). I think it's pretty clear that deprecating compile-time concatenation by juxtaposition would be massively unpopular, so the deprecation should be delayed until there's a truly attractive alternative. I think the various proposals for a dedenting syntax come close, but there remains too much resistance for my taste, and I suspect Guido won't push it. I also agree with those who think that it probably should wait for Python 4, given that it was apparently considered and rejected for Python 3.
Am 11.05.13 02:43, schrieb Greg Ewing:
Antoine Pitrou wrote:
As for "+", saying it is a replacement is a bit simplified, because the syntax definition (for method calls) or operator precedence (for e.g. %-formatting) may force you to add parentheses.
Maybe we could turn ... into a "string continuation operator":
print("This is example %d of a line that is "...
      "too long" % example_number)
I'm trying to follow the complete thread, so only some late feedback on this proposal from me: the mysterious type [Ellipsis] comes to the rescue with all of its three characters, helping to stay below 80 chars? In this message I avoid further adding or subtracting numbers so as not to overflow the result ;-) but I somehow like the current two possible ways of doing "it": when manually migrating code, e.g. from PHP to Python, I may either remove dots or replace them with plus signs. That way I quickly get a working migrated code base, and then, while the clients work with it, I have a more relaxed schedule to further clean it up.

[Ellipsis]: http://docs.python.org/3.3/reference/datamodel.html#index-8

All the best,
Stefan.
On 10.05.2013 21:30, Guido van Rossum wrote:
On Fri, May 10, 2013 at 12:16 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
Fixing this is an easy task for lib2to3 though.
Think about code written to work in Python 2 and 3. Python 2 would have to get the compile-time concatenation as well, to prevent slow-downs due to run-time concatenation. And there would have to be a tool to add the '+' signs and parens to the Python 2 code...

s = ('my name is %s and '
     'I live on %s street' % ('foo', 'bar'))

-->

s = ('my name is %s and ' +
     'I live on %s street' % ('foo', 'bar'))

results in:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: not all arguments converted during string formatting

The second line is also a good example of how removing the feature would introduce a new difficult-to-see error :-)

The issue is a task for an editor or a lint tool to highlight, IMO, not the Python compiler.

-- Marc-Andre Lemburg
On Fri, May 10, 2013 at 3:00 PM, Guido van Rossum <guido@python.org> wrote:
There are plenty of examples where the continuation isn't on the same line (some were already posted here).
+1

I've never used the feature and don't intend to.

A related annoyance is the trailing comma at the end of stuff (probably a leftover from a previous edit). For example:

def fun(a, b, c,):

Parses fine. But the one that has bitten me is the comma at the end of a line:
>>> x = 1,
>>> x
(1,)
>>> x == 1,    # inconsistency?
(False,)
>>> x == (1,)
True
>>> y = a_very_long_call(param1, param2, param3),  # this trailing comma is difficult to spot
I'd prefer that the syntax for creating one-tuples required parentheses, and that trailing commas were disallowed.

Cheers,

-- Juancarlo *Añez*
On 10 May 2013 20:16, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 10 May 2013 11:48:51 -0700 Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I'm with Antoine. I love using implicit concatenation for splitting long literals across multiple lines. Michael
Regards
Antoine.
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
On May 10, 2013, at 1:09 PM, Michael Foord wrote:
On 10 May 2013 20:16, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I'm with Antoine. I love using implicit concatenation for splitting long literals across multiple lines.
Strongly -1 on this proposal, I also use this quite often. -- Philip Jenvey
Am 11.05.2013 01:43, schrieb Philip Jenvey:
On May 10, 2013, at 1:09 PM, Michael Foord wrote:
On 10 May 2013 20:16, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I'm with Antoine. I love using implicit concatenation for splitting long literals across multiple lines.
Strongly -1 on this proposal, I also use this quite often.
-1 here. I use it a lot too, and find it very convenient, and while I could live with the change, I think it should have been made together with the lot of other syntax changes going to Python 3. Georg
Georg Brandl, 11.05.2013 07:24:
Am 11.05.2013 01:43, schrieb Philip Jenvey:
On May 10, 2013, at 1:09 PM, Michael Foord wrote:
On 10 May 2013 20:16, Antoine Pitrou wrote:
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I'm with Antoine. I love using implicit concatenation for splitting long literals across multiple lines.
Strongly -1 on this proposal, I also use this quite often.
-1 here. I use it a lot too, and find it very convenient, and while I could live with the change, I think it should have been made together with the lot of other syntax changes going to Python 3.
I used to sort-of dislike it in the past and only recently started using it more often, specifically for dealing with long string literals. I really like it for that, although I've also been bitten by the "missing comma" bug. I guess I'm -0.5 on removing it. Stefan
On 11.05.13 11:37, Stefan Behnel wrote:
Georg Brandl, 11.05.2013 07:24:
Am 11.05.2013 01:43, schrieb Philip Jenvey:
On May 10, 2013, at 1:09 PM, Michael Foord wrote:
On 10 May 2013 20:16, Antoine Pitrou wrote:
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I'm with Antoine. I love using implicit concatenation for splitting long literals across multiple lines.
Strongly -1 on this proposal, I also use this quite often.
-1 here. I use it a lot too, and find it very convenient, and while I could live with the change, I think it should have been made together with the lot of other syntax changes going to Python 3.
I used to sort-of dislike it in the past and only recently started using it more often, specifically for dealing with long string literals. I really like it for that, although I've also been bitten by the "missing comma" bug.
I guess I'm -0.5 on removing it.
I'm +1 on removing it, if it is combined with better indentation options for triple-quoted strings.

So if there was some notation (not specified yet how) that triggers correct indentation at compile time without extra functional hacks, so that

long_text = """
    this text is left justified
      and this line indents by two spaces
    """

is stripped of the leading and trailing \n and the indentation is justified, then I think the need for the implicit whitespace operator would be small.

cheers -- chris

-- Christian Tismer
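A runtime approximation of that already exists (not the compile-time notation being asked for, but it produces the described result): inspect.cleandoc() drops leading/trailing blank lines and removes the common leading whitespace, much like textwrap.dedent():

    import inspect

    long_text = inspect.cleandoc("""
        this text is left justified
          and this line indents by two spaces
        """)
    # 'this text is left justified\n  and this line indents by two spaces'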
On Sun, May 12, 2013 at 2:37 AM, Christian Tismer <tismer@stackless.com> wrote:
So if there was some notation (not specified yet how) that triggers correct indentation at compile time without extra functional hacks, so that
long_text = """
    this text is left justified
      and this line indents by two spaces
    """
is stripped of the leading and trailing \n and the indentation is justified, then I think the need for the implicit whitespace operator would be small.
Through participating in this thread, I've realised that the distinction between when I use a triple quoted string (with or without textwrap.dedent()) and when I use implicit string concatenation is whether or not I want the newlines in the result. Often I can avoid the issue entirely by splitting a statement into multiple pieces, but I think Guido's right that if we didn't have implicit string concatenation there's no way we would add it ("just use a triple quoted string with escaped newlines" or "just use runtime string concatenation"), but given that we *do* have it, I don't think it's worth the hassle of removing it over a bug that a lint program should be able to pick up. So I'm back to where I started, which is that if this kind of problem really bothers anyone, start thinking seriously about the idea of a standard library linter. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, May 12, 2013 at 2:48 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, May 12, 2013 at 2:37 AM, Christian Tismer <tismer@stackless.com> wrote:
So if there was some notation (not specified yet how) that triggers correct indentation at compile time without extra functional hacks, so that
long_text = """
    this text is left justified
      and this line indents by two spaces
    """
is stripped of the leading and trailing \n and the indentation is justified, then I think the need for the implicit whitespace operator would be small.
Through participating in this thread, I've realised that the distinction between when I use a triple quoted string (with or without textwrap.dedent()) and when I use implicit string concatenation is whether or not I want the newlines in the result. Often I can avoid the issue entirely by splitting a statement into multiple pieces, but
... not always. (Sorry, got distracted and left the sentence unfinished). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, May 11, 2013 at 12:48 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, May 12, 2013 at 2:37 AM, Christian Tismer <tismer@stackless.com> wrote:
So if there was some notation (not specified yet how) that triggers correct indentation at compile time without extra functional hacks, so that
long_text = """
    this text is left justified
      and this line indents by two spaces
    """
is stripped of the leading and trailing \n and the indentation is justified, then I think the need for the implicit whitespace operator would be small.
Through participating in this thread, I've realised that the distinction between when I use a triple quoted string (with or without textwrap.dedent()) and when I use implicit string concatenation is whether or not I want the newlines in the result. Often I can avoid the issue entirely by splitting a statement into multiple pieces, but
I think Guido's right that if we didn't have implicit string concatenation there's no way we would add it ("just use a triple quoted string with escaped newlines" or "just use runtime string concatenation"), but given that we *do* have it, I don't think it's worth the hassle of removing it over a bug that a lint program should be able to pick up.
So I'm back to where I started, which is that if this kind of problem really bothers anyone, start thinking seriously about the idea of a standard library linter.
Really this should be trivial for all of the linters that already exist. That aside (and this is not an endorsement of this proposal), can you not just do

long_text = """\
this is left justified \
and this is continued on the same line
  and this is indented by two spaces
"""

I'm personally in favor of not allowing the concatenation to be on the same line but allowing it across multiple lines. While linters would be great for this, why not just introduce the SyntaxError, since (as has already been demonstrated) some of the concatenation already happens at compile time.
On May 10, 2013, at 10:24 PM, Georg Brandl wrote:
Am 11.05.2013 01:43, schrieb Philip Jenvey:
On May 10, 2013, at 1:09 PM, Michael Foord wrote:
On 10 May 2013 20:16, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm rather -1. It's quite convenient and I don't want to add some '+' signs everywhere I use it. I'm sure many people also have long string literals out there and will have to endure the pain of a dull task to "fix" their code.
However, in your case, foo('a' 'b') could raise a SyntaxWarning, since the "continuation" is on the same line.
I'm with Antoine. I love using implicit concatenation for splitting long literals across multiple lines.
Strongly -1 on this proposal, I also use this quite often.
-1 here. I use it a lot too, and find it very convenient, and while I could live with the change, I think it should have been made together with the lot of other syntax changes going to Python 3.
Also note that it was already proposed and rejected for Python 3. http://www.python.org/dev/peps/pep-3126 -- Philip Jenvey
On Fri, May 10, 2013 at 2:48 PM, Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
I had a similar experience just a few weeks ago. The bug was in a long list written like this:

['item11', 'item12', ..., 'item17',
 'item21', 'item22', ..., 'item27'
 ...
 'item91', 'item92', ..., 'item97']

Clearly the bug crept in when more items were added. (I try to keep redundant commas at the end of the list to avoid this, but not everyone likes this style.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
+1, but I would start by requiring () around concatenated strings.
On 10/05/2013 20:26, Alexander Belopolsky wrote:
On Fri, May 10, 2013 at 2:48 PM, Guido van Rossum <guido@python.org <mailto:guido@python.org>> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
I had a similar experience just few weeks ago. The bug was in a long list written like this:
['item11', 'item12', ..., 'item17',
 'item21', 'item22', ..., 'item27'
 ...
 'item91', 'item92', ..., 'item97']
Clearly the bug crept in when more items were added. (I try to keep redundant commas at the end of the list to avoid this, but not everyone likes this style.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
+1, but I would start by requiring () around concatenated strings.
I'm not so sure.

Currently, parentheses, brackets and braces effectively make Python ignore a newline within them.

(1 +
 2)

is the same as:

(1+2)

and:

[1 +
 2]

is the same as:

[1+2]

Under the proposal:

("a" "b")

or:

("a"
 "b")

would be the same as:

("ab")

but:

["a" "b"]

or:

["a"
 "b"]

would be a syntax error.
+1; I've been bitten by this many times. As already mentioned, one big use case where this is useful is having multiline string literals without having all the annoying indentation leak into your code. I think this could be easily fixed with a convenient .dedent() or .strip_margin() function.

On Fri, May 10, 2013 at 3:54 PM, MRAB <python@mrabarnett.plus.com> wrote:
On 10/05/2013 20:26, Alexander Belopolsky wrote:
On Fri, May 10, 2013 at 2:48 PM, Guido van Rossum <guido@python.org <mailto:guido@python.org>> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
I had a similar experience just few weeks ago. The bug was in a long list written like this:
['item11', 'item12', ..., 'item17',
 'item21', 'item22', ..., 'item27'
 ...
 'item91', 'item92', ..., 'item97']
Clearly the bug crept in when more items were added. (I try to keep redundant commas at the end of the list to avoid this, but not everyone likes this style.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
+1, but I would start by requiring () around concatenated strings.
I'm not so sure.
Currently, parentheses, brackets and braces effectively make Python ignore a newline within them.
(1 +
 2)
is the same as:
(1+2)
and:
[1 +
 2]
is the same as:
[1+2]
Under the proposal:
("a" "b")
or:
("a"
 "b")
would be the same as:
("ab")
but:
["a" "b"]
or:
["a"
 "b"]
would be a syntax error.
On Fri, May 10, 2013 at 10:54 PM, MRAB <python@mrabarnett.plus.com> wrote:
Under the proposal:
("a" "b")
or:
("a"
 "b")
would be the same as:
("ab")
but:
["a" "b"]
or:
["a"
 "b"]
would be a syntax error.
This would actually be fine with me. I use implicit string literal concatenation mostly within (...), and even though I've seen (and sometimes written) code like

['this is a '
 'long string',
 'this is another '
 'long string']

I agree that requiring extra (...) in this case is reasonable, i.e.:

[('this is a '
  'long string'),
 ('this is another '
  'long string')]

The same would apply to other literals like {...} (for both sets and dicts), and possibly for tuples too (assuming that it's possible to figure out when a tuple is being created).

I also write code like:

raise SomeException('this is a long message '
                    'that spans 2 lines')

or even:

self.assertTrue(somefunc(), 'somefunc() returned '
                'a false value and this is wrong')

In these cases I wouldn't like redundant (...) (or even worse extra '+'s), especially for the first case.

I also think that forgetting a comma in a list of function args between two string literal args is quite uncommon, whereas forgetting it in a sequence of strings (list, set, dict, tuple) is much more common, so this approach should cover most of the cases.

Best Regards,
Ezio Melotti
10.05.13 23:40, Ezio Melotti wrote:
I also think that forgetting a comma in a list of function args between two string literal args is quite uncommon, whereas forgetting it in a sequence of strings (list, set, dict, tuple) is much more common, so this approach should cover most of the cases.
Tuples.
My 2 cents: as a user, I often split very long text lines (mostly log entries or exception messages) into multiple lines in order to stay under 80 chars (PEP8 docet), like:

log.warning("Configuration item '%s' was renamed to '%s',"
            " please change occurrences of '%s' to '%s'"
            " in configuration file '%s'.",
            oldkey, newkey, oldkey, newkey, filename)

This should become (if I understand the proposal) something like:

log.warning("Configuration item '%s' was renamed to " % oldkey +
            "'%s', please change occurrences of '%s'" % (newkey, oldkey) +
            " to '%s' in configuration file '%s'." % (newkey, filename))

but imagine what would happen if you have to rephrase the text, reorder the variables and fix the `+` signs...

On the other hand, I think I've only got the ``func("a" "b")`` error once or twice in my life.

.a.

-- Antonio Messina
On Fri, May 10, 2013 at 5:17 PM, Antonio Messina <antonio.s.messina@gmail.com> wrote:
My 2 cents: as an user, I often split very long text lines (mostly log entries or exception messages) into multiple lines in order to stay under 80chars (PEP8 docet), like:
log.warning("Configuration item '%s' was renamed to '%s'," " please change occurrences of '%s' to '%s'" " in configuration file '%s'.", oldkey, newkey, oldkey, newkey, filename)
Actually it would just become

log.warning(("Configuration item '%s' was renamed to '%s'," +
             " please change occurrences of '%s' to '%s'" +
             " in configuration file '%s'."),
            oldkey, newkey, oldkey, newkey, filename)

Perhaps without the inner set of parentheses. The issue of string formatting wouldn't apply here since log.* does the formatting for you. A more apt example of what they were talking about earlier is

s = ("foo %s bar bogus" % (var1)
     "spam %s spam %s spam" % (var2, var3))

Would have to become

s = (("foo %s bar bogus" % (var1)) +
     ("spam %s spam %s spam" % (var2, var3)))

Because + has operator precedence over %; otherwise, var1 would be concatenated with "spam %s spam %s spam" and then you would have substitution take place.
On Fri, May 10, 2013 at 11:53 PM, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
On Fri, May 10, 2013 at 5:17 PM, Antonio Messina <antonio.s.messina@gmail.com> wrote:
My 2 cents: as an user, I often split very long text lines (mostly log entries or exception messages) into multiple lines in order to stay under 80chars (PEP8 docet), like:
log.warning("Configuration item '%s' was renamed to '%s'," " please change occurrences of '%s' to '%s'" " in configuration file '%s'.", oldkey, newkey, oldkey, newkey, filename)
Actually it would just become
log.warning(("Configuration item '%s' was renamed to '%s'," +
             " please change occurrences of '%s' to '%s'" +
             " in configuration file '%s'."),
            oldkey, newkey, oldkey, newkey, filename)
Perhaps without the inner set of parentheses. The issue of string formatting wouldn't apply here since log.* does the formatting for you. A more apt example of what they were talking about earlier is
You are right, I've picked up the wrong example. Please rephrase it using "raise SomeException()" instead of "log.warning()", which is the other case where I often have to split the string over multiple lines:

raise ConfigurationError("Configuration item '%s' was renamed to '%s',"
                         " please change occurrences of '%s' to '%s'"
                         " in configuration file '%s'."
                         % (oldkey, newkey, oldkey, newkey, filename))

.a.

-- antonio.s.messina@gmail.com antonio.messina@uzh.ch +41 (0)44 635 42 22 GC3: Grid Computing Competence Center http://www.gc3.uzh.ch/ University of Zurich Winterthurerstrasse 190 CH-8057 Zurich Switzerland
Ezio Melotti, 10.05.2013 22:40:
['this is a ' 'long string',
 'this is another ' 'long string']

I agree that requiring extra (...) in this case is reasonable, i.e.:

[('this is a ' 'long string'),
 ('this is another ' 'long string')]
-1, IMHO this makes it more verbose and thus harder to read, because it takes a while to figure out that the parentheses are not meant to surround tuples in this case, which would be the one obvious reason to spot them inside of a list. In a way, it's the reverse of the "spot the missing comma" problem, more like a "spot that there's really no comma". That's just as bad, if you ask me. Stefan
On Fri, May 10, 2013 at 4:40 PM, Ezio Melotti <ezio.melotti@gmail.com> wrote:
On Fri, May 10, 2013 at 10:54 PM, MRAB <python@mrabarnett.plus.com> wrote:

I also think that forgetting a comma in a list of function args between two string literal args is quite uncommon, whereas forgetting it in a sequence of strings (list, set, dict, tuple) is much more common, so this approach should cover most of the cases.
This is my experience as well. When I've run into problems by forgetting a comma it's nearly always been in a list, not in function arguments. (and it's never been between two items on the same line, so the proposal in one of the subthreads here to disallow implicit concatenation only between two strings on the same line wouldn't help much)

The problem is that in other languages, a trailing comma is forbidden, while in Python it is optional. This means that lists like

[
    1,
    2,
    3,
]

may or may not have a comma after the third element. The comma is there often enough that you can fall out of the habit of checking for it when you extend the list.

The most pythonic solution is therefore to follow the example of the single-element tuple and make the trailing comma mandatory ;)

-Ben
On Fri, May 10, 2013 at 11:48 AM, Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I would also be happy to see this error-prone syntax go (was bitten by it a number of times in the past), but I have a practical question here: Realistically, what do "start deprecating" and "eventually remove" mean here? This is a significant backwards-compatibility breaking change that will *definitely* break existing code. So would it be removed just in "Python 4"? Or are you talking about an actual 3.x release like "deprecate in 3.4 and remove in 3.5"? Eli
On Fri, May 10, 2013 at 4:27 PM, Eli Bendersky <eliben@gmail.com> wrote:
On Fri, May 10, 2013 at 11:48 AM, Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I would also be happy to see this error-prone syntax go (was bitten by it a number of times in the past), but I have a practical question here:
Realistically, what do "start deprecating" and "eventually remove" mean here? This is a significant backwards-compatibility breaking change that will *definitely* break existing code. So would it be removed just in "Python 4"? Or are you talking about an actual 3.x release like "deprecate in 3.4 and remove in 3.5"?
It's probably common enough that we'd have to do a silent deprecation in 3.4, a nosy deprecation in 3.5, and then delete it in 3.6, or so. Or maybe even more conservative. Plus we should work on a conversion tool that adds + and () as needed, *and* tell authors of popular lint tools to add rules for this. The hacky proposals for making it a syntax warning "sometimes" don't feel right to me. I do realize that this will break a lot of code, and that's the only reason why we may end up punting on this, possibly until Python 4, or forever. But I don't think the feature is defensible from a language usability POV. It's just about backward compatibility at this point. -- --Guido van Rossum (python.org/~guido)
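For reference, the lint rule and conversion tool mentioned above can be prototyped with nothing more than the stdlib tokenize module; this is a minimal sketch of the detection half only (the function name and the token-skipping choices are illustrative, not an existing tool):

import io
import tokenize

def find_implicit_concat(source):
    """Yield the (row, col) position of the second of two adjacent string literals."""
    prev_type = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and prev_type == tokenize.STRING:
            yield tok.start
        # Non-logical newlines and comments may legally sit between the two
        # literals, so they don't reset the "previous token" state.
        if tok.type not in (tokenize.NL, tokenize.COMMENT):
            prev_type = tok.type

for row, col in find_implicit_concat("foo('a' 'b')\nbar('a', 'b')\n"):
    print("implicit concatenation at line %d, column %d" % (row, col))

A real conversion tool would additionally rewrite the source (inserting '+' or a comma), which is where a lib2to3-based fixer could come in.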
On 11 May 2013 04:50, "Guido van Rossum" <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I could live with it if we get "dedent()" as a string method. I'd be even happier if constant folding was extended to platform independent method calls on literals, but I don't believe there's a sane way to maintain the "platform independent" constraint. OTOH, it's almost on the scale of "remove string mod formatting". Shipping at least a basic linting tool in the stdlib would probably be almost as effective and substantially less disruptive. lib2to3 should provide some decent infrastructure for that. Cheers, Nick.
-- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
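For readers following along: the dedent() behaviour Nick wishes for as a string method already exists as a plain function, textwrap.dedent(); the difference is that it runs when the code runs rather than at compile time. A minimal illustration:

import textwrap

text = textwrap.dedent("""\
    leftmost column
      two-char indent
    """)
# Common leading whitespace is stripped; the relative indent survives:
# 'leftmost column\n  two-char indent\n'
print(repr(text))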
Well, I think if you can live with

x = ('foo\n'
     'bar\n'
     'baz\n'
     )

I think you could live with

x = ('foo\n' +
     'bar\n' +
     'baz\n'
     )

as well... (Extra points if you figure out how to have a + on the last line too. :-)

So, as I said, it's not the convenience that matters, it's how much it is in use. :-(

--Guido

On Fri, May 10, 2013 at 4:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 11 May 2013 04:50, "Guido van Rossum" <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I could live with it if we get "dedent()" as a string method. I'd be even happier if constant folding was extended to platform independent method calls on literals, but I don't believe there's a sane way to maintain the "platform independent" constraint.
OTOH, it's almost on the scale of "remove string mod formatting". Shipping at least a basic linting tool in the stdlib would probably be almost as effective and substantially less disruptive. lib2to3 should provide some decent infrastructure for that.
Cheers, Nick.
-- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (python.org/~guido)
On Fri, May 10, 2013 at 7:57 PM, Guido van Rossum <guido@python.org> wrote:
I think you could live with
x = ('foo\n' +
     'bar\n' +
     'baz\n'
     )
as well... (Extra points if you figure out how to have a + on the last line too. :-)
Does this earn a point?

x = (+ 'foo\n'
     + 'bar\n'
     + 'baz\n'
     )

:-))
On 05/10/2013 05:36 PM, Michael Mitchell wrote:
On Fri, May 10, 2013 at 7:08 PM, Alexander Belopolsky wrote:
Does this earn a point?
x = (+ 'foo\n'
     + 'bar\n'
     + 'baz\n'
     )
Plus doesn't make sense as a unary operator on strings.
x = ('foo\n' +
     'bar\n' +
     'baz\n' +
     '')
This would work.
Except your last line is now an empty string, and still with no trailing +. -- ~Ethan~
On May 10, 2013 7:51 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On 11 May 2013 04:50, "Guido van Rossum" <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
I could live with it if we get "dedent()" as a string method. I'd be even
happier if constant folding was extended to platform independent method calls on literals, but I don't believe there's a sane way to maintain the "platform independent" constraint.
OTOH, it's almost on the scale of "remove string mod formatting".
Shipping at least a basic linting tool in the stdlib would probably be almost as effective and substantially less disruptive. lib2to3 should provide some decent infrastructure for that.

I have cc'd the code-quality mailing list since several linter authors are subscribed there.
I got bit by this quite recently, leaving out a comma in a long list of strings and I only found the bug by accident.

This being python "ideas" I'll throw one out.

Add another prefix character to strings:

a = [m'abc'
     'def']  # equivalent to ['abcdef']

A string with an m prefix is continued on one or more following lines. A string must have an m prefix to be continued (but this change would have to be phased in). A conversion tool need merely recognize the string continuations and insert m's. I chose the m character for multi-line but the character choice is available for bikeshedding. The m prefix can be combined with u and/or r but not with triple-quotes. The following are not allowed:

b = ['abc'   # syntax error (m is required for continuation)
     'def']

c = [m'abc']  # syntax error (when m is used, continuation lines must be present)

d = [m'abc'
     m'def']  # syntax error (m only allowed for first string)

The reason to prohibit cases c and d is to guard against comma errors with these forms. Consider these cases with missing or extra commas.

e = [m'abc',  # extra comma causes syntax error
     'def']

f = [m'abc'   # missing comma causes syntax error
     m'def',
     'ghi']

Yes, I know this doesn't guard against all comma errors. You could protect against more with prefix and suffix (e.g., an m at the end of the last string) but I'm skeptical it's worth it.

Conversion to this could be done in three stages:

(1) accept m's (case a), deprecate missing m's (case b), error for misused m's (case c-f)
(2) warn on missing m's (case b)
(3) error on missing m's (case b)

--- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security
On 11/05/2013 01:29, Bruce Leban wrote:
I got bit by this quite recently, leaving out a comma in a long list of strings and I only found the bug by accident.
This being python "ideas" I'll throw one out.
Add another prefix character to strings:
a = [m'abc'
     'def']  # equivalent to ['abcdef']
A string with an m prefix is continued on one or more following lines. A string must have an m prefix to be continued (but this change would have to be phased in). A conversion tool need merely recognize the string continuations and insert m's. I chose the m character for multi-line but the character choice is available for bikeshedding. The m prefix can be combined with u and/or r but not with triple-quotes. The following are not allowed:
b = ['abc'   # syntax error (m is required for continuation)
     'def']
c = [m'abc'] # syntax error (when m is used, continuation lines must be present)
d = [m'abc'
     m'def']  # syntax error (m only allowed for first string)
The reason to prohibit cases c and d is to guard against comma errors with these forms. Consider these cases with missing or extra commas.
e = [m'abc',  # extra comma causes syntax error
     'def']
f = [m'abc'   # missing comma causes syntax error
     m'def',
     'ghi']
Yes, I know this doesn't guard against all comma errors. You could protect against more with prefix and suffix (e.g., an m at the end of the last string) but I'm skeptical it's worth it.
Conversion to this could be done in three stages:
(1) accept m's (case a), deprecate missing m's (case b), error for misused m's (case c-f)
(2) warn on missing m's (case b)
(3) error on missing m's (case b)
It wouldn't help with:

f = [m'abc'
     'def'
     'ghi']

vs:

f = [m'abc'
     'def',
     'ghi']

I think I'd go more for a triple-quoted string with a prefix for dedenting and removing newlines:

f = [m'''
     abc
     def
     ghi
     ''']

where f == ['abcdefghi'].
Any chance that along with that there could come up a syntax for ignoring indentation space inside multiline strings along with this deprecation? Think something along:

logger.warn(i'''C: 'Ello, Miss?
              Owner: What do you mean "miss"?
              C: I'm sorry, I have a cold. I wish to make a complaint!
              O: We're closin' for lunch.
              C: Never mind that, my lad. I wish to complain about this parrot what I purchased not half an hour ago from this very boutique.\
              ''')

Against:

logger.warn(
    'Owner: What do you mean "miss"?\n' +
    'C: I\'m sorry, I have a cold. I wish to make a complaint!\n' +
    'O: We\'re closin\' for lunch.\n' +
    'C: Never mind that, my lad. I wish to complain about this\n' +
    'parrot what I purchased not half an hour ago from this very boutique.\n'
)

I know this suggestion has come and gone before - but it still looks like a good idea to me - there is no ambiguity there - you either punch enough spaces to get your content aligned with the " i''' " in the first line, or get a SyntaxError.
MRAB writes:
I think I'd go more for a triple-quoted string with a prefix for dedenting and removing newlines:
f = [m'''
     abc
     def
     ghi
     ''']
where f == ['abcdefghi'].
Cool enough, but
f = [m'''
... abc
... def
... ghi
... ''']
f == ['abc def ghi']
True
Worse,
f = [m'''
... abc
... def
... ghi
... ''']
f == ['abc def ghi']
True
Yikes! (Yeah, I know about consenting adults.)
Please - check my e-mail correctly On 11 May 2013 00:31, Stephen J. Turnbull <stephen@xemacs.org> wrote:
MRAB writes:
I think I'd go more for a triple-quoted string with a prefix for dedenting and removing newlines:
f = [m'''
     abc
     def
     ghi
     ''']
I think the prefix idea is obvious - and I used the letter "i" in my message - for "indented" - it may be a poor choice indeed since it looks like it may not be noticed sometimes close to the quotes.
where f == ['abcdefghi'].
Cool enough, but
f = [m'''
... abc
... def
... ghi
... ''']
f == ['abc def ghi']
True
In my proposal, this would yield a SyntaxError - any contents of the string would have to be indented to the same level as the prefix. Sorry if that was not clear enough.
Worse,
f = [m'''
... abc
... def
... ghi
... ''']
f == ['abc def ghi']
True
Yikes! (Yeah, I know about consenting adults.) _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
On Sat, May 11, 2013 at 10:29 AM, Bruce Leban <bruce@leapyear.org> wrote:
I got bit by this quite recently, leaving out a comma in a long list of strings and I only found the bug by accident.
This being python "ideas" I'll throw one out.
Add another prefix character to strings:
a = [m'abc'
     'def']  # equivalent to ['abcdef']
As MRAB suggested, a prefix for a compile time dedent would likely be more useful - then you'd just use a triple quoted string and be done with it. The other one I occasionally wish for is a compile time equivalent of str.split (if we had that, we likely wouldn't see APIs like collections.namedtuple and enum.Enum accepting space separated strings).

Amongst my ideas-so-farfetched-I-never-even-wrote-them-up (which is saying something, given some of the ideas I *have* written up) is a notation like:

!processor!"STRING LITERAL"

Where the compile time string processors had to be registered through an appropriate API (probably in the sys module). Then you would just define preprocessors like "merge" or "dedent" or "split" or "sh" or "format" and get the appropriate compile time raw string->AST translation.

So for this use case, you would do:

a = [!merge!"""\
    abc
    def"""]

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
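There is no compile-time hook for this today, but the shape of the idea can be sketched at run time. Everything in the snippet below - the registry name, the processor names, and the assumption that "merge" means joining stripped lines - is invented for illustration and is not Nick's proposed API:

import textwrap

# Hypothetical registry of named string processors.
STRING_PROCESSORS = {
    'dedent': textwrap.dedent,
    'split': str.split,
    'merge': lambda s: ''.join(line.strip() for line in s.splitlines()),
}

def process(name, literal):
    # Run-time stand-in for the proposed !name!"literal" syntax.
    return STRING_PROCESSORS[name](literal)

a = [process('merge', """\
    abc
    def""")]
print(a)  # ['abcdef']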
On May 10, 2013, at 20:37, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat, May 11, 2013 at 10:29 AM, Bruce Leban <bruce@leapyear.org> wrote:
I got bit by this quite recently, leaving out a comma in a long list of strings and I only found the bug by accident.
This being python "ideas" I'll throw one out.
Add another prefix character to strings:
a = [m'abc'
     'def']  # equivalent to ['abcdef']
As MRAB suggested, a prefix for a compile time dedent would likely be more useful - then you'd just use a triple quoted string and be done with it. The other one I occasionally wish for is a compile time equivalent of str.split (if we had that, we likely wouldn't see APIs like collections.namedtuple and enum.Enum accepting space separated strings).
Why does it need to be compile time? Do people really run into cases that frequently where the cost of concatenating or dedenting strings at import time is significant? If so, it seems like something more dramatic might be warranted, like allowing the compiler to assume that method calls on literals have the same effect at compile time as at runtime so it can turn them into constants. (Doesn't the + optimization already make that assumption anyway?)
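The assumption Andrew asks about can be checked directly with the dis module: in CPython, '+' between two string literals is folded into a single constant by the optimizer, while a method call on a literal is left to run every time (the exact bytecode shown varies by version):

import dis

# The concatenation is folded: the bytecode just loads the constant 'foobar'.
dis.dis(compile("x = 'foo' + 'bar'", '<example>', 'exec'))

# The method call is not folded: 'a b c' is loaded and .split() runs at run time.
dis.dis(compile("x = 'a b c'.split()", '<example>', 'exec'))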
On 11/05/13 15:12, Andrew Barnert wrote:
Why does it need to be compile time? Do people really run into cases that frequently where the cost of concatenating or dedenting strings at import time is significant?
String constants do not need to be concatenated only at import time. Strings frequently need to be concatenated at run-time, or at function call time, or inside loops. For constants known at compile time, it is better to use a string literal rather than a string calculated at run-time for the same reason that it is better to write 2468 rather than 2000+400+60+8 -- because it better reflects the way we think about the program, not just because of the run-time expense of extra unnecessary additions/concatenations.
If so, it seems like something more dramatic might be warranted, like allowing the compiler to assume that method calls on literals have the same effect at compile time as at runtime so it can turn them into constants.
In principle, the keyhole optimizer could make that assumption. In practice, there is a limit to how much effort people put into the optimizer. Constant-folding method calls is probably past the point of diminishing returns. -- Steven
On May 10, 2013, at 22:53, Steven D'Aprano <steve@pearwood.info> wrote:
On 11/05/13 15:12, Andrew Barnert wrote:
Why does it need to be compile time? Do people really run into cases that frequently where the cost of concatenating or dedenting strings at import time is significant?
String constants do not need to be concatenated only at import time.
Strings frequently need to be concatenated at run-time, or at function call time, or inside loops. For constants known at compile time, it is better to use a string literal rather than a string calculated at run-time for the same reason that it is better to write 2468 rather than 2000+400+60+8 -- because it better reflects the way we think about the program, not just because of the run-time expense of extra unnecessary additions/concatenations.
Well, you have the choice of either:

count = 2000 + 400 + 60 + 8
for e in hugeiter:
    foo(e, count)

Or:

for e in hugeiter:
    foo(e, 2468)  # 2000 + 400 + 60 + 8

And again, considering that the whole point of string concatenation is dealing with cases that are hard to fit into 80 cols otherwise, the former option is, if anything, even more appropriate.
If so, it seems like something more dramatic might be warranted, like allowing the compiler to assume that method calls on literals have the same effect at compile time as at runtime so it can turn them into constants.
In principle, the keyhole optimizer could make that assumption. In practice, there is a limit to how much effort people put into the optimizer. Constant-folding method calls is probably past the point of diminishing returns.
Adding new optimizations just for the hell of it is obviously not a good idea. But we're talking about the cost of adding an optimization vs. adding a new type of auto-dedenting string literal. It seems like about the same scope either way, and the former doesn't require any changes to the grammar, docs, other implementations, etc.--or, more importantly, existing user code. And it might even improve other related cases. If the problem is so important we're seriously considering changing the syntax, it seems a little unwarranted to reject the optimization out of hand. Or, contrarily, if the optimization is obviously not worth doing, changing the syntax to let people do the same optimization manually seems excessive.
Steven D'Aprano, 11.05.2013 07:53:
In principle, the keyhole optimizer could make that assumption. In practice, there is a limit to how much effort people put into the optimizer. Constant-folding method calls is probably past the point of diminishing returns.
Plus, such an optimisation can have a downside. Contrived example:

if DEBUG:
    print('a'.replace('a', 'aaaaaaaa').replace('a', 'aaaaaaaa')
             .replace('a', 'aaaaaaaa').replace('a', 'aaaaaaaa')
             .replace('a', 'aaaaaaaa').replace('a', 'aaaaaaaa'))

Expanding this into a string literal will trade space for time, whereas the original code clearly trades time for space. The same applies to string splitting. A list of many short strings takes up more space than a split call on one large string.

May not seem like a major concern in most cases that involve string literals, but we shouldn't ignore the possibility that the author of the code might have used the explicit method call quite deliberately.

Stefan
11.05.13 13:00, Stefan Behnel wrote:
Plus, such an optimisation can have a downside. Contrived example:
if DEBUG:
    print('a'.replace('a', 'aaaaaaaa').replace('a', 'aaaaaaaa')
             .replace('a', 'aaaaaaaa').replace('a', 'aaaaaaaa')
             .replace('a', 'aaaaaaaa').replace('a', 'aaaaaaaa'))
Expanding this into a string literal will trade space for time, whereas the original code clearly trades time for space. The same applies to string splitting. A list of many short strings takes up more space than a split call on one large string.
May not seem like a major concern in most cases that involve string literals, but we shouldn't ignore the possibility that the author of the code might have used the explicit method call quite deliberately.
x = 0
if x:
    x = 9**9**9
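Serhiy's counter-example is terse, so it may help to spell out the cost it hints at: folding constants inside the dead branch would mean evaluating 9**9**9 (i.e. 9**(9**9)) at compile time, a number with hundreds of millions of decimal digits. The digit count can be computed without actually building the number:

import math

# Decimal digits of 9**(9**9): about 369.7 million.
digits = int(9**9 * math.log10(9)) + 1
print(digits)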
On 11/05/2013 04:37, Nick Coghlan wrote:
On Sat, May 11, 2013 at 10:29 AM, Bruce Leban <bruce@leapyear.org> wrote:
I got bit by this quite recently, leaving out a comma in a long list of strings and I only found the bug by accident.
This being python "ideas" I'll throw one out.
Add another prefix character to strings:
a = [m'abc'
     'def']  # equivalent to ['abcdef']
As MRAB suggested, a prefix for a compile time dedent would likely be more useful - then you'd just use a triple quoted string and be done with it. The other one I occasionally wish for is a compile time equivalent of str.split (if we had that, we likely wouldn't see APIs like collections.namedtuple and enum.Enum accepting space separated strings).
Amongst my ideas-so-farfetched-I-never-even-wrote-them-up (which is saying something, given some of the ideas I *have* written up) is a notation like:
!processor!"STRING LITERAL"
Where the compile time string processors had to be registered through an appropriate API (probably in the sys module). Then you would just define preprocessors like "merge" or "dedent" or "split" or "sh" or "format" and get the appropriate compile time raw string->AST translation.
So for this use case, you would do:
a = [!merge!"""\
    abc
    def"""]
Do you really need the "!"? String literals can already have a prefix, such as "r". At compile time, the string literal could be preprocessed according to its prefix (some kind of import hook, working on the AST?). The current prefixes are "" (plain literal), "r", "b", "u", etc.
On Sun, May 12, 2013 at 2:55 AM, MRAB <python@mrabarnett.plus.com> wrote:
Do you really need the "!"? String literals can already have a prefix, such as "r".
At compile time, the string literal could be preprocessed according to its prefix (some kind of import hook, working on the AST?). The current prefixes are "" (plain literal), "r", "b", "u", etc.
1. Short prefixes are inherently cryptic (especially single letter ones)
2. The existing prefixes control how the source code is converted to a string, they don't permit conversion to a completely different construct
3. Short prefixes are not extensible and rapidly run into namespacing issues

As noted, I prefer not to solve this problem at all (and add a basic lint capability instead). However, if we do try to solve it, then I'd prefer a syntax that adds a general extensible capability rather than one that piles additional complications on the existing string prefix mess.

If we support dedent, do we also support merging adjacent whitespace characters into a single string? Do we support splitting a string? Do we support upper case or lower case or taking its length?

Two responses make sense to me: accept the status quo (perhaps with linter support), or design and champion a general compile time string processing capability (that doesn't rely on encoding tricks or a custom import hook). Expanding on the already cryptic string prefix system does *not* strike me as a reasonable idea at all.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 11.05.13 05:37, Nick Coghlan wrote:
I got bit by this quite recently, leaving out a comma in a long list of strings and I only found the bug by accident.
This being python "ideas" I'll throw one out.
Add another prefix character to strings:
a = [m'abc'
     'def']  # equivalent to ['abcdef']

As MRAB suggested, a prefix for a compile time dedent would likely be more useful - then you'd just use a triple quoted string and be done with it. The other one I occasionally wish for is a compile time equivalent of str.split (if we had that, we likely wouldn't see APIs like collections.namedtuple and enum.Enum accepting space separated strings).
Amongst my ideas-so-farfetched-I-never-even-wrote-them-up (which is saying something, given some of the ideas I *have* written up) is a notation like:
!processor!"STRING LITERAL"
Where the compile time string processors had to be registered through an appropriate API (probably in the sys module). Then you would just define preprocessors like "merge" or "dedent" or "split" or "sh" or "format" and get the appropriate compile time raw string->AST translation.
So for this use case, you would do:
a = [!merge!"""\
    abc
    def"""]
Ah, I see we are on the same path here. Just not sure if it is right to move into a compile-time preprocessor language or to just handle the most common cases with a simple prefix? One example is code snippets which need proper de-indentation. I think a simple stripping of white-space in

text = s"""
    leftmost column
      two-char indent
    """

would solve 95 % of common indentation and concatenation cases. I don't think provision for merging is needed very often. If text occurs deeply nested in code, then it is also quite likely to be part of an expression, anyway. My major use-case is text constants in a class or function that is multiple lines long and should be statically ready to use without calling a function.

(here an 's' as a strip prefix, but I'm not sold on that)

cheers - chris

-- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On 11.05.2013 19:05, Christian Tismer wrote:
I think a simple stripping of white-space in
text = s"""
    leftmost column
      two-char indent
    """
would solve 95 % of common indentation and concatenation cases. I don't think provision for merging is needed very often. If text occurs deeply nested in code, then it is also quite likely to be part of an expression, anyway. My major use-case is text constants in a class or function that is multiple lines long and should be statically ready to use without calling a function.
(here an 's' as a strip prefix, but I'm not sold on that)
This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:

_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')

or

raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))

We usually try to keep the code line length under 80 chars, so splitting literals in that way is rather common, esp. in nested code paths.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 11 2013)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2013-05-07: Released mxODBC Zope DA 2.1.2 ... http://egenix.com/go46 2013-05-06: Released mxODBC 3.2.3 ... http://egenix.com/go45 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On 11.05.13 19:24, M.-A. Lemburg wrote:
On 11.05.2013 19:05, Christian Tismer wrote:
I think a simple stripping of white-space in
text = s"""
    leftmost column
      two-char indent
    """
would solve 95 % of common indentation and concatenation cases. I don't think provision for merging is needed very often. If text occurs deeply nested in code, then it is also quite likely to be part of an expression, anyway. My major use-case is text constants in a class or function that is multiple lines long and should be statically ready to use without calling a function.
(here an 's' as a strip prefix, but I'm not sold on that) This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:
_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')

or

raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))
We usually try to keep the code line length under 80 chars, so splitting literals in that way is rather common, esp. in nested code paths.
Your first example is a regex, which could be used as-is. Your second example is indented five levels deep. That is a coding style which I would propose to write differently for better readability. And if you stick with it, why not use the "+"? I want to support constant strings, which should not be somewhere in the middle of code. Your second example is computed, anyway, not the case that I want to solve. cheers - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On 11.05.2013 20:37, Christian Tismer wrote:
On 11.05.13 19:24, M.-A. Lemburg wrote:
On 11.05.2013 19:05, Christian Tismer wrote:
I think a simple stripping of white-space in
text = s"""
    leftmost column
      two-char indent
    """
would solve 95 % of common indentation and concatenation cases. I don't think provision for merging is needed very often. If text occurs deeply nested in code, then it is also quite likely to be part of an expression, anyway. My major use-case is text constants in a class or function that is multiple lines long and should be statically ready to use without calling a function.
(here an 's' as a strip prefix, but I'm not sold on that) This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:
_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')

or

raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))
We usually try to keep the code line length under 80 chars, so splitting literals in that way is rather common, esp. in nested code paths.
Your first example is a regex, which could be used as-is.
Your second example is indented five levels deep. That is a coding style which I would propose to write differently for better readability. And if you stick with it, why not use the "+"?
I want to support constant strings, which should not be somewhere in the middle of code. Your second example is computed, anyway, not the case that I want to solve.
You're not addressing the main point I was trying to make :-)

Triple-quoted strings work for strings that are supposed to have embedded newlines, but they don't provide a good alternative for long strings without embedded newlines.

Regarding using '+' in these cases: of course that would be possible, but it clutters up the code, often requires additional parens, it's slower and can lead to other weird errors when forgetting parens, which are not much different than the one Guido mentioned in his original email.

In all the years I've been writing Python, I've only very rarely had an issue with missing commas between strings. Most cases I ran into were missing commas in lists of tuples, not strings:

l = [
    'detect_target_type',
    (None, Is, '"', +1, 'double_quoted_target')
    (None, Is, '\'', +1, 'single_quoted_target'),
    (None, IsIn, separators, 'unquoted_target', 'empty_target'),
]

This gives:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
TypeError: 'tuple' object is not callable

:-)

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 11 2013)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2013-05-07: Released mxODBC Zope DA 2.1.2 ... http://egenix.com/go46 2013-05-06: Released mxODBC 3.2.3 ... http://egenix.com/go45 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On 05/11/2013 11:37 AM, Christian Tismer wrote:
On 11.05.13 19:24, M.-A. Lemburg wrote:
On 11.05.2013 19:05, Christian Tismer wrote:
I think a simple stripping of white-space in
text = s"""
    leftmost column
      two-char indent
    """
would solve 95 % of common indentation and concatenation cases. I don't think provision for merging is needed very often. If text occurs deeply nested in code, then it is also quite likely to be part of an expression, anyway. My major use-case is text constants in a class or function that is multiple lines long and should be statically ready to use without calling a function.
(here an 's' as a strip prefix, but I'm not sold on that) This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:
_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')

or

raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))
Your first example is a regex, which could be used as-is.
If implicit string concatenation goes away, how can the regex be used as-is?
Your second example is indented five levels deep. That is a coding style which I would propose to write differently for better readability. And if you stick with it, why not use the "+"?
I want to support constant strings, which should not be somewhere in the middle of code. Your second example is computed, anyway, not the case that I want to solve.
You may not want to solve it, but it needs solving if ISC goes away. -- ~Ethan~
On Sat, May 11, 2013 at 6:24 PM, M.-A. Lemburg <mal@egenix.com> wrote:
On 11.05.2013 19:05, Christian Tismer wrote:
I think a simple stripping of white-space in
text = s"""
    leftmost column
      two-char indent
    """
would solve 95 % of common indentation and concatenation cases. <snipped>
This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:
_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')

or

raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))
Agreed. I use the implicit concatenation a lot for exception messages like the one above; we also tend to keep line length to 80 characters *and* use nice verbose exception messages. I could live with adding the extra '+' characters and parentheses, but I think it would be a net loss of readability. The _litmonth example looks like a candidate for re.VERBOSE and a triple-quoted string, though. Mark
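For anyone unfamiliar with the flag Mark mentions: re.VERBOSE ignores insignificant whitespace inside the pattern, so the alternation could be written as one triple-quoted literal. This is a sketch of that idea, not MAL's actual code:

import re

_litmonth = re.compile(r"""
    (?P<litmonth>
        jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|
        mär|mae|mrz|mai|okt|dez|
        fev|avr|juin|juil|aou|aoû|déc|
        ene|abr|ago|dic|
        out
    )[a-z,\.;]*
""", re.VERBOSE)

print(_litmonth.match("okt.").group("litmonth"))  # okt

As MAL explains in the follow-up, this only works when whitespace in the pattern is not significant, which is why it doesn't fit his composed-RE use case.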
On 14.05.2013 19:24, Mark Dickinson wrote:
On Sat, May 11, 2013 at 6:24 PM, M.-A. Lemburg <mal@egenix.com> wrote:
On 11.05.2013 19:05, Christian Tismer wrote:
I think a simple stripping of white-space in
text = s"""
    leftmost column
      two-char indent
    """
would solve 95 % of common indentation and concatenation cases. <snipped>
This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:
_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')

or

raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))
Agreed. I use the implicit concatenation a lot for exception messages like the one above; we also tend to keep line length to 80 characters *and* use nice verbose exception messages. I could live with adding the extra '+' characters and parentheses, but I think it would be a net loss of readability.
The _litmonth example looks like a candidate for re.VERBOSE and a triple-quoted string, though.
It's taken out of context, just to demonstrate some real world example of how long strings are broken down to handy 80 char code lines. The _litmonth variable is used as component to build other REs and those typically also contain (important) whitespace, so re.VERBOSE won't work. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 14 2013)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2013-05-07: Released mxODBC Zope DA 2.1.2 ... http://egenix.com/go46 2013-05-06: Released mxODBC 3.2.3 ... http://egenix.com/go45 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On Tue, May 14, 2013 at 6:43 PM, M.-A. Lemburg <mal@egenix.com> wrote:
The _litmonth example looks like a candidate for re.VERBOSE and a triple-quoted string, though.
It's taken out of context, just to demonstrate some real world example of how long strings are broken down to handy 80 char code lines.
The _litmonth variable is used as component to build other REs and those typically also contain (important) whitespace, so re.VERBOSE won't work.
Ah, okay. Makes sense. Thanks, Mark
14.05.2013 19:24, Mark Dickinson wrote:
raise errors.DataError(
    'Inconsistent revenue item currency: '
    'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))
Agreed. I use the implicit concatenation a lot for exception messages like the one above
Me too.

But what do you think about:

raise errors.DataError(
    'Inconsistent revenue item currency: '
    c'transaction=%r; transaction_position=%r' %
    (transaction, transaction_position))

c'...' -- for explicit string (c)ontinuation or (c)oncatenation.

Regards. *j
On 14May2013 20:00, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
| 14.05.2013 19:24, Mark Dickinson wrote:
|
| >> raise errors.DataError(
| >>     'Inconsistent revenue item currency: '
| >>     'transaction=%r; transaction_position=%r' %
| >>     (transaction, transaction_position))
| >
| > Agreed. I use the implicit concatenation a lot for exception
| > messages like the one above
|
| Me too.
|
| But what do you think about:
|
| raise errors.DataError(
|     'Inconsistent revenue item currency: '
|     c'transaction=%r; transaction_position=%r' %
|     (transaction, transaction_position))
|
| c'...' -- for explicit string (c)ontinuation or (c)oncatenation.

I'm -1 on it myself. I'd expect c'' to act like b'' or u'' or r'': making a "string"-ish thing in a special way. But c'' doesn't; the nearest analog is r'' but c'' goes _backwards_.

I much prefer:

  + 'foo'

over

  c'foo'

The former already works and is perfectly clear about what it's doing. The "c" does not do it any better and is easier to miss, visually.

Cheers,
-- Cameron Simpson <cs@zip.com.au>

On the contrary of what you may think, your hacker is fully aware of your company's dress code. He is fully aware of the fact that it doesn't help him to do his job. - Gregory Hosler <gregory.hosler@eno.ericsson.se>
On May 15, 2013, at 19:24, Cameron Simpson <cs@zip.com.au> wrote:
I much prefer: + 'foo' over c'foo'
I agree, but this doesn't solve the precedence problem that everyone keeps bringing up.

Summarizing (more in hopes that someone will correct me if I've missed something important than to help you or anyone else...):

Implicit concatenation is bad because you often use it accidentally when you intended a comma.

A rule only allowing implicit concatenation on separate lines doesn't help because both legit and accidental uses are usually on separate lines.

There's no way a compiler or linter could help, because there's no programmatic way to distinguish good from bad uses:

log("long log message with {} "
    "and {}",
    "one arg" "and another")

Using + doesn't work because of operator precedence vs. % and .:

print("long log message with {} " +
      "and {}".format("one arg", "and another"))

Using an explicit dedent or similar method call doesn't work because the performance is unacceptable. Automatically optimizing the dedent call at compile time doesn't work because sometimes you need it to be at run time.

Assuming all of those givens are true, it seems inescapable that either we need some new syntax, or we have to just accept the problem.
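To make the precedence point above concrete (the message text and arguments are invented for the example): with '+', the .format() call binds only to the nearest literal, so the first placeholder is silently left unfilled; parenthesising the concatenation, or using implicit concatenation, formats the whole message:

# Broken: .format() applies only to "and {}"; extra arguments are ignored.
msg = ("long log message with {} " +
       "and {}".format("one arg", "and another"))
print(msg)  # long log message with {} and one arg

# Intended: format the concatenated literal as a whole.
msg = ("long log message with {} "
       "and {}").format("one arg", "and another")
print(msg)  # long log message with one arg and and another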
On 16/05/13 12:54, Andrew Barnert wrote:
Summarizing (more in hopes that someone will correct me if I've missed something important than to help you or anyone else...):
Implicit concatenation is bad because you often use it accidentally when you intended a comma.
For some definition of "often". If I've ever made this error, it was so long ago, and so trivially fixed, that I don't remember it.
There's no way a compiler or linter could help, because there's no programmatic way to distinguish good from bad uses:
Of course they can *help*. Linters can flag the use of implicit concatenation, and leave it up to the user to decide. That's helping. If you're like me, and use implicit concatenation frequently with few or no problems, then you'll configure the linter to skip the warning. If you're one of the people who rarely or never uses it deliberately, or you work for Google where it goes against their in-house style guide, then you'll tell the linter to treat it as an error. I think that this is the sort of issue that linters are designed to solve. -- Steven
To be fair, I've made this mistake twice in the last week. It's trivially fixed once you find it, but a missing , is pretty small and easy to miss! It caused a fair amount of head scratching.

I don't believe in the "kick it to the linter" solution, since that's basically a non-solution (don't know if it should be good or bad, so let someone else decide!). Until we get some @I_Know_What_Im_Doing decorator so that, in the source code, we can tell the linter to ignore things, it's just going to pop up every time and get in people's way and add to the lint-spam that accompanies most major projects.

Somewhat unrelated, but has any linter managed to solve this issue, whether storing a fine-grained stuff-to-be-ignored list in in-code pragmas or in a separate .linter_ignored file that somehow works while line numbers are constantly changing and such?

On Wed, May 15, 2013 at 11:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On 16/05/13 12:54, Andrew Barnert wrote:
Summarizing (more in hopes that someone will correct me if I've missed
something important than to help you or anyone else...):
Implicit concatenation is bad because you often use it accidentally when you intended a comma.
For some definition of "often".
If I've ever made this error, it was so long ago, and so trivially fixed, that I don't remember it.
There's no way a compiler or linter could help, because there's no
programmatic way to distinguish good from bad uses:
Of course they can *help*. Linters can flag the use of implicit concatenation, and leave it up to the user to decide. That's helping.
If you're like me, and use implicit concatenation frequently with few or no problems, then you'll configure the linter to skip the warning. If you're one of the people who rarely or never uses it deliberately, or you work for Google where it goes against their in-house style guide, then you'll tell the linter to treat it as an error.
I think that this is the sort of issue that linters are designed to solve.
-- Steven _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
On May 16, 1:50 pm, Haoyi Li <haoyi...@gmail.com> wrote:
I don't believe in the "kick it to the linter" solution, since that's basically a non-solution (don't know if it should be good or bad, so let someone else decide!).
No, it's a "let the developer decide for themselves whether it's an issue" solution.
Until we get some @I_Know_What_Im_Doing decorator so that, in the source code, we can tell the linter to ignore things, it's just going to pop up every time and get in people's way
From: Steven D'Aprano <steve@pearwood.info> To: python-ideas@python.org
Implicit concatenation is bad because you often use it accidentally when you intended a comma.
For some definition of "often".
Well, yes. But Guido says he makes this mistake often, and others agree with him, and the whole discussion wouldn't have come up if it weren't a problem. So, we're still left with the conclusion:
There's no way a compiler or linter could help, because there's no
programmatic way to distinguish good from bad uses:
Of course they can *help*. Linters can flag the use of implicit concatenation, and leave it up to the user to decide. That's helping.
You're right; let me rephrase. There's no way a compiler could help, and a linter can mitigate but not solve the problem. Which means the conclusion is actually:
Assuming all of those givens are true, it seems inescapable that either we need some new syntax, or we have to just accept the problem…
… (with some help from linters). I should also clarify that "accept the problem" could either mean "ban implicit concatenation" (as Guido initially suggested) or "leave implicit concatenation alone", so it's really 3 choices, not 2. Does that sound fair now?
On 05/15/2013 11:06 PM, Andrew Barnert wrote:
From: Steven D'Aprano
Andrew Barnert wrote:
Implicit concatenation is bad because you often use it accidentally when
you intended a comma.
For some definition of "often".
Well, yes. But Guido says he makes this mistake often, and others agree with him, and the whole discussion wouldn't have come up if it weren't a problem. So, we're still left with the conclusion:
Actually, Guido said:
This is a fairly common mistake [...]
Which I understood to mean, "we all make this mistake," not necessarily that we all make this mistake often. -- ~Ethan~
On May 20, 2013, at 10:22, Ethan Furman <ethan@stoneleaf.us> wrote:
On 05/15/2013 11:06 PM, Andrew Barnert wrote:
From: Steven D'Aprano
Andrew Barnert wrote:
Implicit concatenation is bad because you often use it accidentally when
you intended a comma.
For some definition of "often".
Well, yes. But Guido says he makes this mistake often, and others agree with him, and the whole discussion wouldn't have come up if it weren't a problem. So, we're still left with the conclusion:
Actually, Guido said:
This is a fairly common mistake [...]
Which I understood to mean, "we all make this mistake," not necessarily that we all make this mistake often.
If your point is that Guido didn't think we make the mistake often enough that it's a problem worth solving, that's clearly not true, or he wouldn't have suggested changing the language. If you just want to rewrite the summary as "Implicit concatenation is bad because you use it accidentally when you intended a comma often enough to cause problems" instead of just "often", fine. But how does that change anything meaningful?
On 05/15/2013 07:54 PM, Andrew Barnert wrote:
Implicit concatenation is bad because you often use it accidentally when you intended a comma.
I don't think anybody has said they get bit often, just that it can be painful when they do. I forget the comma once or twice a year -- I'm willing to pay that bit of pain for the convenience. So, yeah, I'm reversing my vote to -1 unless something equally simple and easy on the eyes is developed. -- ~Ethan~
On Sat, 11 May 2013 19:24:02 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
This is not a good solution for long lines where you don't want to have embedded line endings. Taken from existing code:
_litmonth = ('(?P<litmonth>'
             'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|'
             'mär|mae|mrz|mai|okt|dez|'
             'fev|avr|juin|juil|aou|aoû|déc|'
             'ene|abr|ago|dic|'
             'out'
             ')[a-z,\.;]*')
For the record, I know this isn't the point of your message, but you're probably missing 'fév' (accented) above :-) Regards Antoine.
On May 10, 2013, at 9:04 PM, Raymond Hettinger wrote:
On May 10, 2013, at 11:48 AM, Guido van Rossum <guido@python.org> wrote:
Would it be reasonable to start deprecating this and eventually remove it from the language?
I don't think it would be missed. I very rarely see it used in practice.
Really? I see it used over multiple lines all over the place. -- Philip Jenvey
On 11/05/13 04:48, Guido van Rossum wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
Not unless you guarantee that compile-time folding of string literals with '+' will be a language feature rather than an optional optimization.

I frequently use implicit string concatenation for long strings, or to keep within the 80 character line limit. I teach people to prefer it over '+' because:

- constant folding is an implementation detail that is not guaranteed, and not all versions of Python support;
- even when provided, constant folding is an optimization which might not be present in the future[1];
- implicit string concatenation is a language feature, so every Python implementation must support it;
- and is nicer than the current alternatives involving backslashes or triple-quoted strings.

The problems caused by implicit string concatenation are uncommon and mild. Having two string literals immediately next to each other is uncommon; ending up with them by forgetting a comma is rarer still. So I think the benefit of implicit string concatenation far outweighs the occasional problem.

[1] I recall you (GvR) publicly complaining about CPython optimizations and suggesting that they are more effort than they are worth and should be dropped. I don't recall whether you explicitly included constant folding in that.

-- Steven
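To see the difference concretely, a quick sketch (the exact bytecode shown is version- and implementation-dependent):

    import dis

    # Implicit concatenation is resolved while the source is compiled,
    # so only a single constant ever appears in the code object.
    dis.dis(compile("x = 'spam' 'eggs'", '<demo>', 'exec'))

    # Folding 'spam' + 'eggs' into one constant is left to the optimizer;
    # whether an addition survives here is an implementation detail.
    dis.dis(compile("x = 'spam' + 'eggs'", '<demo>', 'exec'))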
On 05/11/2013 01:36 AM, Steven D'Aprano wrote:
Not unless you guarantee that compile-time folding of string literals with '+' will be a language feature rather than an optional optimization.
What makes you think that implicit concatenation being compile-time isn't optional?
On 11/05/13 15:44, Random832 wrote:
On 05/11/2013 01:36 AM, Steven D'Aprano wrote:
Not unless you guarantee that compile-time folding of string literals with '+' will be a language feature rather than an optional optimization.
What makes you think that implicit concatenation being compile-time isn't optional?
http://docs.python.org/3/reference/lexical_analysis.html#string-literal-conc... In the sense that there is no ISO standard for Python, *everything* is optional if Guido decrees that it is. But compile-time implicit concatenation is a documented language feature, not an implementation-dependent optimization. -- Steven
After reading about other people's use-cases, I'm now: -1

I think that all that's required for solving Guido's original use case is a new warning in pylint, pep8, or flake. PEP8 could be updated to discourage the use of automatic concatenation in those places. The warning would apply only to automatic concatenations within parameter passing and structures, and not to assignments or formatting through %.

Doing it this way would solve the use case by declaring certain uses of automatic concatenation a "code smell", and automating detection of the bad uses, without any changes to the language. All that Guido needs to do is change PEP8, and wait for the static analyzers to follow.

Cheers, -- Juancarlo *Añez*
Wow. Judging from the size of this thread one might think you had suggested enumerating the string literals. ;) -- ~Ethan~
Guido van Rossum <guido@python.org> wrote:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
Not sure why nobody mentioned it yet, maybe it's obviously not helping in this situation, but... What if such multi-line strings had to have their own set of parens around them?

Valid:

    do_foo(
        ("foo"
         "bar"),
        "baz"
    )

Invalid:

    do_foo(
        "foo"
        "bar",
        "baz"
    )

-- Markus (from phone)
From: Markus Unterwaditzer <markus@unterwaditzer.net> Sent: Wednesday, May 15, 2013 10:18 PM
Not sure why nobody mentioned it yet, maybe it's obviously not helping in this situation, but...
What if such multi-line strings have to have their own set of parens around them?
Valid: do_foo( ("foo" "bar"), "baz" )
Invalid: do_foo( "foo" "bar", "baz" )
As I understand it, the main reason people didn't like Guido's suggestion of "just use +" was that (because of operator precedence) they'd sometimes have to add parentheses that are unnecessary today. So, I'm betting it will be just as unpopular with the same people. Personally, I don't dislike it. But then I don't dislike the "just use +" answer either.
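For anyone who hasn't hit it, a tiny sketch of the precedence point (the value of n is made up):

    n = 5

    # Implicit concatenation binds before '%' is applied:
    print("%d items" " in total" % n)        # prints: 5 items in total

    # With '+', the '%' binds tighter, so parentheses become necessary:
    print(("%d items" + " in total") % n)    # prints: 5 items in total
    # print("%d items" + " in total" % n)    # TypeError: not all arguments converted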
On May 15, 2013 10:57 PM, "Andrew Barnert" <abarnert@yahoo.com> wrote:
As I understand it, the main reason people didn't like Guido's suggestion of "just use +" was that (because of operator precedence) they'd sometimes have to add parentheses that are unnecessary today. So, I'm betting it will be just as unpopular with the same people.
The difference between requiring parens around implicit concatenation and around uses of + is that leaving the parens out in the first case would be a syntax error and do the wrong thing in the second case. --- Bruce (from my phone)
10.05.13 21:48, Guido van Rossum написав(ла):
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors.

In all the problematic cases it is only multiline implicit string literal concatenation that causes the problem. What if we forbade implicit string literal concatenation only between string literals on different physical lines? A deliberate string literal concatenation could then be made with explicit line joining.

    raise ValueError('Type names and field names must be valid '\
                     'identifiers: %r' % name)

    raise ValueError('{} not bottom-level directory in '\
                     '{!r}'.format(_PYCACHE, path))

    ignore_patterns = (
        'Function "%s" not defined.' % breakpoint,
        "warning: no loadable sections found in added symbol-file"\
        " system-supplied DSO",
        "warning: Unable to find libthread_db matching"\
        " inferior's thread library, thread debugging will"\
        " not be available.",
        "warning: Cannot initialize thread debugging"\
        " library: Debugger service failed",
        'warning: Could not load shared library symbols for '\
        'linux-vdso.so',
        'warning: Could not load shared library symbols for '\
        'linux-gate.so',
        'Do you need "set solib-search-path" or '\
        '"set sysroot"?',
        )

I think this introduces less noise than the '+' operator or other proposed alternatives.
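To make the rule concrete, a sketch of what would and would not be accepted (today all three spellings are legal Python):

    x = 'abc' 'def'      # same physical line: still allowed
    y = ('abc'
         'def')          # literals on different physical lines: would become an error
    z = 'abc' \
        'def'            # explicit line joining: allowed again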
On 16/05/2013 08:08, Serhiy Storchaka wrote:
10.05.13 21:48, Guido van Rossum написав(ла):
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors.
[snip] I wonder whether we could use ".". Or would that be too confusing?
In all the problematic cases it is only multiline implicit string literal concatenation that causes the problem. What if we forbade implicit string literal concatenation only between string literals on different physical lines? A deliberate string literal concatenation could then be made with explicit line joining.
    raise ValueError('Type names and field names must be valid '\
                     'identifiers: %r' % name)

    raise ValueError('Type names and field names must be valid ' .
                     'identifiers: %r' % name)

    raise ValueError('{} not bottom-level directory in '\
                     '{!r}'.format(_PYCACHE, path))

    raise ValueError('{} not bottom-level directory in ' .
                     '{!r}'.format(_PYCACHE, path))
On 16.05.13 16:40, MRAB wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
10.05.13 21:48, Guido van Rossum написав(ла):
[snip + snip]
[snip] I wonder whether we could use ".". Or would that be too confusing? ...
that is interesting (with respect to php -> python porting :-) I will take a seat and wait for the thread to evolve based on your dot. All the best, Stefan.
On Fri, May 17, 2013 at 12:40 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors.
I wonder whether we could use ".". Or would that be too confusing?
And I apologized for borrowing an idea from bash. Taking an idea from PHP?!? Seriously, I don't think another operator is needed. If it's not going to be the implicit concatenation by abuttal, + or \ will carry the matter. But I share the opinion of several here: implicit concatenation is not as bad as the alternatives. ChrisA
On 16/05/2013 15:44, Chris Angelico wrote:
On Fri, May 17, 2013 at 12:40 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors.
I wonder whether we could use ".". Or would that be too confusing?
And I apologized for borrowing an idea from bash. Taking an idea from PHP?!?
It has high precedence as far as the parser is concerned. I know that Perl uses it. I haven't looked at PHP (I hear bad things about it! :-)).
Seriously, I don't think another operator is needed. If it's not going to be the implicit concatenation by abuttal, + or \ will carry the matter. But I share the opinion of several here: implicit concatenation is not as bad as the alternatives.
It wouldn't be an operator as such.
On May 16, 2013, at 8:00, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 15:44, Chris Angelico wrote:
On Fri, May 17, 2013 at 12:40 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors. I wonder whether we could use ".". Or would that be too confusing?
And I apologized for borrowing an idea from bash. Taking an idea from PHP?!? It has high precedence as far as the parser is concerned.
I know that Perl uses it. I haven't looked at PHP (I hear bad things about it! :-)).
Seriously, I don't think another operator is needed. If it's not going to be the implicit concatenation by abuttal, + or \ will carry the matter. But I share the opinion of several here: implicit concatenation is not as bad as the alternatives. It wouldn't be an operator as such
Of course in php, perl, and every other language that uses dot for string concatenation, it _is_ an operator, so this will end up confusing the very people who initially find it comforting. And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_. And then there's the fact that the "precedence" is different depending on which meaning the dot gets. Remember that what you're trying to solve is the problem that member-dot and % both have higher precedence than +.
On 16/05/2013 16:57, Andrew Barnert wrote:
On May 16, 2013, at 8:00, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 15:44, Chris Angelico wrote:
On Fri, May 17, 2013 at 12:40 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors. I wonder whether we could use ".". Or would that be too confusing?
And I apologized for borrowing an idea from bash. Taking an idea from PHP?!? It has high precedence as far as the parser is concerned.
I know that Perl uses it. I haven't looked at PHP (I hear bad things about it! :-)).
Seriously, I don't think another operator is needed. If it's not going to be the implicit concatenation by abuttal, + or \ will carry the matter. But I share the opinion of several here: implicit concatenation is not as bad as the alternatives. It wouldn't be an operator as such
Of course in php, perl, and every other language that uses dot for string concatenation, it _is_ an operator, so this will end up confusing the very people who initially find it comforting.
And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
And then there's the fact that the "precedence" is different depending on which meaning the dot gets. Remember that what you're trying to solve is the problem that member-dot and % both have higher precedence than +.
I thought the problem we were trying to solve was that "+" has a lower precedence than "%" and attribute/method access, so implicit concatenation that's followed by "%" or ".format" can't be replaced by "+" without adding extra parentheses.
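For example (a small sketch; the value of n is made up):

    n = 5

    # Implicit concatenation happens first, then .format sees the whole template:
    print("{} widgets" " remaining".format(n))      # prints: 5 widgets remaining

    # With '+', .format binds only to the second literal, so the placeholder
    # in the first one is silently left unfilled:
    print("{} widgets" + " remaining".format(n))    # prints: {} widgets remaining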
On 16.05.13 18:23, MRAB wrote:
On 16/05/2013 16:57, Andrew Barnert wrote:
On May 16, 2013, at 8:00, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 15:44, Chris Angelico wrote:
On Fri, May 17, 2013 at 12:40 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
As was said before, the '+' operator has lower precedence than the '%' operator and attribute access, i.e. it requires parentheses in some cases. However, parentheses introduce noise and can cause other types of errors. I wonder whether we could use ".". Or would that be too confusing?
And I apologized for borrowing an idea from bash. Taking an idea from PHP?!? It has high precedence as far as the parser is concerned.
I know that Perl uses it. I haven't looked at PHP (I hear bad things about it! :-)).
Seriously, I don't think another operator is needed. If it's not going to be the implicit concatenation by abuttal, + or \ will carry the matter. But I share the opinion of several here: implicit concatenation is not as bad as the alternatives. It wouldn't be an operator as such
Of course in php, perl, and every other language that uses dot for string concatenation, it _is_ an operator, so this will end up confusing the very people who initially find it comforting.
And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
And then there's the fact that the "precedence" is different depending on which meaning the dot gets. Remember that what you're trying to solve is the problem that member-dot and % both have higher precedence than +.
I thought the problem we were trying to solve was that "+" has a lower precedence than "%" and attribute/method access, so implicit concatenation that's followed by "%" or ".format" can't be replaced by "+" without adding extra parentheses.
I think the "." is a nice idea at first sight, but might become confusing in the end because what we actually need is a simple to use notation for the scanner/parser that denotes a continuation line, and _not_ an operator. Now, what about this? long_line = "the beginning and the"& # comments are ok " continuation of a string" The "&" is not a valid operator on strings and looks pretty much like gluing parts together. It is better than the "\" that just escapes the newline and cannot take comments. I would even enforce that the ampersand be on the same line. cheers - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On Thu, May 16, 2013 at 9:57 AM, Christian Tismer <tismer@stackless.com>wrote:
The "&" is not a valid operator on strings and looks pretty much like gluing parts together. It is better than the "\" that just escapes the newline and cannot take comments.
I don't like something that is a standard operator becoming special syntax. While it's true that string & string is not valid, it's not the case that string & ... is not valid. I dislike dot for the same reason. It's confusing that these would do different things:

    'abc' & 'def'
    ('abc') & 'def'

I like the \ idea because it's clearly syntax and not an operator, but the fact that it doesn't work with comments is annoying since one reason to break a string is to insert comments. I don't like that spaces after the \ are not allowed because trailing spaces are invisible to me but not to the parser. So what if the rule for trailing \ was changed to:

    The \ continuation character may be followed by white space and a
    comment. If a comment is present, there must be at least one
    whitespace character between the \ and the comment.

That is:

    x = [   # THIS WOULD BE ALLOWED
        'abc' \
        'def' \   # not the python keyword
        'ghi'
        ]

    x = [   # THIS WOULD BE AN ERROR
        'abc' \
        'def'   # a comment but no continuation \
        'ghi'
        ]

One thing I like about using \ is that it already works (aside from my proposed comment change). So anyone wanting to write forward/backward-compatible code can just add the \s now. If you want to start enforcing the restriction, just use from __future__ import explicit_string_continuation.

--- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security
On 16/05/2013 18:26, Bruce Leban wrote:
On Thu, May 16, 2013 at 9:57 AM, Christian Tismer <tismer@stackless.com <mailto:tismer@stackless.com>> wrote:
The "&" is not a valid operator on strings and looks pretty much like gluing parts together. It is better than the "\" that just escapes the newline and cannot take comments.
I don't like something that is a standard operator becoming special syntax. While it's true that string & string is not valid, it's not the case that string & ... is not valid. I dislike dot for the same reason. It's confusing that these would do different things:
    'abc' & 'def'
    ('abc') & 'def'
I like the \ idea because it's clearly syntax and not an operator, but the fact that it doesn't work with comments is annoying since one reason to break a string is to insert comments. I don't like that spaces after the \ are not allowed because trailing spaces are invisible to me but not to the parser. So what if the rule for trailing \ was changed to:
The \ continuation character may be followed by white space and a comment. If a comment is present, there must be at least one whitespace character between the \ and the comment.
Why do you say """there must be at least one whitespace character between the \ and the comment"""?
That is:
    x = [   # THIS WOULD BE ALLOWED
        'abc' \
        'def' \   # not the python keyword
        'ghi'
        ]

    x = [   # THIS WOULD BE AN ERROR
        'abc' \
        'def'   # a comment but no continuation \
        'ghi'
        ]
One thing I like about using \ is that it already works (aside from my proposed comment change). So anyone wanting to write forward/backward-compatible code can just add the \s now. If you want to start enforcing the restriction, just use from __future__ import explicit_string_continuation.
On Thu, May 16, 2013 at 10:38 AM, MRAB <python@mrabarnett.plus.com> wrote:
Why do you say """there must be at least one whitespace character between the \ and the comment"""?
Two reasons: (1) make the backslash more likely to stand out visually (and we can't require a space before it) (2) \# looks like it might be an escape sequence of some sort while I don't think \ # does, making this friendlier to readers. I'm not passionate about that detail if the rest of the proposal flies. --- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security
On Fri, May 17, 2013 at 4:14 AM, Bruce Leban <bruce@leapyear.org> wrote:
On Thu, May 16, 2013 at 10:38 AM, MRAB <python@mrabarnett.plus.com> wrote:
Why do you say """there must be at least one whitespace character between the \ and the comment"""?
Two reasons:
(1) make the backslash more likely to stand out visually (and we can't require a space before it)
(2) \# looks like it might be an escape sequence of some sort while I don't think \ # does, making this friendlier to readers.
I'm not passionate about that detail if the rest of the proposal flies.
Spin that off as a separate thread, I think the change to the backslash rules stands alone. I would support it; allowing a line-continuation backslash to be followed by a comment is a Good Thing imo. ChrisA
On 16.05.13 20:20, Chris Angelico wrote:
On Thu, May 16, 2013 at 10:38 AM, MRAB <python@mrabarnett.plus.com> wrote:
Why do you say """there must be at least one whitespace character between the \ and the comment"""?
Two reasons:
(1) make the backslash more likely to stand out visually (and we can't require a space before it)
(2) \# looks like it might be an escape sequence of some sort while I don't think \ # does, making this friendlier to readers.
I'm not passionate about that detail if the rest of the proposal flies.

Spin that off as a separate thread, I think the change to the backslash rules stands alone. I would support it; allowing a line-continuation backslash to be followed by a comment is a Good Thing imo.
I don't think these matters should be discussed in separate threads. We came from Guido's proposal to remove implicit string concatenation. In that context, some people argued that there should be no new ".", "&" or whatever operator rules, but better handling of the unbeloved backslash. I think both can and should be treated together. Doing so, I come to repeat this proposal:

- implicit string concatenation becomes deprecated
- the backslash will allow comments, as proposed by Bruce
- continuation of a string on the next line will later enforce the backslash.

So repeating Bruce's example, the following would be allowed:

    x = [   # THIS WOULD BE ALLOWED
        'abc' \
        'def' \   # not the python keyword
        'ghi'
        ]

And this would be an error:

    x = [   # THIS WOULD BE AN ERROR
        'abc' \
        'def'   # a comment but no continuation \
        'ghi'
        ]

'\' would become a kind of line glue operator that becomes needed to merge the strings. I don't think that parentheses are superior for that. Parentheses are for expressions and they suggest expressions. Avoiding parentheses where they don't group parts of expressions is imo a good thing. The reason why Python has grown the recommendation to use parentheses comes more from the absence of a good alternative.

cheers - chris

-- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On 17/05/13 19:32, Christian Tismer wrote:
On 16.05.13 20:20, Chris Angelico wrote:
On Fri, May 17, 2013 at 4:14 AM, Bruce Leban <bruce@leapyear.org> wrote:
I'm not passionate about that detail if the rest of the proposal flies.
Spin that off as a separate thread, I think the change to the backslash rules stands alone. I would support it; allowing a line-continuation backslash to be followed by a comment is a Good Thing imo.
I don't think these matters should be discussed in separate threads.
They clearly should be in different threads. Line continuation is orthogonal to string continuation. You can have string concatenation on a single line:

    s = "Label:\t" r"Data containing \ backslashes"

And you can have line continuations not involving strings:

    result = math.sin(23*theta) + cos(17*theta) - \
             sin(3*theta**2)*cos(5*theta**3)

Since the two things under discussion are independent, they should be discussed in different threads.
- implicit string concatenation becomes deprecated
-1 Implicit string concatenation is useful, and used by many people without problems.
- the backslash will allow comments, as proposed by Bruce
+0 It's not really that important these days. If you want comments, use brackets to group a multi-line expression.
- continuation of a string on the next line will later enforce the backslash.
I don't understand what this sentence means.
So repeating Bruce's example, the following would be allowed:
x = [ # THIS WOULD BE ALLOWED 'abc' \ 'def' \ # not the python keyword 'ghi' ]
The backslashes are redundant, since the square brackets already enable a multi-line expression.
And this would be an error:
x = [ # THIS WOULD BE AN ERROR 'abc' \ 'def' # a comment but no continuation \ 'ghi' ]
'\' would become kind of a line glue operator that becomes needed to merge the strings.
-1 since there are uses for concatenating strings on a single line.
I don't think that parentheses are superior for that. Parentheses are for expressions and they suggest expressions. Avoiding parentheses where they don't group parts of expressions is imo a good thing.
I don't understand this objection, since the parentheses are being used to group an expression.
The reason why Python has grown the recommendation to use parentheses comes more from the absence of a good alternative.
Maybe so, but now that we have multi-line expressions inside brackets, the need for an alternative is much reduced. -- Steven
On 05/17/2013 06:41 AM, Steven D'Aprano wrote:
On 17/05/13 19:32, Christian Tismer wrote:
On 16.05.13 20:20, Chris Angelico wrote:
On Fri, May 17, 2013 at 4:14 AM, Bruce Leban <bruce@leapyear.org> wrote:
I'm not passionate about that detail if the rest of the proposal flies.
Spin that off as a separate thread, I think the change to the backslash rules stands alone. I would support it; allowing a line-continuation backslash to be followed by a comment is a Good Thing imo.
I don't think these matters should be discussed in separate threads.
They clearly should be in different threads. Line continuation is orthogonal to string continuation. You can have string concatenation on a single line:
s = "Label:\t" r"Data containing \ backslashes"
Can you think of, or find, an example of two adjacent strings on the same line that can't be written as a single string?

    s = "Label:\t Data containing \ backslashes"

I'm curious about how much of a problem not having implicit string concatenation really is.
And you can have line continuations not involving strings:
    result = math.sin(23*theta) + cos(17*theta) - \
             sin(3*theta**2)*cos(5*theta**3)
Since the two things under discussion are independent, they should be discussed in different threads.
- implicit string concatenation becomes deprecated
-1
Implicit string concatenation is useful, and used by many people without problems.
This is why they are trying to find an explicit alternative.
- the backslash will allow comments, as proposed by Bruce
+0
It's not really that important these days. If you want comments, use brackets to group a multi-line expression.
- continuation of a string on the next line will later enforce the backslash.
I don't understand what this sentence means.
So repeating Bruce's example, the following would be allowed:
x = [ # THIS WOULD BE ALLOWED 'abc' \ 'def' \ # not the python keyword 'ghi' ]
The backslashes are redundant, since the square brackets already enable a multi-line expression.
But it is also a source of errors that appear to happen often enough, or are annoying enough, to be worth changing. Guido's example was a situation where a comma was left out and two strings were joined inside a list without an error message. If you accidentally put a comma in a multi-line expression inside parentheses, it becomes a tuple without an error message.
    >>> ('abc'
    ... 'def',
    ... 'ghi')
    ('abcdef', 'ghi')
By removing implicit string concatenations, an error can be raised in some of these situations. The fact that these errors are silent and may not be noticed until a program is actually used is an important part of this. Or even worse, not noticed at all!
'\' would become kind of a line glue operator that becomes needed to merge the strings.
-1 since there are uses for concatenating strings on a single line.
Guido's suggestion is just to live with using a '+'. His point was that any extra overhead wouldn't be that harmful, as literal string concatenations tend to be in the initialization parts of programs. But the + has a lower precedence than %, which is inconvenient.
I don't think that parentheses are superior for that. Parentheses are for expressions and they suggest expressions. Avoiding parentheses where they don't group parts of expressions is imo a good thing.
I agree with this. Especially if the expression being grouped has parentheses inside it.
I don't understand this objection, since the parentheses are being used to group an expression. And they are being used to group expressions.
It's also a matter of reducing errors. I think it improves readability as well. Which also reduces errors.
The reason why Python has grown the recommendation to use parentheses comes more from the absence of a good alternative.
Maybe so, but now that we have multi-line expressions inside brackets, the need for an alternative is much reduced.
If you use square brackets, you get a one-item list as the result. Parentheses are used to change an expression's order of evaluation as well. Ron
On Friday, May 17, 2013 8:14:39 AM UTC-6, Ron Adam wrote:
On 05/17/2013 06:41 AM, Steven D'Aprano wrote:
They clearly should be in different threads. Line continuation is orthogonal to string continuation. You can have string concatenation on a single line:
s = "Label:\t" r"Data containing \ backslashes"
Can you think of, or find an example of two adjacent strings on the same line that can't be written as a single string?
s = "Label:\t Data containing \ backslashes"
I'm curious about how much of a problem not having implicit string concatenations really is?
"Can't" is an unrealistically high a bar but I posted a real example at http://mail.python.org/pipermail/python-ideas/2013-May/020847.html that is *better* written IMO as adjacently-concatenated string literals.
On 05/17/2013 04:41 PM, rurpy@yahoo.com wrote:
On Friday, May 17, 2013 8:14:39 AM UTC-6, Ron Adam wrote:
On 05/17/2013 06:41 AM, Steven D'Aprano wrote:
> They clearly should be in different threads. Line continuation is
> orthogonal to string continuation. You can have string concatenation on a
> single line:
>
> s = "Label:\t" r"Data containing \ backslashes"
Can you think of, or find an example of two adjacent strings on the same line that can't be written as a single string?
s = "Label:\t Data containing \ backslashes"
I'm curious about how much of a problem not having implicit string concatenations really is?
"Can't" is an unrealistically high a bar but I posted a real example at http://mail.python.org/pipermail/python-ideas/2013-May/020847.html that is *better* written IMO as adjacently-concatenated string literals.
If we didn't have implicit string concatenation, I'd probably write it with each part on a separate line to make it easier to read.

    pattern = '[^\uFF1B\u30FB\u3001' \
            + r'+:=.,\/\[\]\t\r\n]+' \
            + '[\#\uFF03]+'

I think in this case the strings are joined at compile time, as Guido suggested in his post. You could also write it as...

    pattern = ('[^\uFF1B\u30FB\u3001'
               + r'+:=.,\/\[\]\t\r\n]+'
               + '[\#\uFF03]+')

If implicit string concatenation is removed, it would be nice if there was an explicit replacement for it. There is a strong consensus for doing it, but there isn't strong consensus on how to do it.

About line continuations:

Line continuations are a related issue to string concatenations because they are used together fairly often. The line continuation behaviour is a bit quarky, but not in any critical way. There has even been a PEP to remove it in Python 3, but it was rejected for not having enough support. People do use it, so it would be better if it was improved rather than removed.

As noted in other messages, the line continuation is copied from C, which I think originally came from the 'Make' utility. (I'm not positive on that.) In C and Make, the \+newline pair is replaced with a space. Python just removes both the \ and the newline, and keeps track of whether or not it's in a string. Look in tokenize.c for this.

As for the *not too important* quarkyness:
'abc' \ 'efg' File "<stdin>", line 1 'abc' \ 'efg' ^ SyntaxError: unexpected character after line continuation character
This error implies that the '\' by itself is a line continuation token even though it's not followed by a newline. Otherwise you would get the same SyntaxError you get when you use any other symbol in an invalid way. This was probably done either because it was easy to do, and/or because a better error message is more helpful. Trailing white space results in the same error. This happens enough to be annoying. It is confusing to some people why the compiler can recognise the line continuation *character*, but can't figure out that the white space after it is not important.
    >>> # comment 1\
    ... comment 2
      File "<stdin>", line 2
        comment 2
                ^
    SyntaxError: invalid syntax
This just shows that comments are parsed before line continuations are considered. Or to put it another way, the '\' is part of the comment. That isn't the case in C or Make: you can continue a comment on the next line with a line continuation. Nothing wrong with this, but it shows the line continuations in Python aren't exact copies of the line continuation in C.

There are perfectly good reasons why the compiler does what it does in each of these cases. I think the little things like this together have contributed to the feeling that line continuations are bad and should be avoided.

The discussed (and implied) options:

There are a number of options that have been discussed but those haven't really been clearly spelled out so the discussion has been kind of out of focus. This seems like an overly detailed list, but the discussion has touched on pretty much all of these things. I think the goal should be to find the most cohesive combination for Python 4 and/or just go with B alone.

A. Do nothing.

B. Remove implicit concatenation.
   (We could stop here, anything after this can be done later.)

C. Remove explicit line continuations. (See options below.)

D. Add a new explicit string concatenation token.

E. Reuse the \ as an explicit string concatenation. (with C)

F. Make an exception for implicit string concatenations only after a line continuation. (with B)

G. Make an exception for line continuations if a line ends with an explicit string concatenation. (With C and (D or E))

H. Change the line continuation character from \+newline to just \.

I. Allow implicit line continuations if a line ends with an operator that expects to be continued, like a comma inside parentheses already does. (With C)

Option H has some interesting possibilities. It pretty much is a complete replacement for the current escaped newline continuation, so how it works, and what constraints it has, would need to be discussed. It's the option that would allow white space and comments after a line continuation character.

Option I is interesting because it's already there inside of parentheses and other containers. I just haven't seen it described as an implicit line continuation before.

It is my feeling that we can't change the escaped newline within strings. That needs to be how it is, and it should be documented as a string feature, rather than a general line continuation token. So if line continuations outside of strings are removed, escaped newlines inside of strings will still work.

There are so many possibilities here that the only thing I'm sure of right now is to go ahead and start the process of removing implicit string concatenations (Option B), and then consider everything else as separate issues in that context.

Cheers, Ron
On 19/05/13 04:16, Ron Adam wrote:
If implicit string concatenation is removed, it would be nice if there was an explicit replacement for it. There is a strong consensus for doing it,
I don't think there is. From what I have seen, there have been nearly as many people objecting to the proposed removal as there have been people supporting it, it is only that some of the people supporting the removal are more vocal, proposing alternative after alternative, none of which are particularly nice. Single dot, ellipsis, yet another string prefix c'', forced backslashes, ampersand... Have I missed any?
but there isn't strong consensus on how to do it.
About line continuations:
Line continuations are a related issue to string concatenations because they are used together fairly often.
They might be related, but they are orthogonal. We could change one, or the other, or both, or neither. There are virtues to changing the behaviour of \ line concatenation independent of any changes made to strings.
The line continuation behaviour is a bit quarky,
Do you mean "quirky"? Quarky would mean "like quark(s)", which could refer to something being like a type of German cream cheese, or possibly like fundamental subatomic particles that make up protons and neutrons.
There are a number of options that have been discussed but those haven't really been clearly spelled out so the discussion has been kind of out of focus. This seems like an overly detailed list, but the discussion has touched on pretty much all of these things. I think the goal should be to find the most cohesive combination for Python 4 and/or just go with B alone.
A. Do nothing.
B. Remove implicit concatenation.
(We could stop here, anything after this can be done later.)
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code. So removal requires a deprecation schedule: deprecate for at least one release. The conservative approach is: * mark as deprecated in the docs in 3.4; * raise a deprecated warning in 3.5; * remove in 3.6. or even later. Any removal of functionality leads to code churn: people will be forced to change code that works now because it will stop working in the future. That's a serious cost even when there are clear and obvious benefits to the removal. -- Steven
On 05/18/2013 03:58 PM, Steven D'Aprano wrote:
On 19/05/13 04:16, Ron Adam wrote:
If implicit string concatenation is removed, it would be nice if there was an explicit replacement for it. There is a strong consensus for doing it,
I don't think there is. From what I have seen, there have been nearly as many people objecting to the proposed removal as there have been people supporting it, ...
Correct, there isn't a very strong consensus for the removal. But the discussion has been focused more on a replacement than on an eventual future removal down the road. If it was to be removed (as I said), there is a strong consensus for some sort of an explicit variation to replace it. But there isn't any agreement on how to do that. The discussion is split between not removing it and removing it with some sort of replacement. We need to know how many people are ok with removing it even if a replacement is not found. (It doesn't mean one won't be found.)

it is only that some of the people supporting the removal are more vocal, proposing alternative after alternative, none of which are particularly nice. Single dot, ellipsis, yet another string prefix c'', forced backslashes, ampersand... Have I missed any?
but there isn't strong consensus on how to do it.
About line continuations:
Line continuations are a related issue to string concatenations because they are used together fairly often.
They might be related, but they are orthogonal. We could change one, or the other, or both, or neither. There are virtues to changing the behaviour of \ line concatenation independent of any changes made to strings.
I agree.
The line continuation behaviour is a bit quarky,
Do you mean "quirky"? Quarky would mean "like quark(s)", which could refer to something being like a type of German cream cheese, or possibly like fundamental subatomic particles that make up protons and neutrons.
LOL.. Yes quirky. Definitely not the cheese. ;-)
There are a number of options that have been discussed but those haven't really been clearly spelled out so the discussion has been kind of out of focus. This seems like an overly detailed list, but the discussion has touched on pretty much all of these things. I think the goal should be to find the most cohesive combination for Python 4 and/or just go with B alone.
A. Do nothing.
B. Remove implicit concatenation.
(We could stop here, anything after this can be done later.)
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
So removal requires a deprecation schedule: deprecate for at least one release. The conservative approach is:
* mark as deprecated in the docs in 3.4;
* raise a deprecated warning in 3.5;
* remove in 3.6.
Correct, and that is why I believe we should start the process... with the intention of doing it in Python 4, or possibly earlier if there is support for doing it sooner. (I had put that in, but it got edited out.) Anyway, this is my vote.
or even later. Any removal of functionality leads to code churn: people will be forced to change code that works now because it will stop working in the future. That's a serious cost even when there are clear and obvious benefits to the removal.
I agree with this also. I think starting the process now and deprecating it sooner rather than later would help reduce the code churn down the road. It will also help focus any future discussions of the additional features in the light of implicit concatenation being removed. Cheers, Ron
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less? MarkJ Tacoma, Washington
On 5/19/2013 2:58 PM, Mark Janssen wrote:
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code. Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
Interesting, how did you get that number? --Ned.
MarkJ Tacoma, Washington
On 20 May 2013 05:24, "Ned Batchelder" <ned@nedbatchelder.com> wrote:
On 5/19/2013 2:58 PM, Mark Janssen wrote:
We can't just "remove implicit concatenation", because that will break
which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
Really? Isn't the number of programs breaking roughly equal to 2,
code perhaps less?
Interesting, how did you get that number?
If it's based on the contents of these threads, be aware that at least one core developer (me) and probably more have already mostly tuned out on the grounds that the feature is obviously in wide enough use that changing it will break the world without adequate gain. We don't even have to speculate on what others might be doing, we know it would break *our* code. For example, porting Fedora to Python 3 is already going to be a pain. Breaking implicit string concatenation would be yet another road block making that transition more difficult. Cheers, Nick.
--Ned.
MarkJ Tacoma, Washington
20.05.13 01:33, Nick Coghlan написав(ла):
For example, porting Fedora to Python 3 is already going to be a pain. Breaking implicit string concatenation would be yet another road block making that transition more difficult.
It will be a good reason for people to use Python 3 (but not Python 4).
On 05/19/2013 05:33 PM, Nick Coghlan wrote:
If it's based on the contents of these threads, be aware that at least one core developer (me) and probably more have already mostly tuned out on the grounds that the feature is obviously in wide enough use that changing it will break the world without adequate gain. We don't even have to speculate on what others might be doing, we know it would break *our* code.
Ok, so is it your opinion that, in order to remove implicit string joining, an explicit replacement must be put in at the same time?
For example, porting Fedora to Python 3 is already going to be a pain. Breaking implicit string concatenation would be yet another road block making that transition more difficult.
This sounds more like a general request to not make any changes, rather than something about the specific item itself.

To be clear, this is going to need a long removal schedule. Nothing will probably actually be removed before 3.7 or later. Maybe two years from now?

How about this:

First, let's please differentiate string continuation from string concatenation. A string continuation would be a pre-run-time alteration. A string concatenation would be a run-time operation. By documenting them that way, it will help make them easier to discuss and teach to new users.

Redefine a line continuation character to be strictly a \+\n sequence. That removes the "character after line continuation" errors, because a '\' without a newline after it isn't technically a line continuation character. Then use the '\', except when it's at the end of a line, to be the explicit string continuation character. This should be easy to do also. We could add this in sooner rather than later. I don't think it would be a difficult patch, and I also don't think it would break anything.

Implicit string continuations could be deprecated at the same time, with the recommendation to start using the more explicit variation. *But not remove implicit string continuations until Python 4.0.*

String continuations are a similar concept to line continuations, so the reuse of '\' for it is an easy concept to learn and remember. It's also easy to explain. This does not change a '\' used inside a string. String escape codes have their own rules.

Examples:

    foo('a' 'b')    # This won't cause an error until Python 4.0

    x = 'foo\n' \ 'bar\n' \ 'baz\n'

    x = ( 'foo\n'      # easy to see trailing commas here.
          \ 'bar\n'
          \ 'baz\n' )

    x = 'foo\n' \
        \ 'bar\n' \
        \ 'baz\n'

If we allow \+newline to work as both a string continuation and line continuation, this could be...

    x = 'foo\n' \
        'bar\n' \
        'baz\n'

This is probably the least disruptive way to do this, and the '\' as a string continuation is consistent with the \+\n as a line continuation.

A final note ... I think we can easily allow comments after line continuations if there is no space between the '\' and the '#'.

    x = 'foo\n' \# This comment is removed.
        'bar\n' \# The new-line at the end is not removed.
        'baz\n'

When the tokenizer finds a '\' followed by a '#', it could remove the comment, back up one, and continue. What would happen is the \+comment+\n would be converted to \+\n. No space can be between the '\' and '#' for this to work. Seems like this should already work, but the current check for an invalid character after a line continuation raises an error before this can happen.

Cheers, Ron
Am 20.05.2013 00:33, schrieb Nick Coghlan:
On 20 May 2013 05:24, "Ned Batchelder" <ned@nedbatchelder.com <mailto:ned@nedbatchelder.com>> wrote:
On 5/19/2013 2:58 PM, Mark Janssen wrote:
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
Interesting, how did you get that number?
If it's based on the contents of these threads, be aware that at least one core developer (me) and probably more have already mostly tuned out on the grounds that the feature is obviously in wide enough use that changing it will break the world without adequate gain. We don't even have to speculate on what others might be doing, we know it would break *our* code.
Yep. I just look at this thread every now and then to marvel at the absurdly complicated ideas people come up with to replace something straightforward :) Georg
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
Interesting, how did you get that number?
I was making a joke using "unreasonable precision", but I would like to actually see more than that (meaning: I don't think there is) in the standard library. There just isn't much, if at all, of a programmatic reason to use such a construct. It's 1) more typing, 2) a highly improbably sequence that accidently worked by the programmer, 3) it doesn't really satisfy any conceptual separation that I can envision (putting two string literals on the same line? what possible purpose?) And this is the point -- it's more likely a programmer error. Really, I have a hard time believing that the number of programs that would break being larger than a handful. And to fix it is a no-brainer. Mark
On 05/20/2013 10:12 AM, Mark Janssen wrote:
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
Interesting, how did you get that number?
I was making a joke using "unreasonable precision", but I would like to actually see more than that (meaning: I don't think there is) in the standard library. There just isn't much, if at all, of a programmatic reason to use such a construct. It's 1) more typing, 2) a highly improbable sequence that accidentally worked for the programmer, 3) it doesn't really satisfy any conceptual separation that I can envision (putting two string literals on the same line? what possible purpose?)
And this is the point -- it's more likely a programmer error. Really, I have a hard time believing that the number of programs that would break being larger than a handful. And to fix it is a no-brainer.
On the same line is probably rare, I agree. On different lines it is very common. Much more common than the number of errors generated by the forgotten comma. -- ~Ethan~
19.05.13 21:58, Mark Janssen написав(ла):
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
One is the Python interpreter itself. What is the other one?
On Mon, May 20, 2013 at 10:39 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
19.05.13 21:58, Mark Janssen написав(ла):
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
One is the Python interpreter itself. What is the other one?
And the other, with apologies to WS Gilbert, isn't. But it really doesn't matter. As long as that number is greater than zero, changing this will be a problem. I've not seen a single suggestion that doesn't have downsides as annoying as implicit concat's. In the absence of a *strong* alternative, I would be against any sort of change; why break code if the replacement is hardly better than the current? ChrisA
On 19 May 2013 15:58, Mark Janssen <dreamingforward@gmail.com> wrote:
We can't just "remove implicit concatenation", because that will break code which is currently working perfectly. And probably it will break more working code than it will fix unnoticed broken code.
Really? Isn't the number of programs breaking roughly equal to 2, perhaps less?
Actually, I find this wording somewhat offensive. I have to make use of this feature to code in long log strings quite often: as in human-readable long strings that can't have an arbitrary amount of whitespace inside (not the case for embedded SQL/HTML snippets), and yet have to be indented along with the code. That is why my only other e-mail on this thread is about adding some syntax for auto-dedenting multiline strings.

Don't take me wrong, I dislike auto-concatenation as much as the next guy - typing a new set of \" \" on each line sometimes makes me wonder if I should stop and code a plug-in for that on my editor - but currently it is the only way of making "pretty enterprise code" with long strings, short of even more verbose calls to "dedent" or explicit concatenation (which would not save typing the \" \" either, just add even more typing).

But if you have an ok way of adding a long human-readable string into code with less typing and correct indentation, with the existing syntax, I'd like to know how you do it. That would be better than saying "only 2 programs use this".

js -><-
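For what it's worth, a side-by-side sketch of the two spellings (the logger name and message are made up):

    import logging

    log = logging.getLogger("example")

    # The current idiom: one long, human-readable message with no embedded
    # newlines, kept within the line limit and indented with the code.
    log.warning("could not reconcile the frobnicator state with the "
                "configured warehouse layout; falling back to defaults")

    # The explicit alternative costs an extra '+' per line on top of the
    # quotes that are already needed.
    log.warning("could not reconcile the frobnicator state with the " +
                "configured warehouse layout; falling back to defaults")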
MarkJ Tacoma, Washington
On 18/05/13 00:14, Ron Adam wrote:
On 05/17/2013 06:41 AM, Steven D'Aprano wrote:
On 17/05/13 19:32, Christian Tismer wrote:
[...]
Guido's example was a situation where a comma was left out and two strings were joined inside a list without an error message.
Actually, no, his error was inside a function call, and he was getting a TypeError of one too few arguments: [quote] I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b'). [end quote]
If you accidentally put a comma in a multi line expression inside parentheses, it becomes a tuple without an error message.
    >>> ('abc'
    ... 'def',
    ... 'ghi')
    ('abcdef', 'ghi')
I think that in a realistic example, this sort of error is less likely than it might appear from such a trivial example. Normally you don't just create a string and do nothing with it. Here's an example from my own code:

    standardMsg = (
        "actual and expected sequences differ in length; expected %d"
        " items but found %d." % (len(expected), len(actual))
        )
    msg = self._formatMessage(msg, standardMsg)

If I were to accidentally insert an unwanted comma in the middle of the concatenation, I would find out immediately.

Python performs very little compile-time checking for you, and that's a virtue. The cost of this is that if you type something you didn't want, Python will do it for you regardless, and you won't find out until you try to use it. The solution is that when typing up repetitive code, you have to be a little more vigilant in Python than you would need to be in some other languages, because Python won't protect you from certain types of typo:

    list_of_floats = [1.2345, 2.3456, 3,4567, 4.5678]

Python will not warn you that you have two ints where you expected one float. I've made this mistake, and then spent inordinate amounts of time not noticing the comma, but I still don't have much sympathy with the view that it is the responsibility of the language to protect me from this sort of typo.
Guido's suggestion is just to live with using a '+'. His point was that any extra overhead wouldn't be that harmful, as literal string concatenations tend to be in the initialization parts of programs.
I think I have found the fatal problem with that suggestion: it rules out using concatenation in docstrings at all.

py> def test():
...     """Doc strings """ + "must be literals."
...
py> test.__doc__ is None
True

The equivalent with implicit concatenation works as expected. [...]
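For comparison, a quick sketch of the implicit form referred to here, with the session output shown as expected:

py> def test2():
...     """Doc strings """ "can use implicit concatenation."
...
py> test2.__doc__
'Doc strings can use implicit concatenation.'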
The reason why Python has grown the recommendation to use parentheses comes more from the absence of a good alternative.
Maybe so, but now that we have multi-line expressions inside brackets, the need for an alternative is much reduced.
If you use braces, you get a one item list as the result.
Parentheses are used to change an expression's order of evaluation as well.
Just for the record, I am from Australia. Like in the UK, when we talk about "brackets", we mean *any* type of bracket, whether round, square or curly. Or as Americans may say, parentheses, brackets, braces. So when I say that we have multi-line expressions inside brackets, I'm referring to the fact that all three of ( [ and { act as explicit line continuations up to their matching closing bracket. -- Steven
On Sat, May 18, 2013 at 4:35 PM, Steven D'Aprano <steve@pearwood.info> wrote:
... Normally you don't just create a string and do nothing with it.
I do. Or, rather, I assign it to a temp name, and then use that temp name in the next line -- temp variables seems less ugly than line continuations.
I think I have found the fatal problem with that suggestion: it rules out using concatenation in docstrings at all.
I think a better solution would be to loosen the requirements for docstrings. Reasonably harmless proposals include:

(1) Always use the first expression, if it isn't a statement.
(2) str( < the above > )
(3) Special treatment for __doc__, such as __doc__ = ...

-jJ
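A workaround that already works today, separate from the numbered proposals above, is to assign __doc__ after the def, since function docstrings are writable; a hedged sketch:

def test():
    pass

# __doc__ may be assigned any expression after the function exists,
# so '+' concatenation (or anything else) is fine here.
test.__doc__ = ("Doc strings assigned this way " +
                "may be built from any expression.")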
On Fri, May 17, 2013 at 7:41 AM, Steven D'Aprano <steve@pearwood.info> wrote:
They clearly should be in different threads. Line continuation is orthogonal to string concatenation. You can have string concatenation on a single line:
In theory. In practice, the times when I'm having trouble fitting something onto a single line *and* cannot find a good place to break it (using parens), the problem almost always involves a string. And the number of times I needed to concatenate two strings on the same line (but wasn't willing to use a +) has been ... only when a seemingly arbitrary syntax restriction requires a literal string -- basically, when writing a docstring.
On 17/05/13 19:32, Christian Tismer wrote:
- continuation of a string on the next line will later enforce the backslash.
I don't understand what this sentence means.
Today, (if you're not writing a docstring) you can write

"abcd"
"efgh"

and it magically turns into "abcdefgh". He proposes that -- eventually -- you would have to write

"abcd" \
"efgh"

so that the \ would be an explicit indicator that you were continuing the line, and hadn't just forgotten a comma.
-1 since there are uses for concatenating strings on a single line.
I understand "create a string demonstrating all the quoting conventions". I don't understand why an explicit + is so bad in that case. Nor do I understand what would be so horrible about breaking the physical line there. So the only use I know about is docstrings. And maybe that should be fixed there, instead. -jJ
On 05/16/2013 12:38 PM, MRAB wrote:
I like the \ idea because it's clearly syntax and not an operator, but the fact that it doesn't work with comments is annoying since one reason to break a string is to insert comments. I don't like that spaces after the \ are not allowed because trailing spaces are invisible to me but not to the parser. So what if the rule for trailing \ was changed to:
The \ continuation character may be followed by white space and a comment. If a comment is present, there must be at least one whitespace character between the \ and the comment.
Why do you say """there must be at least one whitespace character between the \ and the comment"""?
I'd like comments after a line continuation also. There is an issue with it in strings. The tokenizer uses the '\'+'\n' as a line continuation, rather than a single '\'. By doing that, it can handle line continuations on any line exactly the same.
"This is a backslash \, and this\ ... line is continued also." 'This is a backslash \\, and this line is continued also.'
The \ is also used as the string escape character. Outside of strings, a '\' anywhere except at the end of a line is an error. So we can do that without any issues with previous code. But we need to not change its behaviour between quotes.

Cheers, Ron
Hey Bruce! On 16.05.13 19:26, Bruce Leban wrote:
On Thu, May 16, 2013 at 9:57 AM, Christian Tismer <tismer@stackless.com <mailto:tismer@stackless.com>> wrote:
The "&" is not a valid operator on strings and looks pretty much like gluing parts together. It is better than the "\" that just escapes the newline and cannot take comments.
I don't like something that is a standard operator becoming special syntax. While it's true that string & string is not valid, it's not the case that string & ... is not valid. I dislike dot for the same reason. It's confusing that these would do different things:
'abc' & 'def'
('abc') & 'def'
I like the \ idea because it's clearly syntax and not an operator, but the fact that it doesn't work with comments is annoying since one reason to break a string is to insert comments. I don't like that spaces after the \ are not allowed because trailing spaces are invisible to me but not to the parser. So what if the rule for trailing \ was changed to:
The \ continuation character may be followed by white space and a comment. If a comment is present, there must be at least one whitespace character between the \ and the comment.
That is:
x = [      # THIS WOULD BE ALLOWED
    'abc' \
    'def' \ # not the python keyword
    'ghi'
]
x = [      # THIS WOULD BE AN ERROR
    'abc' \
    'def'   # a comment but no continuation \
    'ghi'
]
One thing I like about using \ is that it already works (aside from my proposed comment change). So anyone wanting to write forward/backward-compatible code can just add the \s now. If you want to start enforcing the restriction, just use from __future__ import explicit_string_continuation.
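A minimal sketch of the already-valid style described here (the strings and the variable name are made up; note the __future__ flag mentioned above is a proposal, not something that exists today):

name = 'class'
# Explicit '\' line continuation plus implicit concatenation -- legal in
# current Python, and it signals that the string deliberately continues.
message = 'Type names and field names must be valid ' \
          'identifiers: %r' % name
print(message)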
Right, that's a good one! Although I hate the backslash from bad experience with Windows. But actually the main reason I have always hated using "\" for continuation lines is its strict behavior of not allowing any whitespace after it. Hey, it would be great if that proposal makes it!

cheers - chris

-- Christian Tismer :^) <tismer@stackless.com> Software Consulting, 14482 Potsdam http://www.stackless.com/
On Thu, May 16, 2013 at 1:26 PM, Bruce Leban <bruce@leapyear.org> wrote:
... So what if the rule for trailing \ was changed to:
The \ continuation character may be followed by white space and a comment. If a comment is present, there must be at least one whitespace character between the \ and the comment.
YES!!! Even ignoring string concatenation, this would be a huge win. Limiting implicit string concatenation to "same logical line" or even "adjacent physical lines joined by a line-continuation '\'-character" *might* be even better. -jJ
2013/5/16 Jim Jewett <jimjjewett@gmail.com>
On Thu, May 16, 2013 at 1:26 PM, Bruce Leban <bruce@leapyear.org> wrote:
... So what if the rule for trailing \ was changed to:
The \ continuation character may be followed by white space and a comment. If a comment is present, there must be at least one whitespace character between the \ and the comment.
YES!!! Even ignoring string concatenation, this would be a huge win.
Limiting implicit string concatenation to "same logical line" or even "adjacent physical lines joined by a line-continuation '\'-character" *might* be even better.
I think the latter would almost make it explicit string concatenation, no? That sounds like one of the cleanest solutions so far. -- --Dave Peticolas
16.05.13 20:26, Bruce Leban написав(ла):
I like the \ idea because it's clearly syntax and not an operator, but the fact that it doesn't work with comments is annoying since one reason to break a string is to insert comments. I don't like that spaces after the \ are not allowed because trailing spaces are invisible to me but not to the parser. So what if the rule for trailing \ was changed to:
The \ continuation character may be followed by white space and a comment. If a comment is present, there must be at least one whitespace character between the \ and the comment.
It's not needed. You could just use the "+" operator if you want to insert comments. Or the verbose mode of regexes. And it works right now.
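For the regular-expression case mentioned here, re.VERBOSE already lets a pattern span lines with comments; a small sketch with a made-up date pattern:

import re

# In verbose mode, unescaped whitespace in the pattern is ignored and
# '#' starts a comment, so the pattern can be commented line by line.
date_re = re.compile(r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
    -       # separator
    \d{2}   # day
    """, re.VERBOSE)

print(bool(date_re.match("2013-05-16")))   # True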
On 16.05.13 18:57, Christian Tismer wrote:
On 16.05.13 18:23, MRAB wrote:
On 16/05/2013 16:57, Andrew Barnert wrote:
On May 16, 2013, at 8:00, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 15:44, Chris Angelico wrote:
On Fri, May 17, 2013 at 12:40 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:

As was said before, the '+' operator has lower priority than the '%' operator and an attribute access, i.e. it requires parentheses in some cases. However parentheses introduce noise and can cause other types of errors.

I wonder whether we could use ".". Or would that be too confusing?
And I apologized for borrowing an idea from bash. Taking an idea from PHP?!? It has high precedence as far as the parser is concerned.
I know that Perl uses it. I haven't looked at PHP (I hear bad things about it! :-)).
Seriously, I don't think another operator is needed. If it's not going to be the implicit concatenation by abuttal, + or \ will carry the matter. But I share the opinion of several here: implicit concatenation is not as bad as the alternatives. It wouldn't be an operator as such.
Of course in php, perl, and every other language that uses dot for string concatenation, it _is_ an operator, so this will end up confusing the very people who initially find it comforting.
And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
And then there's the fact that the "precedence" is different depending on which meaning the dot gets. Remember that what you're trying to solve is the problem that member-dot and % both have higher precedence than +.
I thought the problem we were trying to solve was that "+" has a lower precedence than "%" and attribute/method access, so implicit concatenation that's followed by "%" or ".format" can't be replaced by "+" without adding extra parentheses.
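A small sketch of that precedence difference (the example values are made up):

count = 3

# Implicit concatenation is done at compile time, so '%' formats the
# whole joined literal:
a = "value: %d" " units" % count        # 'value: 3 units'

# '+' binds looser than '%', so a bare replacement would try to format
# only ' units' and raise TypeError; extra parentheses are required:
b = ("value: %d" + " units") % count    # 'value: 3 units'

print(a, b)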
I think the "." is a nice idea at first sight, but might become confusing in the end because what we actually need is a simple to use notation for the scanner/parser that denotes a continuation line, and _not_ an operator.
Now, what about this?
long_line = "the beginning and the"&    # comments are ok
            " continuation of a string"
The "&" is not a valid operator on strings and looks pretty much like gluing parts together. It is better than the "\" that just escapes the newline and cannot take comments. I would even enforce that the ampersand be on the same line.
'a bitwise or :-?'& ' why not ...'

In PHP the dot (.) is so abundantly used for staying within line width limits that I often insert it instead of a plus (+) when switching to Python, and the other way around.

All the best, Stefan
From: MRAB <python@mrabarnett.plus.com> Sent: Thursday, May 16, 2013 9:23 AM
On 16/05/2013 16:57, Andrew Barnert wrote:
And then there's the fact that the "precedence" is different depending on which meaning the dot gets. Remember that what you're trying to solve is the problem that member-dot and % both have higher precedence than +.
I thought the problem we were trying to solve was that "+" has a lower precedence than "%" and attribute/method access, so implicit concatenation that's followed by "%" or ".format" can't be replaced by "+" without adding extra parentheses.
I was talking about the fact that Guido's 'Just use "+"' suggestion is insufficient, because it requires adding extra parentheses. Therefore, the problem we're trying to solve is 'member-dot and % both have higher precedence than +.' Your '"+" has a lower precedence than "%" and attribute/method access' means the exact same thing, just stated in the opposite order. So… I think I'm missing your point.
On 16/05/2013 21:51, Andrew Barnert wrote:
From: MRAB <python@mrabarnett.plus.com>
Sent: Thursday, May 16, 2013 9:23 AM
On 16/05/2013 16:57, Andrew Barnert wrote:
And then there's the fact that the "precedence" is different depending on which meaning the dot gets. Remember that what you're trying to solve is the problem that member-dot and % both have higher precedence than +.
I thought the problem we were trying to solve was that "+" has a lower precedence than "%" and attribute/method access, so implicit concatenation that's followed by "%" or ".format" can't be replaced by "+" without adding extra parentheses.
I was talking about the fact that Guido's 'Just use "+"' suggestion is insufficient, because it requires adding extra parentheses. Therefore, the problem we're trying to solve is 'member-dot and % both have higher precedence than +.' Your '"+" has a lower precedence than "%" and attribute/method access' means the exact same thing, just stated in the opposite order.
So… I think I'm missing your point.
You said """there's the fact that the "precedence" is different depending on which meaning the dot gets""". My point was that "." between string literals (which is currently a syntax error) would indicate concatenation of those literals, but there would be no change in precedence; it wouldn't replace "+".
On 16 May 2013 12:57, Andrew Barnert <abarnert@yahoo.com> wrote:
And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
If you say it is not hard for the parser, ok - but it seems impossible for humans:

upper = " World"
print ("Hello". upper)
On 16/05/2013 20:03, Joao S. O. Bueno wrote:
On 16 May 2013 12:57, Andrew Barnert <abarnert@yahoo.com> wrote:
And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
If you say it is not hard for the parser, ok - but it seems impossible for humans:
upper = " World"
print ("Hello". upper)
That's attribute access. The suggestion was to use it in place of implicit string concatenation, which occurs only between string _literals_: print ("Hello" . " World") and is currently illegal ("SyntaxError: invalid syntax").
On 16 May 2013 16:29, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 20:03, Joao S. O. Bueno wrote:
On 16 May 2013 12:57, Andrew Barnert <abarnert@yahoo.com> wrote:
And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
If you say it is not hard for the parser, ok - but it seems impossible for humans:
upper = " World"
print ("Hello". upper)
That's attribute access.
But you are suggesting it should be string concatenation. It is already in use for attribute access, as you can see - and one writing a program, or reading one, should not have to be thinking "ah - but here I can't use the "." because I am concatenating a string in a variable, not a literal string".
The suggestion was to use it in place of implicit string concatenation, which occurs only between string _literals_:
print ("Hello" . " World")
and is currently illegal ("SyntaxError: invalid syntax").
What is that? One thing that works one way for literals and another way for expressions? Sorry, but there is only one word for this: Insanity!
On Fri, May 17, 2013 at 7:55 PM, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On 16 May 2013 16:29, MRAB <python@mrabarnett.plus.com> wrote:
The suggestion was to use it in place of implicit string concatenation, which occurs only between string _literals_:
print ("Hello" . " World")
and is currently illegal ("SyntaxError: invalid syntax").
What is that? One thing that works one way for literals and another way for expressions? Sorry, but there is only one word for this: Insanity!
One of the things I love about Python is that a "thing" can be used in the same ways whether it's from a literal, a variable/name lookup, a function return value, a class member, an instance member, etc, etc, etc. (Sometimes this requires strange magic, like member function calling, but you still have the principle that "a=foo.bar(quux)" and "_=foo.bar; a=_(quux)" do the same thing.) So anything that makes str.str mean something weird gets a -1 from me. The proposals involving ellipsis have at least the virtue that it's clearly a syntactic element and not an operator, but I suspect the syntax will be more problematic than useful. If it looks like an operator, it should BE an operator. ChrisA
On Fri, May 17, 2013 at 4:09 AM, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, May 17, 2013 at 7:55 PM, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On 16 May 2013 16:29, MRAB <python@mrabarnett.plus.com> wrote:
The suggestion was to use it in place of implicit string concatenation, which occurs only between string _literals_:
print ("Hello" . " World")
and is currently illegal ("SyntaxError: invalid syntax").
What is that? One thing that works one way for literals and another way for expressions? Sorry, but there is only one word for this: Insanity!
One of the things I love about Python is that a "thing" can be used in the same ways whether it's from a literal, a variable/name lookup, a function return value, a class member, an instance member, etc, etc, etc. (Sometimes this requires strange magic, like member function calling, but you still have the principle that "a=foo.bar(quux)" and "_=foo.bar; a=_(quux)" do the same thing.) So anything that makes str.str mean something weird gets a -1 from me. The proposals involving ellipsis have at least the virtue that it's clearly a syntactic element and not an operator, but I suspect the syntax will be more problematic than useful.
If it looks like an operator, it should BE an operator.
Just to point out that the "." is already overloaded in some cases in Python. Take a look at this literal: 1.2 Surely, that should mean the 2 attribute of the integer 1, correct?
On Sat, May 18, 2013 at 3:07 AM, Chris Kaynor <ckaynor@zindagigames.com> wrote:
On Fri, May 17, 2013 at 4:09 AM, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, May 17, 2013 at 7:55 PM, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On 16 May 2013 16:29, MRAB <python@mrabarnett.plus.com> wrote:
The suggestion was to use it in place of implicit string concatenation, which occurs only between string _literals_:
print ("Hello" . " World")
and is currently illegal ("SyntaxError: invalid syntax").
What is that? One thing that works one way for literals and another way for expressions? Sorry, but there is only one word for this: Insanity!
One of the things I love about Python is that a "thing" can be used in the same ways whether it's from a literal, a variable/name lookup, a function return value, a class member, an instance member, etc, etc, etc. (Sometimes this requires strange magic, like member function calling, but you still have the principle that "a=foo.bar(quux)" and "_=foo.bar; a=_(quux)" do the same thing.) So anything that makes str.str mean something weird gets a -1 from me. The proposals involving ellipsis have at least the virtue that it's clearly a syntactic element and not an operator, but I suspect the syntax will be more problematic than useful.
If it looks like an operator, it should BE an operator.
Just to point out that the "." is already overloaded in some cases in Python. Take a look at this literal: 1.2 Surely, that should mean the 2 attribute of the integer 1, correct?
Ahh, true. Good point. I guess literals follow slightly different rules. Still, I don't like the idea of: "hello" . "world" not being an operator. Removing all the whitespace doesn't help, since this notation is specifically about line continuation. ChrisA
From: Joao S. O. Bueno <jsbueno@python.org.br> Sent: Thursday, May 16, 2013 12:03 PM
On 16 May 2013 12:57, Andrew Barnert <abarnert@yahoo.com> wrote:

And this means the parser has to figure out whether you mean dot for attribute access or dot for concatenation. That's not exactly a _hard_ problem, but it's not _trivial_.
If you say it is not hard for the parser, ok - but it seems impossible for humans:
upper = " World"
print ("Hello". upper)
Given a rule like "it's only concatenation if both arguments are string literals", a sufficiently complex parser, or a sufficiently knowledgeable human, can figure out that this is attribute access. So it's clearly not impossible. But it's also not trivial. And that's my point. It makes the code harder to read for both parsers and humans, which is a significant tradeoff. If the benefit is high enough, it might be worth it anyway, but I don't know that it is.
I have some PHP experience. Using dot for string concatenation causes a readability hazard:

func("foo", "bar". "bazz", "spam". "egg")

On Thu, May 16, 2013 at 10:40 PM, MRAB <python@mrabarnett.plus.com> wrote:
On 16/05/2013 08:08, Serhiy Storchaka wrote:
10.05.13 21:48, Guido van Rossum написав(ла):
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
As was said before, the '+' operator has lower priority than the '%' operator and an attribute access, i.e. it requires parentheses in some cases. However parentheses introduce noise and can cause other types of errors.
[snip] I wonder whether we could use ".". Or would that be too confusing?
In all cases, only multiline implicit string literal concatenation causes problems. What if we forbid implicit string literal concatenation only between string literals on different physical lines? A deliberate string literal concatenation can be made with explicit line joining.
raise ValueError('Type names and field names must be valid '\
                 'identifiers: %r' % name)
raise ValueError('Type names and field names must be valid ' .
                 'identifiers: %r' % name)
raise ValueError('{} not bottom-level directory in '\
                 '{!r}'.format(_PYCACHE, path))
raise ValueError('{} not bottom-level directory in ' .
                 '{!r}'.format(_PYCACHE, path))
-- INADA Naoki <songofacandy@gmail.com>
On 05/16/2013 02:08 AM, Serhiy Storchaka wrote:
In all cases, only multiline implicit string literal concatenation causes problems. What if we forbid implicit string literal concatenation only between string literals on different physical lines? A deliberate string literal concatenation can be made with explicit line joining.
And it already works. It might be a good PEP8 recommendation.
ignore_patterns = (
    'Function "%s" not defined.' % breakpoint,
    "warning: no loadable sections found in added symbol-file"\
    " system-supplied DSO",
    "warning: Unable to find libthread_db matching"\
    " inferior's thread library, thread debugging will"\
    " not be available.",
    "warning: Cannot initialize thread debugging"\
    " library: Debugger service failed",
    'warning: Could not load shared library symbols for '\
    'linux-vdso.so',
    'warning: Could not load shared library symbols for '\
    'linux-gate.so',
    'Do you need "set solib-search-path" or '\
    '"set sysroot"?',
    )
In this example, the lines tend to run together visually, and the '\' competes with the comma. But these have more to do with style than syntax and can be improved by indenting the continued lines. I think the line continuation '\' character would also make a good explicit string literal concatenation character. It's already limited to only work across sequential lines as well. Cheers, Ron
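For illustration, the first few entries of that tuple with the continued lines indented as suggested above (the extra indentation is the only change; breakpoint is given a dummy value to make the snippet self-contained):

breakpoint = "main"
ignore_patterns = (
    'Function "%s" not defined.' % breakpoint,
    "warning: no loadable sections found in added symbol-file" \
        " system-supplied DSO",
    "warning: Cannot initialize thread debugging" \
        " library: Debugger service failed",
    )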
On 17/05/13 04:28, Ron Adam wrote:
I think the line continuation '\' character would also make a good explicit string literal concatenation character. It's already limited to only work across sequential lines as well.
Concatenating strings on the same line is a legitimate thing to do, because you can mix different quoting types. In my opinion, there's no really clean way to build string literals containing multiple types of quotation marks, but implicit concatenation works and is useful. Here's a contrived example: s = "'Aren't you supposed to be " '"working"?' "', he asked with a wink." Arguments about whether that is uglier than using backslashes to /dev/null please :-) -- Steven
On 05/17/2013 02:49 AM, Steven D'Aprano wrote:
On 17/05/13 04:28, Ron Adam wrote:
I think the line continuation '\' character would also make a good explicit string literal concatenation character. It's already limited to only work across sequential lines as well.
Concatenating strings on the same line is a legitimate thing to do, because you can mix different quoting types. In my opinion, there's no really clean way to build string literals containing multiple types of quotation marks, but implicit concatenation works and is useful. Here's a contrived example:
s = "'Aren't you supposed to be " '"working"?' "', he asked with a wink."
And here's a non-contrived one (almost) verbatim from working code:

pattern = '[^\uFF1B\u30FB\u3001' r'+:=.,\/\[\]\t\r\n]+' '[\#\uFF03]+'

In Python 2 this had been:

pattern = ur'[^\uFF1B\u30FB\u3001+:=.,\/\[\]\t\r\n]+[\#\uFF03]+'

but was changed to the first form above due to Python 3's removal of lexical evaluation of \u literals in raw strings (see http://bugs.python.org/issue14973). Obviously the concatenation could have been done with the + operator, but I felt the given form was clearer than trying to work out visually whether any particular "+" was inside or outside of a string. There are other more complex regexes with more +'s, and my preference is to adopt a particular form I can use for most/all such, rather than to tweak forms based on a particular string's content.

I am assuming this discussion is regarding a possible Python 4 feature -- adjacent string literal concatenation has been documented behavior of Python going back to at least version 1.4.0 (the earliest doc available on python.org):

"2.4.1.1 String literal concatenation
Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation..."

I also have been using adjacent string literal concatenation in the "help" parameters of argparse calls as standard practice for many years, albeit on separate lines.
10.05.13 21:48, Guido van Rossum написав(ла):
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Could you please run these lint rules against the Python sources? I found at least one bug in Tools/scripts/abitype.py:

typeslots = [
    'tp_name',
    'tp_basicsize',
    ...
    'tp_subclasses',
    'tp_weaklist',
    'tp_del'
    'tp_version_tag'
]

http://bugs.python.org/issue17993
10.05.13 21:48, Guido van Rossum пише:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
This was already discussed 5 years ago. See the topic "Implicit string literal concatenation considered harmful?" started by GvR. https://mail.python.org/pipermail/python-ideas/2013-May/020527.html Before reviving this discussion, please take a look at the arguments made in the former discussion, and make sure that your arguments are new.
On 14 March 2018 at 15:34, Serhiy Storchaka <storchaka@gmail.com> wrote:
10.05.13 21:48, Guido van Rossum пише:
I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)
Would it be reasonable to start deprecating this and eventually remove it from the language?
This was already discussed 5 years ago. See the topic "Implicit string literal concatenation considered harmful?" started by GvR.
https://mail.python.org/pipermail/python-ideas/2013-May/020527.html
Before reviving this discussion, please take a look at the arguments made in the former discussion, and make sure that your arguments are new.
To be fair, Guido's post was about removing implicit concatenation, and much of the subsequent thread was only somewhat related (proposals for alternative syntax). And the ultimate conclusion from Guido was that the breakage would be too high. None of which really relates to the proposal here, which is that we should "discourage" use without any language change. I'm just not clear how such discouragement would make any practical difference (I for one would likely just ignore it). Paul
participants (51)
- alex23
- Alexander Belopolsky
- Andrew Barnert
- Antoine Pitrou
- Antonio Messina
- Barry Warsaw
- Ben Darnell
- Bruce Leban
- Cameron Simpson
- Chris Angelico
- Chris Kaynor
- Christian Tismer
- Dave Peticolas
- Eli Bendersky
- Ethan Furman
- Ezio Melotti
- Georg Brandl
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Haoyi Li
- Ian Cordasco
- INADA Naoki
- Jan Kaliszewski
- Jim Jewett
- Joao S. O. Bueno
- João Bernardo
- Juancarlo Añez
- Kabie
- M.-A. Lemburg
- Mark Dickinson
- Mark Janssen
- Mark Lawrence
- Markus Unterwaditzer
- Matt Chaput
- Michael Foord
- Michael Mitchell
- MRAB
- Ned Batchelder
- Nick Coghlan
- Paul Moore
- Philip Jenvey
- Random832
- Raymond Hettinger
- Ron Adam
- rurpy@yahoo.com
- Serhiy Storchaka
- Stefan Behnel
- Stefan Drees
- Stephen J. Turnbull
- Steven D'Aprano