Mailman 3 Discouraging the implicit string concatenation - Python-ideas

Discouraging the implicit string concatenation

Facundo Batista

March 14, 2018

5:18 a.m.

Hello!

What would you think about formally discouraging the following idiom?

    long_string = (
        "some part of the string "
        "with more words, actually is the same "
        "string that the compiler puts together")

We should write the following, instead:

    long_string = (
        "some part of the string " +
        "with more words, actually is the same " +
        "string that the compiler puts together")

I know that "no change to Python itself" is needed, but having a formal discouragement of the idiom will help people avoid falling into mistakes like:

    fruits = {
        "apple",
        "orange"
        "banana",
        "melon",
    }

(and even making static analysers, like pyflakes or pylint, show that as a warning)

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Thanks!!

-- 
.   Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista
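For readers who want to see the failure mode concretely, here is a minimal sketch of the fruits example from the post. Nothing in it goes beyond what the post describes: the comma after "orange" is missing, so the two adjacent literals are joined and the set quietly ends up with three members.

```python
# The comma after "orange" is missing, so the two adjacent literals
# are joined into the single string "orangebanana".
fruits = {
    "apple",
    "orange"
    "banana",
    "melon",
}

print(len(fruits))                # 3, not 4
print("orangebanana" in fruits)   # True
print("banana" in fruits)         # False
```

No exception is raised at any point, which is why the mistake tends to surface only later, when a lookup for "banana" fails.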


Søren Pilgård

March 2018

5:43 a.m.

On Wed, Mar 14, 2018 at 1:18 PM, Facundo Batista <facundobatista@gmail.com> wrote:

...

I agree that implicit concatenation is a bad feature. I have seen it cause multiple stupid errors while I have never actually used it for anything useful.

I have also experienced beginners asking why you can do `x = "abc" "def"` but not `a = "abc"; b = "def"; x = a b`, and then you have to either explain to them the difference between strings and string literals or just tell them to always use `+`.

I think it breaks multiple parts of the Zen of Python:

Explicit is better than implicit.
Special cases aren't special enough to break the rules.
Errors should never pass silently.
There should be one-- and preferably only one --obvious way to do it.
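The beginner's question above can be reproduced in a few lines. This is only an illustrative sketch: it uses the built-in compile() so that the rejected form can be shown without crashing the script, and the variable name adjacent_names_ok is made up for the example.

```python
# Adjacent string *literals* are joined by the parser.
x = "abc" "def"

# Adjacent *names* are not valid syntax, even if both hold strings.
a = "abc"
b = "def"
try:
    compile("x = a b", "<example>", "exec")
except SyntaxError:
    adjacent_names_ok = False
else:
    adjacent_names_ok = True

print(x)                   # abcdef
print(adjacent_names_ok)   # False
```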

Steven D'Aprano

6:26 a.m.

On Wed, Mar 14, 2018 at 01:43:53PM +0100, Søren Pilgård wrote: [...]

...

If you tell them to "always use `+`", you are teaching them to be cargo-cult coders who write code they don't understand for reasons they don't know.

Life is too short to learn *everything*, I know that, and I've certainly copied lots of code I don't understand (and hoped I'd never need to debug it!). If that makes me a cargo-cult coder too, so be it. But never over something as simple as the difference between names a and b, and string literals "abc", "def".

-- Steve

Søren Pilgård

6:48 a.m.

On Wed, Mar 14, 2018 at 2:26 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Yes, of course the skilled programmer needs to understand the difference. But I am talking about absolute beginners, where Python in many regards is an excellent first language. In time they will probably come to understand why there is a difference, but when you still have trouble telling the difference between 5 and "5", being told the difference between strings and string literals will only confuse you more.

Trying to teach a person everything at once does not work well; limiting the curriculum and deferring some parts for later study is hardly cargo-cult coding. At least not more than all the other things that just work the way they do because that's how it works, from the beginner's point of view.

eryk sun

5:56 a.m.

On Wed, Mar 14, 2018 at 12:18 PM, Facundo Batista <facundobatista@gmail.com> wrote:

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

The above statement is not true for versions prior to 3.7. Previously the addition of string literals was optimized by the peephole optimizer, with a limit of 20 characters. Do you mean to formally discourage implicit string-literal concatenation only for 3.7+?

Steven D'Aprano

6:15 a.m.

On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista wrote:

...

Hello!

What would you think about formally discouraging the following idiom?

    long_string = (
        "some part of the string "
        "with more words, actually is the same "
        "string that the compiler puts together")

I would hate that.

...

We should write the following, instead:

    long_string = (
        "some part of the string " +
        "with more words, actually is the same " +
        "string that the compiler puts together")

Should we? I disagree. Of course you're welcome to specify that in your own style-guide for your own code, but I won't be following that recommendation.

...

I know that "no change to Python itself" is needed, but having a formal discouragement of the idiom will help people avoid falling into mistakes like:

    fruits = {
        "apple",
        "orange"
        "banana",
        "melon",
    }

People can make all sorts of mistakes through carelessness. I wrote {y, x*3} the other day instead of {y: x**3}. (That's *two* errors in one simple expression. I wish I could say it was a record for me.) Should we "discourage" exponentiation and dict displays and insist on writing dict((y, x*x*x)) to avoid the risk of errors? I don't think so.

I think string concatenation falls into the same category. Sometimes even the most careful writer makes a mistake (let alone careless writers). That's life. Not every problem needs a technical "solution". Sometimes the right solution is to proof-read your code, or have code review by a fresh pair of eyes.

And tests, of course.

[...]

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

That is an implementation feature, not a language requirement. Not all Python interpreters will do that, and they are free to put limits on how much they optimize. Here's Python 3.5:

    py> import dis
    py> dis.dis('s = "abcdefghijklmnopqrs" + "t"')
      1           0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
                  3 STORE_NAME               0 (s)
                  6 LOAD_CONST               2 (None)
                  9 RETURN_VALUE

But now see this:

    py> dis.dis('s = "abcdefghijklmnopqrs" + "tu"')
      1           0 LOAD_CONST               0 ('abcdefghijklmnopqrs')
                  3 LOAD_CONST               1 ('tu')
                  6 BINARY_ADD
                  7 STORE_NAME               0 (s)
                 10 LOAD_CONST               2 (None)
                 13 RETURN_VALUE

And older versions of CPython didn't optimize this at all, and some day there could be a command-line switch or environment variable to turn these optimizations off.

With string concatenation having potential O(N**2) performance, if you're writing explicit string concatenations with +, they could potentially be *very* expensive at runtime.

-- Steve
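The bytecode listings above are specific to Python 3.5. A version-independent way to ask the same question is to look at the constants of the compiled code object, sketched below. The helper name folds_to_single_constant is invented for this example. The implicit form is joined by the compiler, so it should always appear as one constant; whether the explicit `+` form does is left as an open result here, because, as the post says, that depends on the interpreter and version.

```python
def folds_to_single_constant(source, expected):
    """Return True if compiling `source` leaves `expected` as one constant."""
    code = compile(source, "<example>", "exec")
    return expected in code.co_consts


joined = "abcdefghijklmnopqrstu"

# Adjacent literals: joined when the source is compiled.
implicit_folded = folds_to_single_constant(
    's = "abcdefghijklmnopqrs" "tu"', joined)

# Explicit +: folded only if this interpreter's optimizer chooses to.
explicit_folded = folds_to_single_constant(
    's = "abcdefghijklmnopqrs" + "tu"', joined)

print("implicit folded:", implicit_folded)
print("explicit folded:", explicit_folded)
```

Running this on the interpreters you care about shows directly whether the "no penalty" claim holds for them.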

Joao S. O. Bueno

6:32 a.m.

While I personally dislike implicit string concatenation, I've worked in projects that use it, and I, more often than I'd like, use it myself.

The problem is that there is no "less typing" way of entering a correct multi-line string, due to multiline string literals preserving the indentation characters.

It is a bit cumbersome to do something like

    from textwrap import dedent as D

    a = D("""\
        My code-aligned, multiline text
        with no spaces to the left
        goes here
        """)

And people then resort to implicit concatenation instead of that.

I am sure this has been discussed before, and decided against, but I'd be all for taking the occasion to add a "D" prefix to multiline strings to apply de-denting to the literals.

Regards,

js
-><-

On 14 March 2018 at 10:15, Steven D'Aprano <steve@pearwood.info> wrote:

...
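To make the trade-off in the message above concrete, here is a small sketch comparing the two ways of writing an indented multi-line string. The function names are invented for the example, and the line breaks in the sample text are an assumption about how the original was laid out.

```python
from textwrap import dedent


def with_dedent():
    # Triple-quoted text keeps the code's indentation, so it has to be
    # stripped again at runtime with textwrap.dedent().
    return dedent("""\
        My code-aligned, multiline text
        with no spaces to the left
        goes here
        """)


def with_implicit_concatenation():
    # Adjacent literals: no runtime call, but every line needs its own
    # quotes and an explicit "\n".
    return (
        "My code-aligned, multiline text\n"
        "with no spaces to the left\n"
        "goes here\n"
    )


print(with_dedent() == with_implicit_concatenation())   # True
```

Both functions return the same string. The difference is purely in how much punctuation the author has to type and whether the work happens at compile time or at call time.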


Søren Pilgård

6:40 a.m.

On Wed, Mar 14, 2018 at 2:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...


Of course you can always make an error, even in a single letter. But I think there is a big difference between mixing up +-/* and ** where the operator is in "focus" and the implicit concatenation where there is no operator.

A common problem is that you have something like

    foo(["a",
         "b",
         "c"
    ]).bar()

but then you remember that there also needs to be a "d" so you just add

    foo(["a",
         "b",
         "c"
         "d"
    ]).bar()

causing an error, not in the "d" expression you are working on, but because Python merges it with what you thought was the previous expression. The , is seen as a delimiter by the programmer, not as part of the operation (or the lack of the ,).

We can't remove all potential pitfalls, but I do think there is value in evaluating whether something has bigger potential to cause harm than the benefits it brings, especially if there are other ways to do the same.

Paul Moore

6:54 a.m.

On 14 March 2018 at 13:40, Søren Pilgård <fiskomaten@gmail.com> wrote:

...

Certainly. However, in this case opinions differ (I, like Steven, have no problem with implicit string concatenation, and in certain cases prefer it over using explicit addition and relying on the compiler optimising that away for me). Also, we need to consider the significant body of code that would break if this construct were prohibited. Even if we were to agree that the harm caused by implicit concatenation outweighed the benefits, that would *still* not justify making it illegal, unless the net gain could be shown to justify the cost of forcing every project that currently uses implicit concatenation to change their code, debug those changes, make new releases, etc. Paul.

Søren Pilgård

6:57 a.m.

On Wed, Mar 14, 2018 at 2:54 PM, Paul Moore <p.f.moore@gmail.com> wrote:

...

I guess that is why the proposal is to discourage the use, not removing it.

Facundo Batista

7:22 a.m.

2018-03-14 10:54 GMT-03:00 Paul Moore <p.f.moore@gmail.com>:

...

I never said to prohibit it, just discourage the idiom.

-- Facundo

Paul Moore

7:48 a.m.

On 14 March 2018 at 14:22, Facundo Batista <facundobatista@gmail.com> wrote:

...

Apologies. I misread. How would you propose discouraging the idiom? (That may be why I misread - I can't see any practical way of implementing such a "discouragement"). Paul

Chris Angelico

10:57 a.m.

On Thu, Mar 15, 2018 at 12:40 AM, Søren Pilgård <fiskomaten@gmail.com> wrote:

...

You're creating a list. Put a comma at the end of every line; problem solved. Your edit would be from this:

    foo(["a",
         "b",
         "c",
    ]).bar()

to this:

    foo(["a",
         "b",
         "c",
         "d",
    ]).bar()

and there is no bug. In fact, EVERY time you lay out a list display vertically (or dict or set or any equivalent), ALWAYS put commas. That way, you can reorder the lines freely (you don't special-case the last one), you can append a line without changing the previous one (no noise in the diff), etc, etc, etc.

ChrisA
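The effect of the trailing-comma rule can be simulated by treating the list display as text and performing the "just add a line" edit on it. This is a sketch only; the helper append_line and the variable names are made up for the illustration.

```python
import ast

# The same three-item list, written without and with a trailing comma.
without_trailing = '[\n    "a",\n    "b",\n    "c"\n]'
with_trailing = '[\n    "a",\n    "b",\n    "c",\n]'


def append_line(source, new_line):
    """Insert new_line just before the closing bracket, touching nothing else."""
    head, closing = source.rsplit("\n", 1)
    return head + "\n" + new_line + "\n" + closing


# The careless edit: add a line, leave the previous line alone.
edited_without = ast.literal_eval(append_line(without_trailing, '    "d"'))
edited_with = ast.literal_eval(append_line(with_trailing, '    "d",'))

print(edited_without)   # ['a', 'b', 'cd']
print(edited_with)      # ['a', 'b', 'c', 'd']
```

With the trailing comma already in place, the one-line edit cannot merge with its neighbour, and the resulting diff touches a single line.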

Rob Cliffe

8:58 p.m.

On 14/03/2018 17:57, Chris Angelico wrote:

...

My thoughts exactly. I make it a personal rule to ALWAYS add a comma to every line, including the last one, in this kind of list (/dict/set etc.). *Python allows it - take advantage of it!* (A perhaps minor-seeming feature of the language which actually is a big benefit if you use it.) Preferably with all the commas vertically aligned to highlight the structure (I'm a great believer BTW in using vertical alignment, even if it means violating Python near-taboos such as more than one statement per line). Also I would automatically put the first string (as well as the last) on a line by itself:

    foo([
        "short string"                      ,
        "extremely looooooooooooong string" ,
        "some other string"                 ,
    ])

Then as Chris says (sorry to keep banging the drum), the lines can trivially be reordered, and adding more lines never causes a problem as long as I stick to the rule. Which I do automatically because I think my code looks prettier that way.

From a purist angle, implicit string concatenation is somewhat inelegant (where else in Python can you have two adjacent operands not separated by an operator/comma/whatever? We don't use reverse Polish notation). And I could live without it. But I have found it useful from time to time: constructing SQL queries or error messages or other long strings that I needed for some reason, where triple quotes would be more awkward (and I find line continuation backslashes ugly, *especially* in mid-string). I guess my attitude is: "If you want to read my Python code, 90%+ of the time it will be *obvious* that these strings are *meant* to be concatenated. But if it isn't, then you need to learn some Python basics first (Sorry!).".

I have never as far as I can remember had a bug caused by string concatenation. However, it is possible that I am guilty of selective memory.

+1, though, for linters to point out possible errors from possible accidental omission of a comma (if they don't already). It never hurts to have our code checked.

Best wishes
Rob Cliffe

Matt Arcidy

8:47 a.m.

On Thu, Mar 15, 2018 at 8:58 PM, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:

...


+1, though, for linters to point out possible errors from possible accidental omission of a comma (if they don't already). It never hurts to have our code checked.

The linters I know use parsed ast nodes, so if it's not valid grammar, it won't parse. The _linters_ don't check for cases like f(a b) because that's not valid grammar and already caught by the parser. I think that's what you were noting when you said "if they don't already"?

As for the issue in general, this is my understanding of linters:

The code is loaded immediately into the ast parser. There's no post-hoc way to know why the node is a string literal. Specifically, the nodes for "ab" and "a""b" are identical Str nodes. Reversing from the Str node is impossible (unless a flag/attribute/context/whatever gets added), as the information is destroyed by then.

The following might be possible:

1) The line number for a node n1 provides a way to select the code between node n1 and the next node n2, which contains the offending string. This code needs to be retrieved from somewhere (easily kept from the beginning if not already).

2) A quick reparse of just that chunk will confirm that it contains the target node, so the code retrieval for the target node can be sanity checked (it's not exact retrieval of just the code we want; it's guaranteed to overshoot as resolution is on line numbers).

3) ? (note below)

4) A quick check shows the tokenizer can differentiate the styles of literal. It gives 2 STRING for 'a''b' and 1 for 'ab'. This is a reliable test if the right code can be found which matches the target string. Hopefully there are other, better ways, but at least one exists. (Better in the sense that the linters I know do not currently include tokenization.)

3) The biggest problem for me (hopefully someone just knows the answer) is that some other very reliable parsing is required to know that the literal being _tested_ is the literal we _mean_ to test. I can use the ast parser in 2) because I'm confirming a different piece of information. The problem is that the literal can exist in a very complex section, and coincidentally with 'a''b' and 'ab' in the same expression. The ast node won't tell us if we are looking for 0, 1 or 2 cases of either syntax; we only get 2 Str nodes and 1 line number. I think the parser chosen would pretty much have to be a rewrite of the ast parser due to nesting. Or someone needs to root around in the internals of the ast parser to see if the information can be extracted at the right time.

I hope this helps in some way. I don't think it's impossible, but the above will introduce annoying bits into existing linters for this one issue. Of course, a working example would certainly make any case a lot easier.

Given there is no technical reason, I agree there's no reason to change anything. This just "feels" ugly to me, so I can tilt at windmills all day on it, but I can see no technical reason.

Always amazes me that strings are so weird. They are literals and lists, but also multi white space jumping zebras. Are they "multi" "white space" "jumping zebras"? "multi" "white space jumping" "zebras"? "multi white" "space jumping" "zebras"? We'll never know!

-Matt
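The two observations in this message (identical AST nodes, different token streams) can be checked with the standard library alone. The sketch below is illustrative; string_token_count is a name invented here. Note that on Python 3.8 and later the node in question is reported as ast.Constant rather than Str, but the point is unchanged.

```python
import ast
import io
import tokenize


def string_token_count(source):
    """Count STRING tokens in a piece of source code."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.STRING)


# After parsing, the two spellings are indistinguishable...
same_ast = ast.dump(ast.parse("'a''b'")) == ast.dump(ast.parse("'ab'"))

# ...but the tokenizer still sees two literals in the first one.
print(same_ast)                       # True
print(string_token_count("'a''b'"))   # 2
print(string_token_count("'ab'"))     # 1
```

This is why a check for implicit concatenation has to work on tokens (or on source positions) rather than on the finished AST alone.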

...

Oleg Broytman

7:01 a.m.

On Thu, Mar 15, 2018 at 12:15:52AM +1100, Steven D'Aprano <steve@pearwood.info> wrote:

...

We should fix what causes real problems, not convoluted ones. And this particular misfeature caused problems for me.

...

-- Steve

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Stephan Houben

7:09 a.m.

Note that this has already been proposed and rejected before:

https://www.python.org/dev/peps/pep-3126/

Stephan

On 14 Mar 2018 15:01, "Oleg Broytman" <phd@phdru.name> wrote:

...

Oleg Broytman

7:16 a.m.

On Wed, Mar 14, 2018 at 03:09:28PM +0100, Stephan Houben <stephanh42@gmail.com> wrote:

...

Note that this has already been proposed and rejected before:

https://www.python.org/dev/peps/pep-3126/

Not the same. The current proposal is to discourage, not remove the misfeature.

...

Stephan

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Facundo Batista

7:23 a.m.

2018-03-14 11:09 GMT-03:00 Stephan Houben <stephanh42@gmail.com>:

...

Note that this has already been proposed and rejected before:

https://www.python.org/dev/peps/pep-3126/

What was proposed and rejected was the *removal* of the feature/syntax.

I propose the discouragement of the idiom.

Regards,

-- Facundo

Stephan Houben

7:30 a.m.

On 14 Mar 2018 15:23, "Facundo Batista" <facundobatista@gmail.com> wrote:

I propose the discouragement of the idiom.

What does that mean?

Stephan

Robert Vanden Eynde

7:51 a.m.

+1 on general idea of discouraging it.

...

[...] mistakes like:

    fruits = {
        "apple",
        "orange"
        "banana",
        "melon",
    }

...

(and even making the static analysers, like pyflakes or pylint, to show that as a warning)

...

I agree that implicit concatenation is a bad feature. I have seen it cause multiple stupid errors while I have never actually used it for anything useful. [...] Zen of Python.

+1. But I think a counter argument would be a real world example.

...

We should write the following, instead:

    long_string = (
        "some part of the string " +
        "with more words, actually is the same " +
        "string that the compiler puts together")

Maybe with the "+" at the beginning, as PEP 8 says, to clearly mark it as present and explicit:

    long_string = (
        "some part of the string "
        + "with more words, actually is the same "
        + "string that the compiler puts together")

or

    long_string = (""
        + "some part of the string "
        + "with more words, actually is the same "
        + "string that the compiler puts together")

...

People can make all sorts of mistakes through carelessness. I wrote {y, x*3} the other day instead of {y: x**3}

Other comments said this isn't really the same kind of error; I agree with them.

...

...
Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

...

That is an implementation feature, not a language requirement. Not all Python interpreters will do that, and they are free to put limits on how much they optimize. Here's Python 3.5:

...

With string concatenation having potential O(N**2) performance, if you're writing explicit string concatenations with +, they could potentially be *very* expensive at runtime.

I completely agree: if a warning is emitted when this syntax is used, code could become less efficient, but it's also the job of the implementations to be fast. This is the only counter-argument I have to the OP, even if I really like the proposal.

...

Life is too short to learn *everything*, I know that, and I've certainly copied lots of code I don't understand (and hoped I'd never need to debug it!). If that makes me a cargo-cult coder too, so be it.

I find that learning that `x = "e" "f"` does indeed compile is also something to learn, and a surprising one (especially for beginners).

...

While I personally dislike implicit string concatenation, I've worked in projects that use it, and I, more often than I'd like, use them myself.

We would like to see more real-world examples vs. real-world mistakes.

...

It is a bit cumbersome to do something like

    from textwrap import dedent as D

    a = D("""\
        My code-aligned, multiline text
        with no spaces to the left
        goes here
        """)

+1. But you should add "\n" to have

    a = ("hello\n"
         "world\n"
         "how")

which would maybe be better written as

    a = '\n'.join([
        "hello",
        "world",
        "how"])

(put the spaces where you like), which can easily be switched to ''.join if needed.
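A quick check that the two spellings suggested above really produce the same value (the variable names are chosen for this sketch only):

```python
# Implicit concatenation with explicit newlines...
implicit = ("hello\n"
            "world\n"
            "how")

# ...and the join-based alternative, which builds the string at runtime.
joined = '\n'.join([
    "hello",
    "world",
    "how"])

print(implicit == joined)   # True
```

The join version trades a runtime call for commas between the parts, so a forgotten separator becomes visible as a changed element count rather than a silently merged string.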

...

A common problem is that you have something like foo(["a", "b", "c" ]).bar() but then you remember that there also needs to be a "d" so you just add

...

Yes, of course the skilled programmer needs to understand the difference. But I am talking about absolute beginners, where Python in many regards is an excellent first language.

Agreed. 2018-03-14 15:30 GMT+01:00 Stephan Houben <stephanh42@gmail.com>:

...


Facundo Batista

9:47 a.m.

2018-03-14 11:30 GMT-03:00 Stephan Houben <stephanh42@gmail.com>:

...

That we say "hey, this works but please don't use it, because it tends to be an error-prone way of writing some code, instead do this".

-- Facundo

M.-A. Lemburg

9:59 a.m.

Hi Facundo, On 14.03.2018 17:47, Facundo Batista wrote:

...

I believe this falls under coding style and is not something we should actively discourage. It's an idiom that many programming languages share with Python.

You may also want to look at this PEP for a longer discussion on the topic:

https://www.python.org/dev/peps/pep-3126/

Even pylint rejected a request to have a rule added for this:

https://github.com/PyCQA/pylint/issues/1589

FWIW: I use implicit concats a lot and find them very useful for e.g. writing long SQL queries or longer string literals at indented levels where the triple quote approach doesn't work well.

-- 
Marc-Andre Lemburg
eGenix.com


Steven D'Aprano

3:20 p.m.

On Wed, Mar 14, 2018 at 01:47:53PM -0300, Facundo Batista wrote:

...

It's not error prone. It works perfectly. The error you are complaining about has absolutely nothing to do with the intentional use of implicit string concatenation. The error you describe comes from accidentally leaving out a comma.

Suppose I follow this advice. I spend a month going through all my code, removing every example of implicit string concatenation and replacing it with some work-around like runtime string concatenation or calling textwrap.dedent() on triple quoted strings. And I swear to never, ever use implicit string concatenation again.

And the very next day, I accidentally leave out a comma in a list of strings.

How exactly does it help me to follow this advice? All it does is give me a lot of extra work to do, by avoiding a useful and reliable feature, without preventing the error you are concerned about.

-- Steve

Greg Ewing

3:37 p.m.

Facundo Batista wrote:

...

That we say "hey, this works but please don't use it, because it tends to be an error-prone way of writing some code, instead do this".

The trouble is that there is nowhere near widespread agreement that it is error prone enough to be worth such a drastic measure as deprecating the whole feature. -- Greg

Greg Ewing

3:04 p.m.

Facundo Batista wrote:

...

What was proposed and rejected was the *removal* of the feature/syntax.

I propose the discouragement of the idiom.

Providing a linter option to catch it would be fine. Officially discouraging it would be going too far, I think. It's one thing to promote style guidelines concerning how to use or not use certain language features. But this would be telling people they should *never* use an entire feature -- effectively deprecating it even if there is no intention to remove it -- for very subjective reasons. That comes across as excessively moralistic to me. I object to someone telling me what I should or shouldn't do in the privacy of my own code. -- Greg

Robert Vanden Eynde

7:09 a.m.

+1 on general idea of discouraging it.

...

[...] mistakes like:

fruits = { "apple", "orange" "banana", "melon", }

...

(and even making the static analysers, like pyflakes or pylint, to show that as a warning)

...

I agree that implicit concatenation is a bad feature. I have seen it cause multiple stupid errors while I have never actually used it for anything useful. [...] Zen of Python.

+1. But I think a counter argument would be a real world example.

...

We should write the following, instead:

long_string = ( "some part of the string " + "with more words, actually is the same " + "string that the compiler puts together")

...

People can make all sorts of mistakes through carelessness. I wrote {y, x*3} the other day instead of {y: x**3}

Other comments said this isn't really the same kind of error, and I agree with them.

...

...
Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

...

That is an implementation feature, not a language requirement. Not all Python interpreters will do that, and they are free to put limits on how much they optimize. Here's Python 3.5:

...

With string concatenation having potential O(N**2) performance, if you're writing explicit string concatenations with +, they could potentially be *very* expensive at runtime.

...

Life is too short to learn *everything*, I know that, and I've certainly copied lots of code I don't understand (and hoped I'd never need to debug it!). If that makes me a cargo-cult coder too, so be it.

I find that learning that `x = "e" "f"` does indeed compile is also something to learn, and a surprising one (especially for beginners).

...

While I personally dislike implicit string concatenation, I've worked in projects that use it, and I, more often than I'd like, use them myself.

We would like to see more real world examples vs real world mistakes.

...

It is a bit cumbersome to do something like

    from textwrap import dedent as D

    a = D("""\
        My code-aligned, multiline text
        with no spaces to the left goes here
        """)

...

A common problem is that you have something like foo(["a", "b", "c" ]).bar() but then you remember that there also needs to be a "d" so you just add

...

Yes of course the skilled programmer needs to understand the difference. But I am talking about absolute beginners where python in many regards is an excellent first language.

Agreed.

Elazar

11:59 a.m.

On Wed, Mar 14, 2018 at 8:43 PM Robert Vanden Eynde < robertvandeneynde@hotmail.com> wrote:

...

Or allow unary plus:

    long_string = (
        + "some part of the string "
        + "with more words, actually is the same "
        + "string that the compiler puts together"
    )

Elazar

Steven D'Aprano

8:33 a.m.

On Wed, Mar 14, 2018 at 03:01:07PM +0100, Oleg Broytman wrote:

...

I don't think that using a single asterisk instead of two is a "convoluted" problem. Please Oleg, if you want others to respect your personal experiences, you should return the favour and respect theirs. Don't try to dismiss them as "convoluted" or not "real".

I am far more likely to mistype exponentiation with a single asterisk than to accidentally leave out a comma from a list display of strings. But I recognise that this is *my* problem, not the language.

You want some sort of "official" recommendation not to use implicit string concatenation. What benefit will that give you, except to nag those who like and use the feature?

- It won't give you an error or warning when your code accidentally uses string concatenation; you still need to be alert for the possibility, or use a linter.

- It won't teach people not to use this feature; you still need to point them at the recommendation and have them read it and agree to follow it.

- It won't automatically remove string concatenation from the code you use, nor prevent it from happening again, or even change your project style-guide.

- It won't even add any information to the community that doesn't already exist. I have no doubt there are many blog posts and Stackoverflow discussions about this. I know I've talked about it many times on comp.lang.python. If you know how to google, you can find a post from Guido talking about getting "a mysterious argument count error" due to this. If you want something to point people at, bookmark Guido's comment.

But for that matter, what will this statement say?

"Don't use implicit concatenation. Even though this is not deprecated and will remain legal, you mustn't use it because you might use it in a list or function call and forget to use a comma between two string literals and get a mysterious error."

Um, yeah, okay, I get lots of mysterious errors caused by carelessness. Some of them are easy to fix. Some are much harder to track down and fix than a missing comma. Do they all need official warnings in the docs?

"Don't misspell None as 'none' in three dozen places. And if you do, don't spend twenty minutes fixing them one at a time, use your editor's search and replace function."

(That really was not my best day.)

I don't think this is any worse or more common than any other gotcha or silly error that might trip people up occasionally, and I think that singling it out in the Python docs for special warning is unnecessary. -- Steve

Serhiy Storchaka

8:45 a.m.

14.03.18 17:33, Steven D'Aprano пише:

...

And don't replace the plain "none", but use regular expressions to avoid replacing "none" in "allow_none" and "nonexisting"! (And there should be a warning about replacing the word "none" in comments and strings).

Oleg Broytman

6:31 a.m.

Hi! On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista <facundobatista@gmail.com> wrote:

...

I am all for that, with both hands! +1000000, dammit! Once I stumbled over a bug caused by this in legacy code in production. Fixing it was quite painful! The problem with the idiom is it's too easy to make the mistake and rather hard to find.

...

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Steven D'Aprano

7:18 a.m.

On Wed, Mar 14, 2018 at 02:31:58PM +0100, Oleg Broytman wrote: [...]

...

Did you mean that *finding* the bug was painful? Fixing it should be trivially easy: add a comma. But honestly, I'm having trouble seeing why it would be painful to find the bug. As soon as you see "orangebanana" where you expected "orange" or "banana", that should give you a very strong clue. But without knowing the details of your code, and how convoluted the problem was, I can't really judge. I can only say that in *my personal experience* finding and fixing these sorts of errors is usually simple.

...

The problem with the idiom is it's too easy to make the mistake and rather hard to find.

I disagree that it's easy to make or hard to find, but I accept that not everyone will have the same experience as me. For those who have this problem, perhaps you should use a linter to check your code and warn about string concatenation. -- Steve

Oleg Broytman

7:52 a.m.

On Thu, Mar 15, 2018 at 01:18:06AM +1100, Steven D'Aprano <steve@pearwood.info> wrote:

...

The most painful was that the bug destroyed important information that was hard to fix later (enterprise code in production, the code is certified and hard to replace).

The second pain was to understand what was going on. When a program reports "The OID in the certificate doesn't belong to the proper root OID" it's simpler to decide that the certificate is wrong, not that the list of root OIDs to check is wrong.

The third pain was to find the bug. It's hard to spot a problem in the list

    oids = [
        "1.2.6.254.14."
        "19.9.91.281",
        "1.2.6.263.12."
        "481.7.9.6.3.87"
        "1.2.7.4.214.7."
        "9.1.52.12",
    ]

There were about 50 OIDs like these, and the missing comma was in the middle of the list.

...

-- Steve

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Guido van Rossum

8:03 a.m.

I use the feature regularly for long error messages, and when combined with .format() or % it's not so easy to replace it with a + (it would require adding parentheses). So I am against formally discouraging it or adding something to PEP 8. However linters could warn about lists of comma-separated strings with a missing comma. On Wed, Mar 14, 2018, 07:54 Oleg Broytman <phd@phdru.name> wrote:

...

Robert Vanden Eynde

8:10 a.m.

Indeed, linters are the place to go, but I think there is no "official" linter (am I wrong?). Are pyflakes and pylint independent projects? 2018-03-14 16:03 GMT+01:00 Guido van Rossum <gvanrossum@gmail.com>:

...


Paul Moore

9:29 a.m.

On 14 March 2018 at 15:10, Robert Vanden Eynde <robertvandeneynde@hotmail.com> wrote:

...

Ironically, many of the places I see implicit concatenation used are where people need to work around linters complaining about line lengths. I understand the benefits of projects that mandate code passing lint checks, but I foresee sequences of commits:

    "Modify error X to be clearer"
    "Linter fix - split new message onto 2 lines to avoid long line"
    "Linter fix - explicitly concatenate with + because implicit concatenation is discouraged"
    "Linter fix - go back to the old message because the stupid linter complains about unnecessary addition of 2 constant values and I can't be bothered any more"

Ok, the last one is a joke - but getting a set of rules that doesn't back the programmer into a corner where it's non-obvious how to achieve a simple goal could easily become a non-trivial task.

Paul

Robert Vanden Eynde

10:09 a.m.

Le mer. 14 mars 2018 à 17:29, Paul Moore <p.f.moore@gmail.com> a écrit :

...

Haha, agreed, didn't think of that. In my humble opinion, long lines are less of a problem with raw strings (personal opinion based on html background).

Carl Meyer

9:53 a.m.

On 3/14/18 8:03 AM, Guido van Rossum wrote:

...

+1 to all of this.

It's possible to write a linter rule that will reliably catch the potential missing-comma errors, while still allowing implicit string concatenation where it's useful and readable. The rule is that implicit string concatenation should only be disallowed if there are any commas present at the same "parenthesization level". Thus, this is allowed:

    raise SomeException(
        "This is a really long exception message "
        "which I've split over two lines."
    )

While both of these are disallowed:

    fruits = [
        'orange',
        'grape'
        'banana',
    ]

    some_str.replace(
        "This is a long string "
        "split over two lines.",
        "And this is another string."
    )

For the latter case, you'd need to wrap the multiline string in its own parens, use +, and/or previously assign it to its own variable. (I don't think this should be controversial; the example is just asking for trouble.)

With this rule the only missing-comma that can slip through is when you've forgotten _all_ the intervening commas in your sequence of strings. That's much less likely.

Carl
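The rule described in this message can be sketched with the standard `tokenize` module. This is only an illustration under stated assumptions, not code from pyflakes, pylint or any other linter: the function name `find_suspicious_concats` is hypothetical, the sketch reports the line of the opening bracket rather than the exact literal, and it only sees plain `STRING` tokens (on Python 3.12 and later f-strings are tokenized differently and are not covered).

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}
# Tokens that may sit between two adjacent literals without separating them.
SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
        tokenize.INDENT, tokenize.DEDENT}


def find_suspicious_concats(source):
    """Return the lines of brackets that mix commas with implicit concatenation.

    Hypothetical helper: a bracket is flagged when, at its own nesting level,
    it contains at least one comma and at least one pair of adjacent string
    literals.
    """
    stack = []      # one [line, saw_comma, saw_concat] entry per open bracket
    flagged = []
    prev_was_string = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.STRING:
            if prev_was_string and stack:
                stack[-1][2] = True          # two literals in a row
            prev_was_string = True
            continue
        prev_was_string = False
        if tok.type == tokenize.OP:
            if tok.string in OPENERS:
                stack.append([tok.start[0], False, False])
            elif tok.string in CLOSERS and stack:
                line, saw_comma, saw_concat = stack.pop()
                if saw_comma and saw_concat:
                    flagged.append(line)
            elif tok.string == "," and stack:
                stack[-1][1] = True
    return flagged
```

Calling `find_suspicious_concats` on the `raise SomeException(...)` example returns an empty list, while the `fruits` and `some_str.replace(...)` examples are each reported at the line of their opening bracket.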

Robert Vanden Eynde

10:13 a.m.

Le mer. 14 mars 2018 à 18:04, Carl Meyer <carl@oddbird.net> a écrit :

...

+1... if this is indeed implemented in a linter !

Greg Ewing

3:40 p.m.

Carl Meyer wrote:

...

Not so unlikely when the argument list is of length 2. -- Greg

Bar Harel

4:37 p.m.

I'm with Steven D'Aprano here. Implicit string concat, in my experience, is something clear, concise and convenient. I've had a bug because of it only once or twice during my whole career, and not only would that "occasional" bug have occurred either way because of a missing comma or plus sign, but it saves the trouble of having to add more symbols, which usually means less surface area for bugs. If you feel it affects you all so often, I would totally understand the addition of that feature in a linter, but not endorsed by the python community in any way, and with defaults probably set to disabled. I wouldn't want to add any superfluous symbols to my otherwise (at least in my opinion) cleaner code. All in all, -1 from me. On Thu, Mar 15, 2018 at 12:41 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...

Antoine Pitrou

10:41 a.m.

On Wed, 14 Mar 2018 15:03:02 +0000 Guido van Rossum <gvanrossum@gmail.com> wrote:

...

Same here. Implicit string concatenation together with string formatting is quite common for me when writing error messages. Regards Antoine.
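The point about string formatting made in this sub-thread comes down to operator precedence: adjacent literals are joined before `%` or `.format()` is applied, whereas with `+` the formatting binds only to the last literal unless parentheses are added. A small sketch, with message text invented purely for illustration:

```python
# Implicit concatenation: the literals are joined first, then formatted as one.
implicit_percent = ("value %s is out of range; "
                    "expected at most %s" % (5, 3))
implicit_format = ("value {} is out of range; "
                   "expected at most {}".format(5, 3))

# With '+', .format() applies only to the second literal, so the first
# placeholder is left untouched and the extra argument is silently ignored.
plus_format = ("value {} is out of range; " +
               "expected at most {}".format(5, 3))

# Parenthesising the whole concatenation restores the intended meaning.
plus_fixed = ("value {} is out of range; " +
              "expected at most {}").format(5, 3)

# With '+' and '%', the second literal receives two arguments for a single
# placeholder, which raises TypeError at runtime.
try:
    "value %s is out of range; " + "expected at most %s" % (5, 3)
except TypeError as exc:
    percent_error = exc
```

Here `implicit_percent`, `implicit_format` and `plus_fixed` all hold the fully formatted message, while `plus_format` still contains a literal `{}`.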

Terry Reedy

1:11 p.m.

On 3/14/2018 11:03 AM, Guido van Rossum wrote:

...

I like using f-strings for this now, and adding '+' stops compile-time concatenation across literal boundaries. See my response to FB's original post. -- Terry Jan Reedy

Greg Ewing

3:22 p.m.

Oleg Broytman wrote:

...

Seems to me there are a number of problems right there:

* Important information that is not backed up or is hard to recover from wherever it is backed up

* A rigid deployment process without correspondingly thorough testing of what is to be deployed

Blaming the debacle on one particular language feature, when something similar could have happened for any number of other reasons, seems like scapegoat-hunting.

...

Again, testing. You're about to deploy a hard-coded list of IDs to a mission-critical environment where it will be very difficult to change anything later. Might it perhaps be prudent to do something like checking that they're all valid? A typo in one of the IDs would have given you the same result, without involving any language features of dubious merit. -- Greg
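The kind of check suggested here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the helper name `check_oids`, the idea that the expected number of entries is known from some specification, and the three-entry list adapted from the example earlier in the thread. Note that two OIDs glued together by a missing comma still look like a well-formed dotted number, so it is the count check, not the format check, that catches this particular mistake.

```python
import re

# A dotted sequence of decimal arcs, e.g. "1.2.840".
OID_PATTERN = re.compile(r"\d+(\.\d+)*")


def check_oids(oids, expected_count):
    """Return a list of human-readable problems found in a hard-coded OID list."""
    problems = []
    if len(oids) != expected_count:
        problems.append(f"expected {expected_count} OIDs, found {len(oids)}")
    for oid in oids:
        if not OID_PATTERN.fullmatch(oid):
            problems.append(f"malformed OID: {oid!r}")
    if len(set(oids)) != len(oids):
        problems.append("duplicate OIDs present")
    return problems


oids = [
    "1.2.6.254.14." "19.9.91.281",
    "1.2.6.263.12." "481.7.9.6.3.87"   # missing comma, as in the thread
    "1.2.7.4.214.7." "9.1.52.12",
]

problems = check_oids(oids, expected_count=3)
```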

Serhiy Storchaka

8:38 a.m.

14.03.18 14:18, Facundo Batista пише:

...

This was already discussed 5 years ago. See the topic "Implicit string literal concatenation considered harmful?" started by GvR. https://mail.python.org/pipermail/python-ideas/2013-May/020527.html Before reviving this discussion, please take a look at the arguments made in the former discussion, and make sure that your arguments are new.

Rhodri James

9:03 a.m.

On 14/03/18 12:18, Facundo Batista wrote:

...

-1. I use it a fair bit, and prefer it to explicit concatenation. But then I come from C, so I'm used to it.

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Is that assertion true? In all Python compilers? I would expect the constant string implicit concatenation to produce single constants efficiently, but explicit concatenation doing the constant folding is a much less sure bet. -- Rhodri James *-* Kynesim Ltd
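One way to answer this question for a particular interpreter is to compile both spellings and look at the constants stored in the resulting code object. The sketch below assumes a CPython-style code object with a `co_consts` attribute, which is an implementation detail; the helper name is hypothetical. For the explicit `+` form the result depends on the interpreter and version, so the sketch reports it rather than assuming it.

```python
def folded_into_one_constant(expression, expected):
    """Report whether compiling `expression` stores `expected` as one constant."""
    code = compile(expression, "<example>", "eval")
    return expected in code.co_consts


# Adjacent literals are joined when the source is compiled.
implicit_folded = folded_into_one_constant('"abc" "def"', "abcdef")

# Whether this is folded is up to the implementation's optimizer.
explicit_folded = folded_into_one_constant('"abc" + "def"', "abcdef")
```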

Terry Reedy

1:09 p.m.

On 3/14/2018 8:18 AM, Facundo Batista wrote:

...

Hello!

What would you think about formally descouraging the following idiom?

long_string = ( "some part of the string " "with more words, actually is the same " "string that the compiler puts together")

-1

...

We should write the following, instead:

long_string = ( "some part of the string " + "with more words, actually is the same " + "string that the compiler puts together")

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Not always, as Eryk Sun discussed. When f-strings are thrown into the mix, as when creating multi-line messages with interpolated values, not even true in 3.7. Compare the following:

    def f():
        x = input('x:')
        print(f'a{x}bc' 'def')

    dis.dis(f)
      2           0 LOAD_GLOBAL              0 (input)
                  2 LOAD_CONST               1 ('x:')
                  4 CALL_FUNCTION            1
                  6 STORE_FAST               0 (x)

      3           8 LOAD_GLOBAL              1 (print)
                 10 LOAD_CONST               2 ('a')
                 12 LOAD_FAST                0 (x)
                 14 FORMAT_VALUE             0
                 16 LOAD_CONST               3 ('bcdef')
                 18 BUILD_STRING             3
                 20 CALL_FUNCTION            1
                 22 POP_TOP
                 24 LOAD_CONST               0 (None)
                 26 RETURN_VALUE

    def f():
        x = input('x:')
        print(f'a{x}bc' + 'def')

    dis.dis(f)
      2           0 LOAD_GLOBAL              0 (input)
                  2 LOAD_CONST               1 ('x:')
                  4 CALL_FUNCTION            1
                  6 STORE_FAST               0 (x)

      3           8 LOAD_GLOBAL              1 (print)
                 10 LOAD_CONST               2 ('a')
                 12 LOAD_FAST                0 (x)
                 14 FORMAT_VALUE             0
                 16 LOAD_CONST               3 ('bc')
                 18 BUILD_STRING             3
                 20 LOAD_CONST               4 ('def')
                 22 BINARY_ADD
                 24 CALL_FUNCTION            1
                 26 POP_TOP
                 28 LOAD_CONST               0 (None)
                 30 RETURN_VALUE

The '+' prevents compile time concatenation of 'bc' and 'def' and adds a load and binary add. -- Terry Jan Reedy
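Because opcode names and offsets change between CPython releases, the same comparison can also be made at the syntax-tree level. This is a sketch assuming Python 3.8 or later (where string literals parse to `ast.Constant`); the helper name `constant_parts` is hypothetical.

```python
import ast


def constant_parts(expression):
    """Return the set of string constants found in the parsed expression."""
    tree = ast.parse(expression, mode="eval")
    return {node.value for node in ast.walk(tree)
            if isinstance(node, ast.Constant) and isinstance(node.value, str)}


# Implicit concatenation: 'bc' and 'def' are merged while parsing.
implicit_parts = constant_parts("f'a{x}bc' 'def'")

# Explicit '+': 'bc' stays inside the f-string and 'def' is a separate operand.
plus_parts = constant_parts("f'a{x}bc' + 'def'")
```

With implicit concatenation the tail of the f-string and the following literal appear as the single constant `'bcdef'`; with `+` they remain the separate constants `'bc'` and `'def'`.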

Kyle Lahnakoski

3:06 p.m.

On 2018-03-14 08:18, Facundo Batista wrote:

...

Thank you for bringing this up! A regex through one of my programs revealed three bugs caused by implicit concatenation in lists. Here is one of them: https://github.com/klahnakoski/ActiveData-ETL/blob/etl/activedata_etl/transf... The missing commas were not caught until now because the lists are long, and deal with rare cases. I did use implicit concatenation for a long SQL statement, and a few long error messages, but the byte savings is not worth the increased bug density.

...

is almost identical to

...

Chris Angelico

3:31 p.m.

On Thu, Mar 15, 2018 at 9:06 AM, Kyle Lahnakoski <klahnakoski@mozilla.com> wrote:

...

Even better:

    self.db.execute("""
        CREATE TABLE files (
            bucket TEXT,
            key TEXT,
            name TEXT,
            last_modified REAL,
            size INTEGER,
            annotate TEXT,
            CONSTRAINT pk PRIMARY KEY (bucket, name)
        )
    """)

ChrisA

Søren Pilgård

March 2018

12:43 p.m.

On Wed, Mar 14, 2018 at 1:18 PM, Facundo Batista <facundobatista@gmail.com> wrote:

...

Steven D'Aprano

1:26 p.m.

On Wed, Mar 14, 2018 at 01:43:53PM +0100, Søren Pilgård wrote: [...]

...

Søren Pilgård

1:48 p.m.

On Wed, Mar 14, 2018 at 2:26 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

eryk sun

12:56 p.m.

On Wed, Mar 14, 2018 at 12:18 PM, Facundo Batista <facundobatista@gmail.com> wrote:

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Steven D'Aprano

1:15 p.m.

On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista wrote:

...

Hello!

What would you think about formally descouraging the following idiom?

long_string = ( "some part of the string " "with more words, actually is the same " "string that the compiler puts together")

I would hate that.

...

We should write the following, instead:

long_string = ( "some part of the string " + "with more words, actually is the same " + "string that the compiler puts together")

Should we? I disagree. Of course you're welcome to specify that in your own style-guide for your own code, but I won't be following that recommendation.

...

I know that "no change to Python itself" is needed, but having a formal discouragement of the idiom will help in avoiding people to fall in mistakes like:

fruits = { "apple", "orange" "banana", "melon", }

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Joao S. O. Bueno

1:32 p.m.

...

On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista wrote:

...
Hello!

What would you think about formally descouraging the following idiom?

long_string = ( "some part of the string " "with more words, actually is the same " "string that the compiler puts together")

I would hate that.

...
We should write the following, instead:

long_string = ( "some part of the string " + "with more words, actually is the same " + "string that the compiler puts together")

Should we? I disagree.

Of course you're welcome to specify that in your own style-guide for your own code, but I won't be following that recommendation.

...
I know that "no change to Python itself" is needed, but having a formal discouragement of the idiom will help in avoiding people to fall in mistakes like:

fruits = { "apple", "orange" "banana", "melon", }

People can make all sorts of mistakes through carelessness. I wrote

{y, x*3}

the other day instead of {y: x**3}. (That's *two* errors in one simple expression. I wish I could say it was a record for me.) Should we "discourage" exponentiation and dict displays and insist on writing dict((y, x*x*x)) to avoid the risk of errors? I don't think so.

I think string concatenation falls into the same category. Sometimes even the most careful writer makes a mistake (let alone careless writers). That's life. Not every problem needs a technical "solution". Sometimes the right solution is to proof-read your code, or have code review by a fresh pair of eyes.

And tests, of course.

[...]

...
Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

That is an implementation feature, not a language requirement. Not all Python interpreters will do that, and they are free to put limits on how much they optimize. Here's Python 3.5:

    py> import dis
    py> dis.dis('s = "abcdefghijklmnopqrs" + "t"')
      1           0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
                  3 STORE_NAME               0 (s)
                  6 LOAD_CONST               2 (None)
                  9 RETURN_VALUE

But now see this:

    py> dis.dis('s = "abcdefghijklmnopqrs" + "tu"')
      1           0 LOAD_CONST               0 ('abcdefghijklmnopqrs')
                  3 LOAD_CONST               1 ('tu')
                  6 BINARY_ADD
                  7 STORE_NAME               0 (s)
                 10 LOAD_CONST               2 (None)
                 13 RETURN_VALUE

And older versions of CPython didn't optimize this at all, and some day there could be a command-line switch or environment variable to turn these optimizations off.

With string concatenation having potential O(N**2) performance, if you're writing explicit string concatenations with +, they could potentially be *very* expensive at runtime.

-- Steve _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Søren Pilgård

March 2018

6:40 a.m.

On Wed, Mar 14, 2018 at 2:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...


Paul Moore

6:54 a.m.

On 14 March 2018 at 13:40, Søren Pilgård <fiskomaten@gmail.com> wrote:

...

Søren Pilgård

6:57 a.m.

On Wed, Mar 14, 2018 at 2:54 PM, Paul Moore <p.f.moore@gmail.com> wrote:

...

I guess that is why the proposal is to discourage the use, not removing it.

Facundo Batista

7:22 a.m.

2018-03-14 10:54 GMT-03:00 Paul Moore <p.f.moore@gmail.com>:

...

I never said to prohibit it, just discourage the idiom. -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ Twitter: @facundobatista

Paul Moore

7:48 a.m.

On 14 March 2018 at 14:22, Facundo Batista <facundobatista@gmail.com> wrote:

...

Apologies. I misread. How would you propose discouraging the idiom? (That may be why I misread - I can't see any practical way of implementing such a "discouragement"). Paul

Chris Angelico

10:57 a.m.

On Thu, Mar 15, 2018 at 12:40 AM, Søren Pilgård <fiskomaten@gmail.com> wrote:

...

Rob Cliffe

March 2018

3:58 a.m.

On 14/03/2018 17:57, Chris Angelico wrote:

...

Matt Arcidy

3:47 p.m.

On Thu, Mar 15, 2018 at 8:58 PM, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:

...

On 14/03/2018 17:57, Chris Angelico wrote:

On Thu, Mar 15, 2018 at 12:40 AM, Søren Pilgård <fiskomaten@gmail.com> wrote:

Of course you can always make an error, even in a single letter. But I think there is a big difference between mixing up +-/* and ** where the operator is in "focus" and the implicit concatenation where there is no operator. A common problem is that you have something like

    foo(["a",
         "b",
         "c"
    ]).bar()

but then you remember that there also needs to be a "d" so you just add

    foo(["a",
         "b",
         "c"
         "d"
    ]).bar()

Causing an error, not with the "d" expression you are working on but due to what you thought was the previous expression but python turns it into one. The , is seen as a delimiter by the programmer not as part of the operation (or the lack of the ,).

You're creating a list. Put a comma at the end of every line; problem solved. Your edit would be from this:

    foo(["a",
         "b",
         "c",
    ]).bar()

to this:

    foo(["a",
         "b",
         "c",
         "d",
    ]).bar()

and there is no bug. In fact, EVERY time you lay out a list display vertically (or dict or set or any equivalent), ALWAYS put commas. That way, you can reorder the lines freely (you don't special-case the last one), you can append a line without changing the previous one (no noise in the diff), etc, etc, etc.

My thoughts exactly. I make it a personal rule to ALWAYS add a comma to every line, including the last one, in this kind of list (/dict/set etc.). Python allows it - take advantage of it! (A perhaps minor-seeming feature of the language which actually is a big benefit if you use it.) Preferably with all the commas vertically aligned to highlight the structure (I'm a great believer BTW in using vertical alignment, even if it means violating Python near-taboos such as more than one statement per line). Also I would automatically put the first string (as well as the last) on a line by itself:

    foo([
        "short string"                     ,
        "extremely looooooooooooong string",
        "some other string"                ,
    ])

Then as Chris says (sorry to keep banging the drum), the lines can trivially be reordered, and adding more lines never causes a problem as long as I stick to the rule. Which I do automatically because I think my code looks prettier that way.

From a purist angle, implicit string concatenation is somewhat inelegant (where else in Python can you have two adjacent operands not separated by an operator/comma/whatever? We don't use reverse Polish notation). And I could live without it. But I have found it useful from time to time: constructing SQL queries or error messages or other long strings that I needed for some reason, where triple quotes would be more awkward (and I find line continuation backslashes ugly, *especially* in mid-string). I guess my attitude is: "If you want to read my Python code, 90%+ of the time it will be *obvious* that these strings are *meant* to be concatenated. But if it isn't, then you need to learn some Python basics first (Sorry!).".

I have never as far as I can remember had a bug caused by string concatenation. However, it is possible that I am guilty of selective memory.

+1, though, for linters to point out possible errors from possible accidental omission of a comma (if they don't already). It never hurts to have our code checked.

...

Oleg Broytman

2:01 p.m.

On Thu, Mar 15, 2018 at 12:15:52AM +1100, Steven D'Aprano <steve@pearwood.info> wrote:

...

We should fix what causes real problems, not convoluted ones. And this particular misfeature caused problems for me.

...

-- Steve

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Stephan Houben

2:09 p.m.

Note that this has already been proposed and rejected before: https://www.python.org/dev/peps/pep-3126/ Stephan Op 14 mrt. 2018 15:01 schreef "Oleg Broytman" <phd@phdru.name>:

...

Oleg Broytman

2:16 p.m.

On Wed, Mar 14, 2018 at 03:09:28PM +0100, Stephan Houben <stephanh42@gmail.com> wrote:

...

Note that this has already been proposed and rejected before:

https://www.python.org/dev/peps/pep-3126/

Not the same. The current proposal is to discourage, not remove the misfeature.

...

Stephan

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Facundo Batista

2:23 p.m.

2018-03-14 11:09 GMT-03:00 Stephan Houben <stephanh42@gmail.com>:

...

Note that this has already been proposed and rejected before:

https://www.python.org/dev/peps/pep-3126/

Stephan Houben

March 2018

2:30 p.m.

Robert Vanden Eynde

2:51 p.m.

+1 on general idea of discouraging it.

...

[...] mistakes like:

    fruits = {
        "apple",
        "orange"
        "banana",
        "melon",
    }

...

(and even making the static analysers, like pyflakes or pylint, to show that as a warning)

...

I agree that implicit concatenation is a bad feature. I have seen it cause multiple stupid errors while I have never actually used it for anything useful. [...] Zen of Python.

+1. But I think a counter argument would be a real world example.

...

We should write the following, instead:

    long_string = (
        "some part of the string " +
        "with more words, actually is the same " +
        "string that the compiler puts together")

...

People can make all sorts of mistakes through carelessness. I wrote {y, x*3} the other day instead of {y: x**3}

Other comments said this isn't really the same kind of error; I agree with them.

...

...
Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

...

That is an implementation feature, not a language requirement. Not all Python interpreters will do that, and they are free to put limits on how much they optimize. Here's Python 3.5:

...

With string concatenation having potential O(N**2) performance, if you're writing explicit string concatenations with +, they could potentially be *very* expensive at runtime.
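A small sketch for checking this on whatever interpreter is at hand, rather than assuming. Adjacent literals are joined while the source is compiled, so the merged string shows up as a constant of the code object. Whether the `+` form is folded into a single constant is up to the implementation, so the example only disassembles it instead of asserting anything about the result.

```python
import dis

implicit = compile('s = "spam " "eggs"', "<implicit>", "exec")
explicit = compile('s = "spam " + "eggs"', "<explicit>", "exec")

# The implicit form always yields one ready-made constant.
print("spam eggs" in implicit.co_consts)

# The explicit form may or may not be folded; inspect the bytecode to see
# what this particular interpreter and version did.
dis.dis(explicit)

# Either way, both snippets bind the same value.
ns_implicit, ns_explicit = {}, {}
exec(implicit, ns_implicit)
exec(explicit, ns_explicit)
print(ns_implicit["s"] == ns_explicit["s"])
```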

...

Life is too short to learn *everything*, I know that, and I've certainly copied lots of code I don't understand (and hoped I'd never need to debug it!). If that makes me a cargo-cult coder too, so be it.

I find that learning that `x = "e" "f"` does indeed compile is also something to learn, and a surprising one (especially for beginners).

...

While I personally dislike implicit string concatenation, I've worked in projects that use it, and I, more often than I'd like, use them myself.

We would like to see more real-world examples vs. real-world mistakes.

...

It is a bit cumbersome to do something like

    from textwrap import dedent as D

    a = D("""\
        My code-aligned, multiline text
        with no spaces to the left
        goes here
        """)
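For comparison, a runnable sketch of the `textwrap.dedent` approach next to the implicit-concatenation spelling of the same text. The function name `message` and the exact line breaks are assumptions made for illustration.

```python
from textwrap import dedent


def message():
    # The backslash after the opening quotes suppresses the leading newline;
    # dedent() then strips the indentation the lines have in common.
    return dedent("""\
        My code-aligned, multiline text
        with no spaces to the left
        goes here
        """)


# The same text written with adjacent literals: no import and no runtime
# call, but every line needs its own quotes and an explicit "\n".
same_text = (
    "My code-aligned, multiline text\n"
    "with no spaces to the left\n"
    "goes here\n"
)

print(message() == same_text)
```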

...

A common problem is that you have something like foo(["a", "b", "c" ]).bar() but then you remember that there also needs to be a "d" so you just add

...

Yes, of course the skilled programmer needs to understand the difference. But I am talking about absolute beginners, for whom Python is in many regards an excellent first language.

Agreed.

2018-03-14 15:30 GMT+01:00 Stephan Houben <stephanh42@gmail.com>:

...

On 14 Mar 2018 at 15:23, "Facundo Batista" <facundobatista@gmail.com> wrote:

I propose the discouragement of the idiom.

What does that mean?

Stephan

Regards,

-- . Facundo

Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ Twitter: @facundobatista

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Facundo Batista

4:47 p.m.

2018-03-14 11:30 GMT-03:00 Stephan Houben <stephanh42@gmail.com>:

...

M.-A. Lemburg

4:59 p.m.

Hi Facundo, On 14.03.2018 17:47, Facundo Batista wrote:

...

Steven D'Aprano

10:20 p.m.

On Wed, Mar 14, 2018 at 01:47:53PM -0300, Facundo Batista wrote:

...

Greg Ewing

10:37 p.m.

Facundo Batista wrote:

...

That we say "hey, this works but please don't use it, because it tends to be an error-prone way of writing some code; instead do this".

The trouble is that there is nowhere near widespread agreement that it is error prone enough to be worth such a drastic measure as deprecating the whole feature. -- Greg

Greg Ewing

March 2018

10:04 p.m.

Facundo Batista wrote:

...

What was proposed and rejected was the *removal* of the feature/syntax.

I propose the discouragement of the idiom.

Elazar

6:59 p.m.

On Wed, Mar 14, 2018 at 8:43 PM Robert Vanden Eynde < robertvandeneynde@hotmail.com> wrote:

...

Or allow unary plus:

    long_string = (
        + "some part of the string "
        + "with more words, actually is the same "
        + "string that the compiler puts together"
    )

Elazar
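For context, a quick check of what happens today: `str` does not implement unary plus, so the leading `+` currently fails at runtime. The helper name `unary_plus_supported` is hypothetical.

```python
def unary_plus_supported(value):
    """Return True if +value works for this object, False on TypeError."""
    try:
        +value
    except TypeError:
        return False
    return True


print(unary_plus_supported("some part of the string "))  # str
print(unary_plus_supported(42))                          # int, for contrast
```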

Steven D'Aprano

3:33 p.m.

On Wed, Mar 14, 2018 at 03:01:07PM +0100, Oleg Broytman wrote:

...

Serhiy Storchaka

3:45 p.m.

14.03.18 17:33, Steven D'Aprano wrote:

...

Oleg Broytman

1:31 p.m.

Hi! On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista <facundobatista@gmail.com> wrote:

...

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Steven D'Aprano

March 2018

2:18 p.m.

On Wed, Mar 14, 2018 at 02:31:58PM +0100, Oleg Broytman wrote: [...]

...

The problem with the idiom is it's too easy to make the mistake and rather hard to find.

Oleg Broytman

2:52 p.m.

On Thu, Mar 15, 2018 at 01:18:06AM +1100, Steven D'Aprano <steve@pearwood.info> wrote:

...

-- Steve

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Guido van Rossum

3:03 p.m.

...

Robert Vanden Eynde

3:10 p.m.

...

I use the feature regularly for long error messages, and when combined with .format() or % it's not so easy to replace it with a + (it would require adding parentheses).
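A short illustration of the precedence issue, using made-up values (`path` and `code` are placeholders, not from the thread). Adjacent literals are joined before `.format()` is applied, whereas with `+` the method call binds only to the last literal unless the whole expression is parenthesized.

```python
code, path = 404, "/missing"

# Implicit concatenation: the two literals form one string, then .format()
# fills both placeholders.
implicit = ("request for {} failed "
            "with status {}".format(path, code))

# Naive '+': .format() applies to the second literal only, so the first
# placeholder is left untouched and the second one receives `path`.
naive_plus = ("request for {} failed " +
              "with status {}".format(path, code))

# Correct '+': extra parentheses around the concatenation are required.
fixed_plus = ("request for {} failed " +
              "with status {}").format(path, code)

print(implicit)
print(naive_plus)
print(fixed_plus)
```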

So I am against formally discouraging it or adding something to PEP 8.

However linters could warn about lists of comma-separated strings with a missing comma.

On Wed, Mar 14, 2018, 07:54 Oleg Broytman <phd@phdru.name> wrote:

...
On Thu, Mar 15, 2018 at 01:18:06AM +1100, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Wed, Mar 14, 2018 at 02:31:58PM +0100, Oleg Broytman wrote:

...
Once I stumbled over a bug caused by this in legacy code in production. Fixing it was quite painful!

Did you mean that *finding* the bug was painful? Fixing it should be trivially easy: add a comma.

The most painful part was that the bug destroyed important information that was hard to fix later (enterprise code in production; the code is certified and hard to replace).

The second pain was to understand what was going on. When a program reports "The OID in the certificate doesn't belong to the proper root OID", it's simpler to decide that the certificate is wrong, not that the list of root OIDs to check is wrong.

The third pain was to find the bug. It's hard to spot a problem in the list

    oids = [
        "1.2.6.254.14."
        "19.9.91.281",
        "1.2.6.263.12."
        "481.7.9.6.3.87"
        "1.2.7.4.214.7."
        "9.1.52.12",
    ]

There were about 50 OIDs like these, and the missing comma was in the middle of the list.
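To make the failure concrete, here is the list above run as-is. The line layout is an assumption about how the original code was wrapped; the values are the ones quoted in the message.

```python
oids = [
    "1.2.6.254.14."
    "19.9.91.281",
    "1.2.6.263.12."
    "481.7.9.6.3.87"   # comma missing here
    "1.2.7.4.214.7."
    "9.1.52.12",
]

# Three OIDs were intended, but the last two are silently fused into one.
print(len(oids))
for oid in oids:
    print(oid)
```

Nothing is raised at import time, so in a case like this a cheap guard such as comparing `len(oids)` against the expected count is one way to surface the problem early.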

...
-- Steve

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Paul Moore

4:29 p.m.

On 14 March 2018 at 15:10, Robert Vanden Eynde <robertvandeneynde@hotmail.com> wrote:

...

Robert Vanden Eynde

5:09 p.m.

On Wed, 14 Mar 2018 at 17:29, Paul Moore <p.f.moore@gmail.com> wrote:

...

Haha, agreed, didn't think of that. In my humble opinion, long lines are less of a problem with raw strings (personal opinion based on html background).

Carl Meyer

March 2018

4:53 p.m.

On 3/14/18 8:03 AM, Guido van Rossum wrote:

...

Robert Vanden Eynde

5:13 p.m.

On Wed, 14 Mar 2018 at 18:04, Carl Meyer <carl@oddbird.net> wrote:

...

+1... if this is indeed implemented in a linter!

Greg Ewing

10:40 p.m.

Carl Meyer wrote:

...

Not so unlikely when the argument list is of length 2. -- Greg

Bar Harel

11:37 p.m.

...

Antoine Pitrou

5:41 p.m.

On Wed, 14 Mar 2018 15:03:02 +0000 Guido van Rossum <gvanrossum@gmail.com> wrote:

...

Same here. Implicit string concatenation together with string formatting is quite common for me when writing error messages. Regards Antoine.

Terry Reedy

8:11 p.m.

On 3/14/2018 11:03 AM, Guido van Rossum wrote:

...

I like using f-strings for this now, and adding '+' stops compile-time concatenation across literal boundaries. See my response to FB's original post. -- Terry Jan Reedy

Greg Ewing

March 2018

10:22 p.m.

Oleg Broytman wrote:

...

Serhiy Storchaka

3:38 p.m.

14.03.18 14:18, Facundo Batista wrote:

...

Rhodri James

4:03 p.m.

On 14/03/18 12:18, Facundo Batista wrote:

...

-1. I use it a fair bit, and prefer it to explicit concatenation. But then I come from C, so I'm used to it.

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Terry Reedy

8:09 p.m.

On 3/14/2018 8:18 AM, Facundo Batista wrote:

...

Hello!

What would you think about formally descouraging the following idiom?

    long_string = (
        "some part of the string "
        "with more words, actually is the same "
        "string that the compiler puts together")

-1

...

We should write the following, instead:

    long_string = (
        "some part of the string " +
        "with more words, actually is the same " +
        "string that the compiler puts together")

...

Note that there's no penalty in adding the '+' between the strings, those are resolved at compilation time.

Not always, as Eryk Sun discussed. When f-strings are thrown into the mix, as when creating multi-line messages with interpolated values, it is not even true in 3.7. Compare the following:

...

...
...
    def f():
        x = input('x:')
        print(f'a{x}bc' 'def')

...

...
...
    dis.dis(f)
      2           0 LOAD_GLOBAL              0 (input)
                  2 LOAD_CONST               1 ('x:')
                  4 CALL_FUNCTION            1
                  6 STORE_FAST               0 (x)

      3           8 LOAD_GLOBAL              1 (print)
                 10 LOAD_CONST               2 ('a')
                 12 LOAD_FAST                0 (x)
                 14 FORMAT_VALUE             0
                 16 LOAD_CONST               3 ('bcdef')
                 18 BUILD_STRING             3
                 20 CALL_FUNCTION            1
                 22 POP_TOP
                 24 LOAD_CONST               0 (None)
                 26 RETURN_VALUE

...

...
...
    def f():
        x = input('x:')
        print(f'a{x}bc' + 'def')

...

...
...
    dis.dis(f)
      2           0 LOAD_GLOBAL              0 (input)
                  2 LOAD_CONST               1 ('x:')
                  4 CALL_FUNCTION            1
                  6 STORE_FAST               0 (x)
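A self-contained variant of this comparison that can be run on any Python 3.6+ interpreter. The function names `implicit` and `explicit` are made up here, and `input()` is replaced by a parameter so the snippet runs unattended. The bytecode differs between versions, so no particular disassembly is claimed; only the resulting strings are compared.

```python
import dis


def implicit(x):
    # The trailing 'def' is merged into the f-string's last literal part.
    return f'a{x}bc' 'def'


def explicit(x):
    # Here the f-string is built first and '+' is a separate operation,
    # unless the interpreter chooses to optimize it.
    return f'a{x}bc' + 'def'


# Both spellings produce the same string.
assert implicit('-') == explicit('-') == 'a-bcdef'

# Print both disassemblies to compare them on this interpreter.
dis.dis(implicit)
dis.dis(explicit)
```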

Kyle Lahnakoski

10:06 p.m.

On 2018-03-14 08:18, Facundo Batista wrote:

...

is almost identical to

...

Chris Angelico

10:31 p.m.

On Thu, Mar 15, 2018 at 9:06 AM, Kyle Lahnakoski <klahnakoski@mozilla.com> wrote:

...

Even better:

    self.db.execute("""
        CREATE TABLE files (
            bucket TEXT,
            key TEXT,
            name TEXT,
            last_modified REAL,
            size INTEGER,
            annotate TEXT,
            CONSTRAINT pk PRIMARY KEY (bucket, name)
        )
    """)

ChrisA
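A runnable sketch of the triple-quoted version, assuming `self.db` in the snippet above is something like a `sqlite3` connection (the thread does not say which database layer is used). An in-memory database stands in for it here.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for `self.db`.
db = sqlite3.connect(":memory:")

# The extra whitespace and newlines inside the triple-quoted string are
# harmless to the SQL parser, so the statement can be laid out freely.
db.execute("""
    CREATE TABLE files (
        bucket TEXT,
        key TEXT,
        name TEXT,
        last_modified REAL,
        size INTEGER,
        annotate TEXT,
        CONSTRAINT pk PRIMARY KEY (bucket, name)
    )
""")

# Column names are the second field of each PRAGMA table_info row.
columns = [row[1] for row in db.execute("PRAGMA table_info(files)")]
print(columns)
```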

participants (24)

Antoine Pitrou
Bar Harel
Carl Meyer
Chris Angelico
Elazar
eryk sun
Facundo Batista
Greg Ewing
Guido van Rossum
Joao S. O. Bueno
Kyle Lahnakoski
M.-A. Lemburg
Matt Arcidy
Oleg Broytman
Paul Moore
Rhodri James
Rob Cliffe
Robert Vanden Eynde
Robert Vanden Eynde
Serhiy Storchaka
Stephan Houben
Steven D'Aprano
Søren Pilgård
Terry Reedy
