[Python-ideas] Discouraging the implicit string concatenation

Steven D'Aprano steve at pearwood.info
Wed Mar 14 09:15:52 EDT 2018


On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista wrote:
> Hello!
> 
> What would you think about formally discouraging the following idiom?
> 
>     long_string = (
>         "some part of the string "
>         "with more words, actually is the same "
>         "string that the compiler puts together")

I would hate that.


> We should write the following, instead:
> 
>     long_string = (
>         "some part of the string " +
>         "with more words, actually is the same " +
>         "string that the compiler puts together")

Should we? I disagree.

Of course you're welcome to specify that in your own style-guide for 
your own code, but I won't be following that recommendation.


> I know that "no change to Python itself" is needed, but having a
> formal discouragement of the idiom will help people avoid
> mistakes like:
> 
> fruits = {
>     "apple",
>     "orange"
>     "banana",
>     "melon",
> }

People can make all sorts of mistakes through carelessness. I wrote

    {y, x*3}

the other day instead of {y: x**3}. (That's *two* errors in one simple
expression. I wish I could say it was a record for me.) Should we
"discourage" exponentiation and dict displays and insist on writing
dict([(y, x*x*x)]) to avoid the risk of errors? I don't think so.

I think string concatenation falls into the same category. Sometimes 
even the most careful writer makes a mistake (let alone careless 
writers). That's life. Not every problem needs a technical "solution". 
Sometimes the right solution is to proof-read your code, or have code 
review by a fresh pair of eyes.

And tests, of course.
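
As a rough illustration of the kind of test I mean, here is a minimal 
sketch. The check_fruits helper and the expected count of 4 are 
hypothetical, made up for this example; the set is the one quoted above, 
missing comma included.

```python
# The set from the quoted example, with the comma after "orange" missing,
# so "orange" and "banana" are silently joined into one string.
fruits = {
    "apple",
    "orange"
    "banana",
    "melon",
}


def check_fruits(items, expected_count):
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems = []
    if len(items) != expected_count:
        problems.append(
            "expected {} items, got {}".format(expected_count, len(items))
        )
    return problems


# Four fruits were intended, so a simple size check exposes the mistake.
print(check_fruits(fruits, 4))
```

Even a check this crude flags the problem without any change to the 
language or the style guide.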


[...]
> Note that there's no penalty in adding the '+' between the strings,
> those are resolved at compilation time.

That is an implementation feature, not a language requirement. Not all
Python interpreters will do that, and they are free to put limits on 
how much they optimize. Here's Python 3.5:

py> import dis
py> dis.dis('s = "abcdefghijklmnopqrs" + "t"')
  1           0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
              3 STORE_NAME               0 (s)
              6 LOAD_CONST               2 (None)
              9 RETURN_VALUE

But now see this:

py> dis.dis('s = "abcdefghijklmnopqrs" + "tu"')
  1           0 LOAD_CONST               0 ('abcdefghijklmnopqrs')
              3 LOAD_CONST               1 ('tu')
              6 BINARY_ADD
              7 STORE_NAME               0 (s)
             10 LOAD_CONST               2 (None)
             13 RETURN_VALUE

And older versions of CPython didn't optimize this at all, and some day 
there could be a command-line switch or environment variable to turn 
these optimizations off.
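
By contrast, adjacent string literals are joined before any optimizer 
gets involved. The following is only a sketch to make that visible using 
the standard ast module; the rhs_node_type helper is a name I made up for 
the example. On the Python 3 versions I'm aware of, the implicit form 
parses to a single string constant node, while the explicit form parses 
to a BinOp that some later stage may or may not fold.

```python
import ast


def rhs_node_type(source):
    """Hypothetical helper: name of the AST node on the right of 's = ...'."""
    tree = ast.parse(source)
    return type(tree.body[0].value).__name__


# Adjacent literals: already a single string node in the parse tree.
implicit = rhs_node_type('s = "abcdefghijklmnopqrs" "tu"')

# Explicit +: a binary operation; folding it is left to the optimizer.
explicit = rhs_node_type('s = "abcdefghijklmnopqrs" + "tu"')

print(implicit, explicit)
```

The exact node name for the string constant depends on the Python 
version, so treat the printed names as illustrative rather than 
guaranteed.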

With string concatenation having potential O(N**2) performance, if 
you're writing explicit string concatenations with +, they could 
potentially be *very* expensive at runtime.
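
To make the cost concrete, here is a small sketch comparing repeated + 
with str.join for pieces that are only known at runtime. Both function 
names are hypothetical. Some interpreters, CPython included, can 
sometimes avoid the repeated copying in the first form, but that is 
again an implementation detail rather than something to rely on.

```python
def build_with_plus(parts):
    """Concatenate with +; each step may copy everything built so far."""
    result = ""
    for part in parts:
        result = result + part
    return result


def build_with_join(parts):
    """Concatenate with str.join, which builds the result in one pass."""
    return "".join(parts)


pieces = ["some part of the string ", "with more words, ", "and so on"]

# Both approaches produce the same string; only the cost model differs.
print(build_with_plus(pieces) == build_with_join(pieces))
```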



-- 
Steve

