[Python-ideas] Discouraging the implicit string concatenation

Søren Pilgård fiskomaten at gmail.com
Wed Mar 14 09:40:47 EDT 2018


On Wed, Mar 14, 2018 at 2:15 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista wrote:
>> Hello!
>>
>> What would you think about formally discouraging the following idiom?
>>
>>     long_string = (
>>         "some part of the string "
>>         "with more words, actually is the same "
>>         "string that the compiler puts together")
>
> I would hate that.
>
>
>> We should write the following, instead:
>>
>>     long_string = (
>>         "some part of the string " +
>>         "with more words, actually is the same " +
>>         "string that the compiler puts together")
>
> Should we? I disagree.
>
> Of course you're welcome to specify that in your own style-guide for
> your own code, but I won't be following that recommendation.
>
>
>> I know that "no change to Python itself" is needed, but a formal
>> discouragement of the idiom would help people avoid mistakes
>> like:
>>
>> fruits = {
>>     "apple",
>>     "orange"
>>     "banana",
>>     "melon",
>> }
>
> People can make all sorts of mistakes through carelessness. I wrote
>
>     {y, x*3}
>
> the other day instead of {y: x**3}. (That's *two* errors in one simple
> expression. I wish I could say it was a record for me.) Should we
> "discourage" exponentiation and dict displays and insist on writing
> dict([(y, x*x*x)]) to avoid the risk of errors? I don't think so.
>
> I think string concatenation falls into the same category. Sometimes
> even the most careful writer makes a mistake (let alone careless
> writers). That's life. Not every problem needs a technical "solution".
> Sometimes the right solution is to proof-read your code, or have code
> review by a fresh pair of eyes.
>
> And tests, of course.
>
>
> [...]
>> Note that there's no penalty in adding the '+' between the strings,
>> those are resolved at compilation time.
>
> That is an implementation feature, not a language requirement. Not all
> Python interpreters will do that, and they are free to put limits on
> how much they optimize. Here's Python 3.5:
>
> py> import dis
> py> dis.dis('s = "abcdefghijklmnopqrs" + "t"')
>   1           0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
>               3 STORE_NAME               0 (s)
>               6 LOAD_CONST               2 (None)
>               9 RETURN_VALUE
>
> But now see this:
>
> py> dis.dis('s = "abcdefghijklmnopqrs" + "tu"')
>   1           0 LOAD_CONST               0 ('abcdefghijklmnopqrs')
>               3 LOAD_CONST               1 ('tu')
>               6 BINARY_ADD
>               7 STORE_NAME               0 (s)
>              10 LOAD_CONST               2 (None)
>              13 RETURN_VALUE
>
> And older versions of CPython didn't optimize this at all, and some day
> there could be a command-line switch or environment variable to turn
> these optimizations off.
>
> With string concatenation having potential O(N**2) performance, if
> you're writing explicit string concatenations with +, they could
> potentially be *very* expensive at runtime.
>
>
>
Of course you can always make an error, even in a single character.
But I think there is a big difference between mixing up +, -, /, * and **,
where the operator is in "focus", and implicit concatenation, where
there is no operator at all.
A common problem is that you have something like
foo(["a",
     "b",
     "c"
]).bar()
but then you remember that there also needs to be a "d", so you just add it:
foo(["a",
     "b",
     "c"
     "d"
]).bar()
This introduces a bug, not in the "d" expression you are working on but
in what you thought was the previous expression: Python silently merges
"c" and "d" into the single string "cd".
The programmer sees the , as a delimiter between items, not as part of
an operation, so its absence is easy to overlook.
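
To make the failure mode concrete, here is a minimal sketch. The names
`letters` and `fruits` are only illustrative; the second mirrors the set
from Facundo's original message. No exception is raised in either case,
the container simply ends up one element short:

```python
# The comma after "c" was forgotten when "d" was appended, so the two
# adjacent literals are joined into one string when the code is compiled.
letters = ["a",
           "b",
           "c"
           "d"
]
print(letters)       # ['a', 'b', 'cd']
print(len(letters))  # 3, not the intended 4

# The same thing happens in a set display.
fruits = {
    "apple",
    "orange"
    "banana",
    "melon",
}
print("orangebanana" in fruits)  # True
print("orange" in fruits)        # False
```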

We can't remove all potential pitfalls, but I do think there is value
in evaluating whether a feature's potential to cause harm outweighs
the benefits it brings, especially if there are other ways to achieve
the same thing.
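
As a rough way to gauge how often the idiom shows up in a code base, one
could scan for adjacent string literals with the standard tokenize
module. This is only a sketch under my own assumptions: the function
name `adjacent_string_literals` is made up, it flags intentional uses of
the idiom just as readily as accidental ones, and on Python 3.12 and
later f-strings are tokenized differently and are not covered.

```python
import io
import tokenize


def adjacent_string_literals(source):
    """Return (line, text) for each string token that directly follows
    another string token, ignoring line breaks and comments in between."""
    hits = []
    prev_was_string = False
    # NL is a non-logical line break (e.g. inside brackets); COMMENT is a comment.
    ignorable = {tokenize.NL, tokenize.COMMENT}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in ignorable:
            continue
        if tok.type == tokenize.STRING:
            if prev_was_string:
                hits.append((tok.start[0], tok.string))
            prev_was_string = True
        else:
            prev_was_string = False
    return hits


sample = '''foo(["a",
     "b",
     "c"
     "d"
]).bar()
'''
print(adjacent_string_literals(sample))  # [(4, '"d"')]
```

The source is only tokenized, never executed, so it does not matter that
foo is undefined here.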

