[Python-ideas] Discouraging the implicit string concatenation
Steven D'Aprano
steve at pearwood.info
Wed Mar 14 09:15:52 EDT 2018
On Wed, Mar 14, 2018 at 09:18:30AM -0300, Facundo Batista wrote:
> Hello!
>
> What would you think about formally discouraging the following idiom?
>
>     long_string = (
>         "some part of the string "
>         "with more words, actually is the same "
>         "string that the compiler puts together")
I would hate that.
> We should write the following, instead:
>
>     long_string = (
>         "some part of the string " +
>         "with more words, actually is the same " +
>         "string that the compiler puts together")
Should we? I disagree.
Of course you're welcome to specify that in your own style-guide for
your own code, but I won't be following that recommendation.
> I know that "no change to Python itself" is needed, but having a
> formal discouragement of the idiom will help in avoiding people to
> fall in mistakes like:
>
>     fruits = {
>         "apple",
>         "orange"
>         "banana",
>         "melon",
>     }
People can make all sorts of mistakes through carelessness. I wrote
{y, x*3}
the other day instead of {y: x**3}. (That's *two* errors in one simple
expression. I wish I could say it was a record for me.) Should we
"discourage" exponentiation and dict displays and insist on writing
dict([(y, x*x*x)]) to avoid the risk of errors? I don't think so.
I think string concatenation falls into the same category. Sometimes
even the most careful writer makes a mistake (let alone careless
writers). That's life. Not every problem needs a technical "solution".
Sometimes the right solution is to proof-read your code, or have code
review by a fresh pair of eyes.
And tests, of course.
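
As a rough sketch of the kind of test I mean: the names KNOWN_FRUITS and
unexpected_members below are made up for illustration, and a real project
would check against whatever source of truth it already has.

```python
# Hypothetical reference data, invented for this example.
KNOWN_FRUITS = {"apple", "orange", "banana", "melon", "pear"}

buggy = {
    "apple",
    "orange"    # missing comma: this literal merges with the next one
    "banana",
    "melon",
}

fixed = {
    "apple",
    "orange",
    "banana",
    "melon",
}


def unexpected_members(fruits, known=KNOWN_FRUITS):
    """Return, sorted, any members of fruits that are not known names."""
    return sorted(fruits - known)


print(unexpected_members(buggy))
print(unexpected_members(fixed))
```

The merged "orangebanana" entry is not a known fruit, so even a check this
simple flags the missing comma.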
[...]
> Note that there's no penalty in adding the '+' between the strings,
> those are resolved at compilation time.
That is an implementation feature, not a language requirement. Not all
Python interpreters will do that, and they are free to put limits on
how much they optimize. Here's Python 3.5:
py> import dis
py> dis.dis('s = "abcdefghijklmnopqrs" + "t"')
  1           0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
              3 STORE_NAME               0 (s)
              6 LOAD_CONST               2 (None)
              9 RETURN_VALUE
But now see this:
py> dis.dis('s = "abcdefghijklmnopqrs" + "tu"')
  1           0 LOAD_CONST               0 ('abcdefghijklmnopqrs')
              3 LOAD_CONST               1 ('tu')
              6 BINARY_ADD
              7 STORE_NAME               0 (s)
             10 LOAD_CONST               2 (None)
             13 RETURN_VALUE
And older versions of CPython didn't optimize this at all, and some day
there could be a command-line switch or environment variable to turn
these optimizations off.
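
If you want to check what your own interpreter does rather than take my
word for it, something like the following should work. It is only a sketch:
the helper names opnames and has_runtime_binary_op are mine, and it assumes
a CPython-style dis module, where the runtime operation shows up as
BINARY_ADD in older versions and BINARY_OP in newer ones.

```python
import dis


def opnames(source):
    """Return the bytecode operation names emitted for a source string."""
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


def has_runtime_binary_op(source):
    """True if the compiled code still performs a binary operation at runtime."""
    # Matches BINARY_ADD (older CPython) and BINARY_OP (newer CPython).
    return any(name.startswith("BINARY") for name in opnames(source))


implicit = 's = "abcdefghijklmnopqrs" "tu"'
explicit = 's = "abcdefghijklmnopqrs" + "tu"'

# Adjacent literals are merged before any bytecode is generated.
print("implicit needs a runtime op:", has_runtime_binary_op(implicit))
# Whether this one is folded depends on the interpreter and its version.
print("explicit needs a runtime op:", has_runtime_binary_op(explicit))
```

The result for the explicit form is deliberately not stated here, since it
varies between versions; that variation is the point.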
With string concatenation having potential O(N**2) performance, if
you're writing explicit string concatenations with +, they could
potentially be *very* expensive at runtime.
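
To put rough numbers on the quadratic behaviour, here is a back-of-envelope
model. It assumes every + builds a brand new string and copies both operands
into it; chars_copied is a name I made up, and real interpreters may do
better than this in some cases.

```python
def chars_copied(pieces):
    """Characters copied when joining pieces left to right with +,
    assuming each + allocates and fills a completely new string."""
    total = 0
    length = 0
    for i, piece in enumerate(pieces):
        length += len(piece)
        if i > 0:
            # The result of this + has `length` characters, all freshly copied.
            total += length
    return total


pieces = ["x" * 10] * 100                    # 100 pieces of 10 characters
naive = chars_copied(pieces)                 # cost under the model above
single_pass = sum(len(p) for p in pieces)    # cost of copying each piece once

print(naive, single_pass)                    # prints: 50490 1000
```

Doubling the number of pieces roughly quadruples the first figure and only
doubles the second.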
--
Steve