Concatenating Strings

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Apr 10 07:07:27 CEST 2015


On Fri, 10 Apr 2015 04:29 am, Travis Griggs wrote:

> I was doing some maintenance now on a script of mine… I noticed that I
> compose strings in this little 54 line file multiple times using the +
> operator. I was prototyping at the time I wrote it and it was quick and
> easy. I don’t really care for the way they read. Here’s 3 examples:
> 
>     if k + '_@' in documents:
> 
>     timeKey = k + '_@'
> 
>     historyKey = thingID + '_' + k
> 
> I’m curious where others lean stylistically with this kind of thing.


String concatenation requires some judgement, not a mindless mechanical
choice. But not *much* judgement :-)

If you're only concatenating a couple of strings, use the + operator, or the
equivalent augmented assignment:

    if make_plural:
        noun += "s"  # like noun = noun + "s"


But if you're concatenating a lot of strings, use join:


    if condition:
        substrings = [mystring,
                      'fe', 'fi', 'fo', 'fum', 
                      'groucho', 'chico', 'harpo', 'zeppo', 'gummo', 
                      'spam', 'eggs', 'cheese', 'tomato',
                      'do', 're', 'mi', 'fa', 'so', 'la', 'ti']
        s = ''.join(substrings)


This is especially the case when you are concatenating inside a loop, so
avoid code like this:


    for suffix in suffixes:
        word = word + suffix
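For comparison, here is a minimal sketch of the same loop written both
ways. The function names and the sample suffixes are made up for
illustration. The join version hands all the suffixes to join in a single
call, which (in CPython at least) works out the final size and builds the
result once, instead of building a new string on every pass through the
loop.

```python
def add_suffixes_slow(word, suffixes):
    # Each + builds a brand new string, so the total work can grow with
    # the square of the final length (unless the interpreter happens to
    # optimize it).
    for suffix in suffixes:
        word = word + suffix
    return word


def add_suffixes_join(word, suffixes):
    # join builds the combined suffix in one go; a single + then attaches
    # it to the original word.
    return word + ''.join(suffixes)


suffixes = ['fe', 'fi', 'fo', 'fum']
print(add_suffixes_join('giant', suffixes))  # giantfefifofum
```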


The problem with concatenation is that strings are immutable, so each string
concatenation has to create a brand new temporary string. Hence:

    'a' + 'b' + 'c' + 'd' + ... + 'z'

creates temporary strings:

    'ab'
    'abc'
    'abcd'
    'abcde'

etc., only to throw them away a moment later. This is a lot of wasted work
that can be slow. How slow? In the worst case, *very* slow. Some years ago,
there was a bug report about Python downloading data from a local network
*tens of thousands of times slower* than IE or Firefox could. A download
that took the web browser a millisecond took Python several minutes. It
took weeks of effort to track the problem down to repeated string
concatenation inside one of the Python libraries. Guido himself described
it as an embarrassment.

The trap is that *sometimes* the CPython interpreter can optimize those
intermediate string concatenations so as to avoid the slow behaviour, but
only sometimes. Whether it works or not is highly dependent on the code,
whether the substrings have been cached, and how the operating system
allocates memory. That means that 99% of the time, you can run some code
which does repeated concatenation, and it will run as fast as join, but one
day a user will run the same code and report that it runs thousands of
times slower than expected. But when you run the code on your own machine,
you won't see the problem. Ouch.
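If you want to see this on your own machine, here is a rough timing sketch
using the standard timeit module. It assumes CPython: the
concat_plus_shared variant keeps a second reference to the growing string,
which in CPython is enough to stop the in-place optimization from kicking
in. The function names and sizes are arbitrary choices for illustration,
and the timings will vary between machines and Python versions, which is
rather the point.

```python
import timeit


def concat_plus(pieces):
    s = ''
    for piece in pieces:
        s += piece  # may or may not get CPython's in-place optimization
    return s


def concat_plus_shared(pieces):
    s = ''
    for piece in pieces:
        alias = s   # a second reference to s: CPython can no longer
        s += piece  # resize s in place, so each += copies everything
    return s


def concat_join(pieces):
    return ''.join(pieces)


if __name__ == '__main__':
    pieces = ['x'] * 20_000
    for func in (concat_plus, concat_plus_shared, concat_join):
        seconds = timeit.timeit(lambda: func(pieces), number=5)
        print(f'{func.__name__:20} {seconds:.4f}s')
```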

So although the "make string concatenation fast (well, usually fast)"
optimization is a nice thing to have when it works, it is risky to rely on
it. Hence you should always use join in production code any time you are
concatenating more than (say) half a dozen substrings.
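In practice that usually means collecting the pieces in a list and joining
once at the end. A minimal sketch; the function format_history and its
arguments are hypothetical, loosely modelled on the historyKey line from
the original question:

```python
def format_history(thing_id, keys):
    """Build one line per key by accumulating pieces in a list and
    joining once at the end, instead of growing a string with +=."""
    lines = []
    for k in keys:
        # Only three pieces here, so + is perfectly readable.
        lines.append(thing_id + '_' + k)
    # Many pieces: a single join builds the final string.
    return '\n'.join(lines)


print(format_history('item42', ['a', 'b']))
# item42_a
# item42_b
```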

But for the common case where you just want to append a suffix, or prepend a
prefix, using + is perfectly fine.




-- 
Steven



