Using + with strings considered bad

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Apr 29 15:15:07 CEST 2015


On Wed, 29 Apr 2015 06:29 pm, Cecil Westerhof wrote:

> Because I try to keep my lines (well) below 80 characters, I use the
> following:
>     print('Calculating fibonacci and fibonacci_memoize once for ' +
>           str(large_fibonacci) + ' to determine speed increase')

That's perfectly fine, but these two alternatives may be better:

    print('Calculating fibonacci and fibonacci_memoize once for'
          ' %s to determine speed increase' % large_fibonacci)

    print('Calculating fibonacci and fibonacci_memoize once for'
          ' {} to determine speed increase'.format(large_fibonacci))
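
Both alternatives work because adjacent string literals are joined into a
single literal when the code is compiled, so the % operator and the
.format() call apply to the whole message. Here is a small sketch showing
that all three spellings build the same string. The value given to
large_fibonacci is made up for illustration.

```python
# Hypothetical value, chosen only for this illustration.
large_fibonacci = 1000

# Explicit concatenation with +, as in the original question.
with_plus = ('Calculating fibonacci and fibonacci_memoize once for ' +
             str(large_fibonacci) + ' to determine speed increase')

# Adjacent literals are merged first, then % is applied to the result.
with_percent = ('Calculating fibonacci and fibonacci_memoize once for'
                ' %s to determine speed increase' % large_fibonacci)

# Likewise, .format() is called on the merged literal.
with_format = ('Calculating fibonacci and fibonacci_memoize once for'
               ' {} to determine speed increase'.format(large_fibonacci))

print(with_plus == with_percent == with_format)
```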



> But I was told that using + with strings was bad practice. Is this
> true? If so, what is the better way to do this?

*Repeated* string concatenation is bad practice. Concatenating one or two
strings is fine. Doing it in a loop to build up a big string is bad mojo.


# Perfectly fine:
message = prefix + "something or other" + suffix


# Okay, but there are better alternatives:
for item in things:
    message = "something " + str(item)
    print(message)


# This is asking for trouble.
# Use ''.join(substrings) instead.
text = ''
for s in substrings:
    text = text + s


The problem with the third one is that it has to make temporary strings
which get thrown away, and that gets very expensive if there are many
substrings. Suppose our substrings are "a", "bb", "ccc", "dddd", "eeeee",
then the temporary strings that are made end up being:

text = "a"  # copies one character (maybe?)
text = "abb"  # copies three characters
text = "abbccc"  # copies six characters
text = "abbcccdddd"  # copies ten characters
text = "abbcccddddeeeee"  # copies fifteen characters

So to build a string of length 15, Python ends up copying 34 or 35
characters. As the number of substrings increases, the amount of wasted
copying blows out: repeated string concatenation behaves quadratically,
which is very slow.
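
To make the arithmetic concrete, here is a small sketch that counts the
copied characters under the simple model used above, where every
concatenation builds a brand-new string. The function name chars_copied is
mine, and the model deliberately ignores any optimization the interpreter
might apply.

```python
def chars_copied(substrings):
    """Count characters copied by naive repeated concatenation.

    Assumes every `text = text + s` creates a new string and copies
    all of it, i.e. no in-place optimization kicks in.
    """
    total = 0
    length = 0
    for s in substrings:
        length += len(s)  # size of the new temporary string
        total += length   # every character of it gets copied
    return total


# The five substrings from the example above: 1 + 3 + 6 + 10 + 15
print(chars_copied(["a", "bb", "ccc", "dddd", "eeeee"]))  # 35

# 1000 one-character substrings: final length is only 1000,
# but the copying adds up to 1000 * 1001 / 2
print(chars_copied(["x"] * 1000))  # 500500
```

The final string grows linearly with the number of substrings, while the
copying grows with the square of it.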

The tricky part is that CPython, starting from version 2.4, includes an
optimization that *may* avoid all that extra copying under *some*
circumstances. So with casual testing, you might not notice the quadratic
behaviour, and see linear behaviour instead.

Until you rely on it being fast, and it isn't.
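
If you want to check this on your own system, a rough sketch along these
lines compares the loop against ''.join(). The function names, the test
data and the repeat count are arbitrary choices of mine, and the timings
will depend on your interpreter and version, so treat the numbers as
indicative only.

```python
import timeit


def concat_loop(substrings):
    # Repeated concatenation: may or may not be optimized.
    text = ''
    for s in substrings:
        text = text + s
    return text


def concat_join(substrings):
    # Builds the result in one pass, on any implementation.
    return ''.join(substrings)


# Arbitrary test data for illustration.
substrings = ['spam'] * 10000

# Both approaches must produce the same string.
assert concat_loop(substrings) == concat_join(substrings)

print('loop:', timeit.timeit(lambda: concat_loop(substrings), number=20))
print('join:', timeit.timeit(lambda: concat_join(substrings), number=20))
```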



-- 
Steven



