Steve D'Aprano, you're the "master". What's wrong with this concatenation statement?

Steven D'Aprano steve at
Thu May 12 11:06:18 EDT 2016

On Thu, 12 May 2016 07:36 pm, Ned Batchelder wrote:

> The CPython optimization depends on the string having only a single
> reference.  A seemingly unrelated change to the code can change the
> performance significantly:
>     In [1]: %%timeit
>        ...: s = ""
>        ...: for x in xrange(100000):
>        ...:   s = s + str(x)
>        ...:
>     10 loops, best of 3: 33.5 ms per loop
>     In [2]: %%timeit
>        ...: s = t = ""
>        ...: for x in xrange(100000):
>        ...:   s = t = s + str(x)
>        ...:
>     1 loop, best of 3: 1.57 s per loop
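For anyone on Python 3, where xrange no longer exists, here is a minimal sketch of the same comparison using the standard timeit module. The function names are hypothetical. Absolute timings, and whether the gap shows up at all, depend on the interpreter, version and platform. n is smaller than in Ned's session to keep the run short.

```python
import timeit


def concat_single_ref(n):
    """Repeated s = s + ..., where s is the only name bound to the string."""
    s = ""
    for x in range(n):
        s = s + str(x)
    return s


def concat_aliased(n):
    """Same loop, but t holds a second reference to the string, which
    (per Ned's explanation above) defeats CPython's in-place resize."""
    s = t = ""
    for x in range(n):
        s = t = s + str(x)
    return s


if __name__ == "__main__":
    n = 20_000
    for func in (concat_single_ref, concat_aliased):
        # number=1: each measurement is one full run of the loop
        best = min(timeit.repeat(lambda: func(n), number=1, repeat=3))
        print(f"{func.__name__}: {best:.4f} s")
```

Both functions build the same string. Only the extra reference differs.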

Nice demonstration!

But it is actually even worse than that. The optimization depends on memory
allocation details, which means that some CPython interpreters cannot take
advantage of it, depending on the operating system and version.

Consequently, reliance on it can lead, and has led, to embarrassments like
this performance bug, which affected only *some* Windows users. In 2009,
Chris Withers asked for help debugging a problem where Python's httplib was
hundreds of times slower than other tools such as wget and Internet Explorer.

A few weeks later, Simon Cross realised the problem was probably the
quadratic behaviour of repeated string addition, leading to this quote from
Antoine Pitrou:

"Given differences between platforms in realloc() performance, it might be
the reason why it goes unnoticed under Linux but degenerates under Windows."
and Guido's comment:

"Also agreed that this is an embarrassment."
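The "quadratic behaviour" Simon Cross identified can be made concrete with a rough cost model. This is a sketch under one stated assumption: every concatenation allocates a fresh string and copies both operands, which is what happens when the in-place resize cannot be used (or when realloc() has to move the buffer anyway). The helper names are hypothetical. For n pieces of length k the naive count is k*n*(n+1)/2 characters, while a single join copies each character once.

```python
def chars_copied(pieces):
    """Characters written by naive s = s + piece, assuming each step
    allocates a new string and copies everything accumulated so far."""
    total = 0
    length = 0
    for piece in pieces:
        length += len(piece)   # size of the new string
        total += length        # the whole new string is written out
    return total


def chars_copied_join(pieces):
    """''.join() sizes the result once and copies each piece once."""
    return sum(len(piece) for piece in pieces)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        pieces = ["x"] * n
        print(n, chars_copied(pieces), chars_copied_join(pieces))
```

For 1,000 one-character pieces the naive model gives 500500 characters against 1000 for the join. Each tenfold increase in n multiplies the naive figure by roughly a hundred.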

So beware of relying on the CPython string concatenation optimization in
production code!
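The portable alternative is to collect the pieces in a list and join them once at the end (or write them into an io.StringIO / io.BytesIO buffer). Here is a minimal sketch for the httplib-style case of reading a response in chunks. An in-memory io.BytesIO stands in for a socket or HTTP response, and read_all, read_all_naive and chunk_size are hypothetical names, not part of httplib.

```python
import io


def read_all(stream, chunk_size=8192):
    """Read a binary file-like object to EOF in linear time.

    Chunks are appended to a list and joined once, so the cost does not
    depend on CPython's in-place concatenation optimization.
    """
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # b"" signals end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks)


def read_all_naive(stream, chunk_size=8192):
    """The pattern to avoid: quadratic in the amount of data read."""
    data = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = data + chunk    # builds a new bytes object each time
    return data


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096   # 1 MiB of test data
    assert read_all(io.BytesIO(payload), 1000) == payload
    assert read_all_naive(io.BytesIO(payload), 1000) == payload
    print("ok")
```

For text, the same shape works with "".join(pieces) or an io.StringIO that you write() to and read back with getvalue().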

Here's the tracker issue that added the optimization in the first place:

The feature was controversial at the time (and remains slightly so):

My opinion is that it is great for interactive use at the Python prompt, but
I would never use it in code I cared about.
