[Python-Dev] Usage of += on strings in loops in stdlib

Lennart Regebro regebro at gmail.com
Wed Feb 13 09:15:35 CET 2013


On Tue, Feb 12, 2013 at 10:03 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of "".join() that was there before.

Can someone show the actual diff of this?

I'm preparing a talk on outdated patterns in Python for DjangoCon EU,
prompted by this question and by the obsessive avoidance of string
concatenation in general. But all the tests I've done show that
''.join() is still as fast as or faster than concatenation, except when
you are joining very few strings, for example just two, in which case
concatenation is as fast or faster. That holds both under PyPy and
CPython. So I'd like to know in which case ''.join() is faster on PyPy
and += is faster on CPython.
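
The two-string case is easy to check with timeit, along these lines
(the sizes are arbitrary; this is just a sanity check, not the stdlib
code in question):

    import timeit

    # Each statement is run a million times by timeit's default.
    setup = "a = 'X' * 1000; b = 'Y' * 1000"

    print(timeit.timeit("a + b", setup=setup))            # concatenation
    print(timeit.timeit("''.join((a, b))", setup=setup))  # join of two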

Code with timings:

    x = 100000
    s1 = 'X'* x
    s2 = 'X'* x

    for i in xrange(500):  # use range() on Python 3
         s1 += s2

Python 3.3: 0.049 seconds
PyPy 1.9: 24.217 seconds
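
As far as I understand it, CPython does so well here because it can
resize the string in place when nothing else holds a reference to it,
which is something PyPy's garbage collector (no reference counting)
can't easily do. A quick way to see the effect on CPython is to keep
one extra reference to s1 around; that should make the same loop
dramatically slower there too:

    x = 100000
    s1 = 'X' * x
    s2 = 'X' * x
    alias = s1   # the extra reference defeats CPython's in-place resize

    for i in range(500):
        s1 += s2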

PyPy is indeed much, much slower than CPython here.
But let's look at the join case:

    x = 100000
    s1 = 'X'* x
    s2 = 'X'* x

    for i in xrange(500):
         s1 = ''.join((s1, s2))

Python 3.3: 18.969 seconds
PyPy 1.9: 62.539 seconds

Here PyPy needs about two and a half times as long, and CPython needs
387 times as long. Both are slower than they were with +=.
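
A back-of-the-envelope count shows why: every iteration builds a brand
new string out of everything accumulated so far plus one more chunk, so
the amount of copying grows quadratically:

    x = 100000
    # Iteration i copies the (i + 1) * x characters accumulated so far
    # plus another x for s2; over 500 iterations that is about 12.6
    # billion characters, versus roughly 50 million for an in-place +=.
    print(sum((i + 2) * x for i in range(500)))   # 12575000000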

The best case is of course to make a long list of strings and join them:

    x = 100000
    s1 = 'X'* x
    s2 = 'X'* x

    l = [s1]
    for i in xrange(500):
         l.append(s2)

    s1 = ''.join(l)

Python 3.3: 0.052 seconds
PyPy 1.9: 0.117 seconds

That's not always feasible, though.
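
When the pieces can't conveniently be collected into a list first, an
io.StringIO buffer gives much the same append-now-build-once behaviour
(a sketch I haven't timed against the variants above; on Python 2 you'd
use cStringIO for byte strings):

    import io

    x = 100000
    s2 = 'X' * x

    buf = io.StringIO()
    buf.write('X' * x)        # the initial s1
    for i in range(500):
        buf.write(s2)         # appends to an internal buffer
    s1 = buf.getvalue()       # the full string is built once, at the end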


//Lennart

