[Python-Dev] Usage of += on strings in loops in stdlib
Lennart Regebro
regebro at gmail.com
Wed Feb 13 09:15:35 CET 2013
On Tue, Feb 12, 2013 at 10:03 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone commited a performance "fix" that uses += for
> strings instead of "".join() that was there before.
Can someone show the actual diff? Of this?
I'm making a talk about outdated patterns in Python at DjangoCon EU,
prompted by this question, and obsessive avoidance of string
concatenation. But all the tests I've done show that ''.join() still
is faster or as fast, except when you are joining very few strings,
like for example two strings, in which case concatenation is faster or
as fast. Both under PyPy and CPython. So I'd like to know in which
case ''.join() is faster on PyPy and += faster on CPython.
Code with times
x = 100000
s1 = 'X' * x
s2 = 'X' * x
for i in xrange(500):
    s1 += s2
Python 3.3: 0.049 seconds
PyPy 1.9: 24.217 seconds
PyPy indeed is much, much slower than CPython here.
But let's look at the join case:
x = 100000
s1 = 'X' * x
s2 = 'X' * x
for i in xrange(500):
    s1 = ''.join((s1, s2))
Python 3.3: 18.969 seconds
PyPy 1.9: 62.539 seconds
Here PyPy needs roughly twice the time, and CPython needs 387 times
as long. Both are slower than with +=.
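The reason re-joining in a loop is so costly is that each
''.join((s1, s2)) builds a brand-new string, re-copying the whole
accumulator every iteration, whereas CPython's += can resize the
string in place when it holds the only reference. A back-of-envelope
model of the copying cost (my own sketch, not from the measurements
above):

```python
# Count the bytes copied by s1 = ''.join((s1, s2)) per iteration.
# Sizes taken from the examples above; the model itself is my own.
x, n = 100_000, 500

copied = 0
length = x                    # current length of the accumulator s1
for _ in range(n):
    copied += length + x      # join copies both operands into a new string
    length += x               # accumulator grows by len(s2) each time

# Total copying grows quadratically in the iteration count:
assert copied == x * n + x * n * (n + 1) // 2
```

So the work is O(n^2) in the number of iterations no matter how fast
the join primitive itself is.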
The best case is of course to make a long list of strings and join them:
x = 100000
s1 = 'X' * x
s2 = 'X' * x
l = [s1]
for i in xrange(500):
    l.append(s2)
s1 = ''.join(l)
Python 3.3: 0.052 seconds
PyPy 1.9: 0.117 seconds
That's not always feasible though.
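For anyone wanting to reproduce the comparison, a minimal timing
harness (my own sketch — the post doesn't show how the times were
measured — written for Python 3's range, since xrange only exists
on Python 2 / PyPy 1.9):

```python
import timeit

X, N = 100_000, 500           # sizes from the examples above

def concat_inplace():
    # The += pattern from the first example.
    s1 = 'X' * X
    s2 = 'X' * X
    for _ in range(N):
        s1 += s2
    return s1

def append_then_join():
    # The list-append + single ''.join() pattern from the last example.
    s2 = 'X' * X
    parts = ['X' * X]
    for _ in range(N):
        parts.append(s2)
    return ''.join(parts)

# Both strategies must build the same string.
assert concat_inplace() == append_then_join()

for fn in (concat_inplace, append_then_join):
    print(fn.__name__, timeit.timeit(fn, number=3))
```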
//Lennart