[Tutor] Which is really faster - concatenation or join?
Kent Johnson
kent_johnson at skillsoft.com
Sat Aug 28 01:45:19 CEST 2004
A couple of times recently I have given out the conventional advice that
for concatenating strings, it is better to build a list of the pieces and
join it together than to use string concatenation to build the list. The
reasoning is that string concatenation requires copying the entire string
for each addition, while the list is designed to make concatenation efficient.
Because of all the copying, the time for string concatenation is
proportional to the square of the number of additions - it is O(n^2). List
append, however, happens in constant time, so building the list takes time
proportional to the number of appends - it is O(n).
The trick with this, though, is that there is a proportionality constant
here, and for small n, string concatenation may be faster. I decided to
find out.
Here is a program that compares the time for string concatenation using the
two methods. It varies both the number of append operations and the length
of the appended strings. It prints the results in a series of tables, and
it uses VPython (http://www.vpython.org/) to graph the results:
import timeit
from visual.graph import *
reps = 1000 # How many reps to try?
unit = ' ' # Concat this string
# Naive concatenation using string +
def concatPlus(count):
s=''
for i in range(count):
s += unit
return s
# Concatention the way the big boys do it, with string.join
def concatJoin(count):
s=[]
for i in range(count):
s.append(unit)
return ''.join(s)
# Time one test case
def timeOne(fn, count):
setup = "from __main__ import " + fn.__name__
stmt = '%s(%d)' % (fn.__name__, count)
t = timeit.Timer(stmt, setup)
secs = min(t.repeat(3, reps))
return secs
# Draw the curves for a single length of the appended string
def graphOne(unitLen):
global unit
unit = ' ' * unitLen
title = 'Unit length is %d' % len(unit)
# graph = gdisplay(title=title, xtitle='count', ytitle='time')
funct1 = gcurve(color=color.cyan)
funct2 = gdots(color=color.yellow)
print
print title
print ' tPlus tJoin'
for t in range(10, 100, 10) + range(100, 600, 50):
tPlus = timeOne(concatPlus, t)
tJoin = timeOne(concatJoin, t)
print '%5d %2.3f %2.3f' % (t, tPlus, tJoin)
funct1.plot( pos=(t, tPlus) )
funct2.plot( pos=(t, tJoin) )
graph = gdisplay(title='Append speed', xtitle='count', ytitle='time')
for unitLen in [1,2,3,4,5]:
graphOne(unitLen)
Here is the graph - the yellow dots are for concatJoin, the blue curves are
concatPlus:
http://personalpages.tds.net/~kent37/Python/AppendTimes.png
A couple of things stand out from this:
- For every string length, concatPlus is faster when the number of appends
is relatively small - up to 80 appends in my tests
- For larger numbers of appends, not only does concatPlus show O(n^2)
behavior, it gets worse as the size of the appended strings grows.
concatJoin is O(n) and it doesn't really matter how long the appended
string is. In fact, I think concatPlus is O(m*n) where m is the total
length of the final string and n is the number of appends.
Based on these results, I think I will stop spreading the conventional
wisdom. I think most uses of string concatenation are for small strings
with a small number of concatenations. String join only pays off when there
are a lot of appends.
Here is the raw data output from the program:
D:\Personal\Tutor>python concattimer.py
Visual-2003-10-05
Unit length is 1
tPlus tJoin
10 0.005 0.008
20 0.008 0.014
30 0.012 0.020
40 0.015 0.025
50 0.018 0.032
60 0.022 0.037
70 0.026 0.044
80 0.029 0.049
90 0.033 0.057
100 0.038 0.062
150 0.059 0.092
200 0.082 0.122
250 0.109 0.149
300 0.145 0.178
350 0.184 0.208
400 0.222 0.237
450 0.262 0.264
500 0.307 0.292
550 0.349 0.325
Unit length is 2
tPlus tJoin
10 0.005 0.008
20 0.008 0.014
30 0.012 0.019
40 0.015 0.026
50 0.019 0.033
60 0.023 0.039
70 0.027 0.044
80 0.033 0.051
90 0.038 0.057
100 0.042 0.062
150 0.075 0.095
200 0.114 0.125
250 0.155 0.154
300 0.200 0.185
350 0.247 0.215
400 0.295 0.243
450 0.349 0.272
500 0.404 0.305
550 0.463 0.332
Unit length is 3
tPlus tJoin
10 0.005 0.008
20 0.008 0.014
30 0.012 0.019
40 0.016 0.026
50 0.020 0.033
60 0.025 0.039
70 0.031 0.045
80 0.036 0.051
90 0.043 0.057
100 0.050 0.064
150 0.090 0.095
200 0.134 0.127
250 0.184 0.159
300 0.236 0.187
350 0.293 0.217
400 0.360 0.247
450 0.427 0.278
500 0.499 0.306
550 0.576 0.335
Unit length is 4
tPlus tJoin
10 0.005 0.008
20 0.008 0.014
30 0.012 0.019
40 0.017 0.025
50 0.021 0.033
60 0.026 0.039
70 0.035 0.045
80 0.042 0.051
90 0.050 0.058
100 0.057 0.063
150 0.101 0.096
200 0.153 0.129
250 0.207 0.161
300 0.271 0.192
350 0.341 0.222
400 0.416 0.249
450 0.501 0.280
500 0.580 0.310
550 0.682 0.338
Unit length is 5
tPlus tJoin
10 0.005 0.008
20 0.008 0.014
30 0.013 0.019
40 0.017 0.025
50 0.022 0.034
60 0.029 0.040
70 0.040 0.046
80 0.047 0.052
90 0.055 0.058
100 0.063 0.064
150 0.114 0.097
200 0.169 0.130
250 0.232 0.163
300 0.306 0.195
350 0.385 0.223
400 0.470 0.252
450 0.567 0.283
500 0.668 0.313
550 0.779 0.344
More information about the Tutor
mailing list