[Tutor] Which is really faster - concatenation or join?

Kent Johnson kent_johnson at skillsoft.com
Sat Aug 28 01:45:19 CEST 2004


A couple of times recently I have given out the conventional advice that 
for concatenating strings, it is better to build a list of the pieces and 
join it together than to use string concatenation to build the list. The 
reasoning is that string concatenation requires copying the entire string 
for each addition, while the list is designed to make concatenation efficient.

Because of all the copying, the time for string concatenation is 
proportional to the square of the number of additions - it is O(n^2). List 
append, however, happens in constant time, so building the list takes time 
proportional to the number of appends - it is O(n).

The trick with this, though, is that there is a proportionality constant 
here, and for small n, string concatenation may be faster. I decided to 
find out.

Here is a program that compares the time for string concatenation using the 
two methods. It varies both the number of append operations and the length 
of the appended strings. It prints the results in a series of tables, and 
it uses VPython (http://www.vpython.org/) to graph the results:

import timeit
from visual.graph import *

reps = 1000 # How many reps to try?
unit = '    ' # Concat this string

# Naive concatenation using string +
def concatPlus(count):
     s=''
     for i in range(count):
         s += unit
     return s

# Concatention the way the big boys do it, with string.join
def concatJoin(count):
     s=[]
     for i in range(count):
         s.append(unit)
     return ''.join(s)


# Time one test case
def timeOne(fn, count):
     setup = "from __main__ import " + fn.__name__
     stmt = '%s(%d)' % (fn.__name__, count)

     t = timeit.Timer(stmt, setup)
     secs = min(t.repeat(3, reps))
     return secs


# Draw the curves for a single length of the appended string
def graphOne(unitLen):
     global unit
     unit = ' ' * unitLen

     title = 'Unit length is %d' % len(unit)
#    graph = gdisplay(title=title, xtitle='count', ytitle='time')
     funct1 = gcurve(color=color.cyan)
     funct2 = gdots(color=color.yellow)

     print
     print title
     print '       tPlus  tJoin'

     for t in range(10, 100, 10) + range(100, 600, 50):
         tPlus = timeOne(concatPlus, t)
         tJoin = timeOne(concatJoin, t)
         print '%5d  %2.3f  %2.3f' % (t, tPlus, tJoin)

         funct1.plot( pos=(t, tPlus) )
         funct2.plot( pos=(t, tJoin) )


graph = gdisplay(title='Append speed', xtitle='count', ytitle='time')
for unitLen in [1,2,3,4,5]:
     graphOne(unitLen)


Here is the graph - the yellow dots are for concatJoin, the blue curves are 
concatPlus:

http://personalpages.tds.net/~kent37/Python/AppendTimes.png

A couple of things stand out from this:
- For every string length, concatPlus is faster when the number of appends 
is relatively small - up to 80 appends in my tests
- For larger numbers of appends, not only does concatPlus show O(n^2) 
behavior, it gets worse as the size of the appended strings grows. 
concatJoin is O(n) and it doesn't really matter how long the appended 
string is. In fact, I think concatPlus is O(m*n) where m is the total 
length of the final string and n is the number of appends.

Based on these results, I think I will stop spreading the conventional 
wisdom. I think most uses of string concatenation are for small strings 
with a small number of concatenations. String join only pays off when there 
are a lot of appends.

Here is the raw data output from the program:

D:\Personal\Tutor>python concattimer.py
Visual-2003-10-05

Unit length is 1
        tPlus  tJoin
    10  0.005  0.008
    20  0.008  0.014
    30  0.012  0.020
    40  0.015  0.025
    50  0.018  0.032
    60  0.022  0.037
    70  0.026  0.044
    80  0.029  0.049
    90  0.033  0.057
   100  0.038  0.062
   150  0.059  0.092
   200  0.082  0.122
   250  0.109  0.149
   300  0.145  0.178
   350  0.184  0.208
   400  0.222  0.237
   450  0.262  0.264
   500  0.307  0.292
   550  0.349  0.325

Unit length is 2
        tPlus  tJoin
    10  0.005  0.008
    20  0.008  0.014
    30  0.012  0.019
    40  0.015  0.026
    50  0.019  0.033
    60  0.023  0.039
    70  0.027  0.044
    80  0.033  0.051
    90  0.038  0.057
   100  0.042  0.062
   150  0.075  0.095
   200  0.114  0.125
   250  0.155  0.154
   300  0.200  0.185
   350  0.247  0.215
   400  0.295  0.243
   450  0.349  0.272
   500  0.404  0.305
   550  0.463  0.332

Unit length is 3
        tPlus  tJoin
    10  0.005  0.008
    20  0.008  0.014
    30  0.012  0.019
    40  0.016  0.026
    50  0.020  0.033
    60  0.025  0.039
    70  0.031  0.045
    80  0.036  0.051
    90  0.043  0.057
   100  0.050  0.064
   150  0.090  0.095
   200  0.134  0.127
   250  0.184  0.159
   300  0.236  0.187
   350  0.293  0.217
   400  0.360  0.247
   450  0.427  0.278
   500  0.499  0.306
   550  0.576  0.335

Unit length is 4
        tPlus  tJoin
    10  0.005  0.008
    20  0.008  0.014
    30  0.012  0.019
    40  0.017  0.025
    50  0.021  0.033
    60  0.026  0.039
    70  0.035  0.045
    80  0.042  0.051
    90  0.050  0.058
   100  0.057  0.063
   150  0.101  0.096
   200  0.153  0.129
   250  0.207  0.161
   300  0.271  0.192
   350  0.341  0.222
   400  0.416  0.249
   450  0.501  0.280
   500  0.580  0.310
   550  0.682  0.338

Unit length is 5
        tPlus  tJoin
    10  0.005  0.008
    20  0.008  0.014
    30  0.013  0.019
    40  0.017  0.025
    50  0.022  0.034
    60  0.029  0.040
    70  0.040  0.046
    80  0.047  0.052
    90  0.055  0.058
   100  0.063  0.064
   150  0.114  0.097
   200  0.169  0.130
   250  0.232  0.163
   300  0.306  0.195
   350  0.385  0.223
   400  0.470  0.252
   450  0.567  0.283
   500  0.668  0.313
   550  0.779  0.344



More information about the Tutor mailing list