[Tutor] concat vs join - followup

Kent Johnson kent_johnson at skillsoft.com
Sat Aug 28 06:02:06 CEST 2004


A couple of people have made good points about my post comparing string 
concatenation and join.

Marilyn Davis pointed out that in my data, the crossover point where join 
beats concatenation is always around 500 total characters in the final 
string. Hans Nowak pointed out that for much longer strings such as the lines 
of a file or parts of a web page, the crossover point comes very quickly.

So here is ConcatTimer version 2 :-) This version dispenses with the fancy 
graphics and just looks for the crossover point. (It's not too smart about 
it, either.) It also looks at much larger text chunks - up to 79 
characters. Here is the program:

import timeit

reps = 100 # How many reps to try?
unit = '    ' # Concat this string

# Naive concatenation using string +
def concatPlus(count):
    s=''
    for i in range(count):
        s += unit
    return s

# Concatenation with string.join
def concatJoin(count):
    s=[]
    for i in range(count):
        s.append(unit)
    return ''.join(s)

# Time one test case
def timeOne(fn, count):
    setup = "from __main__ import " + fn.__name__
    stmt = '%s(%d)' % (fn.__name__, count)

    t = timeit.Timer(stmt, setup)
    secs = min(t.repeat(3, reps))
    return secs

# For strings of length unitLen, find the crossover point where appending
# takes the same amount of time as joining
def findOne(unitLen):
    global unit
    unit = ' ' * unitLen
    t = 2

    while 1:
        tPlus = timeOne(concatPlus, t)
        tJoin = timeOne(concatJoin, t)
        if tPlus > tJoin:
            break
        t += 1

    return t, tPlus, tJoin

for unitLen in range(1,80):
    t, tPlus, tJoin = findOne(unitLen)
    print('%2d %3d %3d %1.5f %1.5f' % (unitLen, t, t*unitLen, tPlus, tJoin))

And here is an elided list of results. The columns are the length of the 
pieces, the number of pieces where concat becomes more expensive than join, 
the total number of characters in the string at the crossover point, and 
the actual times. (I cut the number of reps down to keep this from taking 
too long to run.)

 1 475 475 0.02733 0.02732
 2 263 526 0.01581 0.01581
 3 169 507 0.01024 0.01022
 4 129 516 0.00782 0.00778
 5 100 500 0.00622 0.00604
 6  85 510 0.00517 0.00515
 7  73 511 0.00447 0.00446
 8  63 504 0.00386 0.00385
 9  57 513 0.00354 0.00353
10  53 530 0.00333 0.00333
11  47 517 0.00294 0.00292
12  45 540 0.00287 0.00285
13  41 533 0.00262 0.00260
14  38 532 0.00246 0.00244
15  36 540 0.00232 0.00230
16  34 544 0.00222 0.00222
17  31 527 0.00200 0.00199
18  29 522 0.00189 0.00188
19  30 570 0.00199 0.00194
20  28 560 0.00188 0.00186
21  28 588 0.00190 0.00185
22  26 572 0.00177 0.00174
23  25 575 0.00170 0.00168
24  24 576 0.00165 0.00163
25  23 575 0.00158 0.00156
26  22 572 0.00153 0.00151
27  21 567 0.00146 0.00144
28  21 588 0.00146 0.00146
29  21 609 0.00147 0.00144
30  20 600 0.00142 0.00139
31  19 589 0.00134 0.00134
32  20 640 0.00143 0.00139
33  19 627 0.00137 0.00136
34  18 612 0.00130 0.00129
35  18 630 0.00131 0.00130
36  18 648 0.00133 0.00130
37  17 629 0.00126 0.00126
38  17 646 0.00126 0.00124
39  15 585 0.00112 0.00111
43  15 645 0.00113 0.00110
44  14 616 0.00106 0.00105
45  15 675 0.00114 0.00110
46  14 644 0.00106 0.00105
48  14 672 0.00109 0.00105
49  13 637 0.00100 0.00099
58  13 754 0.00104 0.00100
59  12 708 0.00098 0.00096
69  12 828 0.00102 0.00098
70  11 770 0.00093 0.00092
77  11 847 0.00094 0.00091
78  10 780 0.00086 0.00086
79  10 790 0.00087 0.00085

So, for anyone still reading, you can see that Hans is right and Marilyn is 
close:
- For longer strings and more than a few appends, join is clearly a win.
- The total number of characters at the crossover isn't quite constant, but 
it grows slowly.

Based on this experiment I would say that if the total number of characters 
is less than 500-1000, concatenation is fine. For anything bigger, use join.
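
To make the "use join" advice concrete, here is a minimal sketch of the usual 
idiom: collect the pieces in a list and join once at the end. The function 
name build_report and the sample data are made up for illustration; they are 
not part of ConcatTimer.

```python
def build_report(lines):
    """Number each line and return the whole report as one string.

    The pieces are collected in a list and joined once at the end,
    instead of growing a string with += inside the loop.
    (build_report is a hypothetical helper used only for this example.)
    """
    pieces = []
    for number, line in enumerate(lines, 1):
        pieces.append('%4d: %s\n' % (number, line))
    return ''.join(pieces)


if __name__ == '__main__':
    sample = ['first line', 'second line', 'third line']
    print(build_report(sample))
```

For three short lines like these either approach is fine; the idiom starts to 
pay off when the pieces come from a whole file or a web page.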

Of course the total amount of time involved in any case is pretty small. 
Unless you have a lot of characters or you are building a lot of strings, I 
don't think it really matters too much.
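
If you want to check how much it matters on your own machine, here is a rough 
sketch of the same comparison that passes the functions to timeit directly 
rather than building statement strings. The names concat_plus, concat_join and 
time_one, and the piece size and counts in the loop, are my choices for this 
example. Expect the numbers, and possibly the crossover, to differ from the 
table above depending on your Python version and hardware.

```python
import timeit


def concat_plus(unit, count):
    """Build the string by repeated +=."""
    s = ''
    for _ in range(count):
        s += unit
    return s


def concat_join(unit, count):
    """Build the string by collecting pieces and joining once."""
    pieces = []
    for _ in range(count):
        pieces.append(unit)
    return ''.join(pieces)


def time_one(fn, unit, count, reps=100):
    """Best of three timings, in seconds, for reps calls of fn(unit, count)."""
    timer = timeit.Timer(lambda: fn(unit, count))
    return min(timer.repeat(repeat=3, number=reps))


if __name__ == '__main__':
    unit = ' ' * 10  # assumed piece size; change it to match your data
    for count in (10, 100, 1000):
        t_plus = time_one(concat_plus, unit, count)
        t_join = time_one(concat_join, unit, count)
        print('%5d %1.5f %1.5f' % (count, t_plus, t_join))
```

Each output row shows the number of pieces, then the time for += and the time 
for join, in the same style as the last two columns of the table above.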

----------
I started this experiment because I have been telling people on the Tutor 
mailing list to use join, and I wondered how much it really mattered. Does 
it make enough of a difference to bring it up to beginners? I'm not sure. 
It's good to teach best practices, but maybe it's a poor use of time to 
teach this to beginners. I won't be so quick to bring it up next time.

Kent 


