[Tutor] concat vs join - followup
Kent Johnson
kent_johnson at skillsoft.com
Sat Aug 28 06:02:06 CEST 2004
A couple of people have made good points about my post comparing string
concatenation and join.
Marilyn Davis pointed out that in my data, the crossover point where join
beats concatenation is always around 500 total characters in the final
string. Hans Nowak pointed out that for much longer strings, such as the lines
of a file or parts of a web page, the crossover point comes very quickly.
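Both observations fit a rough copy-count model. This is a simplified sketch, not a measurement. It assumes each "s += unit" allocates a new string and copies everything built so far, which is true for immutable strings with no in-place optimization. It also ignores per-call overhead, which is what lets + win for small strings. The function names are hypothetical.

```python
# Rough model (assumption): each "s += unit" creates a new string and
# copies the old text plus the new piece; "''.join(pieces)" copies each
# character once. Per-call overhead is ignored.

def chars_copied_by_concat(count, unit_len):
    """Characters written when building the string with repeated +=."""
    total = 0
    length = 0
    for _ in range(count):
        length += unit_len  # the new string holds the old text plus one piece
        total += length     # all of it is copied into the new string
    return total

def chars_copied_by_join(count, unit_len):
    """Characters written when joining count pieces once at the end."""
    return count * unit_len

for count, unit_len in [(100, 1), (100, 5), (10, 80)]:
    plus_chars = chars_copied_by_concat(count, unit_len)
    join_chars = chars_copied_by_join(count, unit_len)
    print('%d %d %d %d' % (count, unit_len, plus_chars, join_chars))
# 100 1 5050 100
# 100 5 25250 500
# 10 80 4400 800
```

Under this model, + copies L*n*(n-1)/2 more characters than join for n pieces of length L. That is roughly the total length times n/2. Suppose + wins back a small, roughly fixed amount per append on call overhead. Then the two balance once the total length reaches a roughly fixed size, which matches Marilyn's ~500-character observation. Longer pieces simply reach that size after fewer appends, as Hans noted.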
So here is ConcatTimer version 2 :-) This version dispenses with the fancy
graphics and just looks for the crossover point. (It's not too smart about
it, either.) It also looks at much larger text chunks, from 1 up to 79
characters. Here is the program:
import timeit

reps = 100   # How many reps to try?
unit = ' '   # Concat this string

# Naive concatenation using string +
def concatPlus(count):
    s = ''
    for i in range(count):
        s += unit
    return s

# Concatenation with str.join
def concatJoin(count):
    s = []
    for i in range(count):
        s.append(unit)
    return ''.join(s)

# Time one test case: best of 3 runs of reps calls to fn(count), in seconds
def timeOne(fn, count):
    setup = "from __main__ import " + fn.__name__
    stmt = '%s(%d)' % (fn.__name__, count)
    t = timeit.Timer(stmt, setup)
    secs = min(t.repeat(3, reps))
    return secs

# For pieces of length unitLen, find the crossover point: the smallest
# number of pieces (starting at 2) for which concatenating is slower
# than joining
def findOne(unitLen):
    global unit              # concatPlus and concatJoin read the global unit
    unit = ' ' * unitLen
    t = 2
    while True:
        tPlus = timeOne(concatPlus, t)
        tJoin = timeOne(concatJoin, t)
        if tPlus > tJoin:
            break            # concatenation is now slower than join
        t += 1
    return t, tPlus, tJoin

# Columns: piece length, pieces at crossover, total characters, times
for unitLen in range(1, 80):
    t, tPlus, tJoin = findOne(unitLen)
    print('%2d %3d %3d %1.5f %1.5f' % (unitLen, t, t*unitLen, tPlus, tJoin))
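The timeOne helper above builds a statement string and imports the function from __main__, so it only works when the program is run as a script. Since Python 2.6, timeit also accepts a zero-argument callable, which avoids the setup string. Here is a minimal sketch, assuming Python 3. timeOneCallable is a hypothetical name, and reps plays the same role as in the program.

```python
import timeit

unit = ' '   # the piece being concatenated, as in the program above
reps = 100   # calls per timing run

def concatPlus(count):
    # Naive concatenation using string +
    s = ''
    for i in range(count):
        s += unit
    return s

def concatJoin(count):
    # Concatenation with str.join
    s = []
    for i in range(count):
        s.append(unit)
    return ''.join(s)

def timeOneCallable(fn, count):
    """Best of 3 runs of reps calls to fn(count), in seconds."""
    # timeit accepts a zero-argument callable, so no setup string is needed.
    return min(timeit.repeat(lambda: fn(count), repeat=3, number=reps))

tPlus = timeOneCallable(concatPlus, 50)
tJoin = timeOneCallable(concatJoin, 50)
print('%1.5f %1.5f' % (tPlus, tJoin))
```

The lambda adds a small call overhead, but it adds the same overhead to both timings, so the comparison between them is unaffected.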
And here is an elided list of results. The columns are the length of each
piece, the number of pieces at which concatenation becomes more expensive
than join, the total number of characters in the string at that crossover
point, and the actual times in seconds for concatenation and join (best of
3 runs of reps calls each). (I cut the number of reps down to keep this from
taking too long to run.)
len pieces chars tPlus tJoin
1 475 475 0.02733 0.02732
2 263 526 0.01581 0.01581
3 169 507 0.01024 0.01022
4 129 516 0.00782 0.00778
5 100 500 0.00622 0.00604
6 85 510 0.00517 0.00515
7 73 511 0.00447 0.00446
8 63 504 0.00386 0.00385
9 57 513 0.00354 0.00353
10 53 530 0.00333 0.00333
11 47 517 0.00294 0.00292
12 45 540 0.00287 0.00285
13 41 533 0.00262 0.00260
14 38 532 0.00246 0.00244
15 36 540 0.00232 0.00230
16 34 544 0.00222 0.00222
17 31 527 0.00200 0.00199
18 29 522 0.00189 0.00188
19 30 570 0.00199 0.00194
20 28 560 0.00188 0.00186
21 28 588 0.00190 0.00185
22 26 572 0.00177 0.00174
23 25 575 0.00170 0.00168
24 24 576 0.00165 0.00163
25 23 575 0.00158 0.00156
26 22 572 0.00153 0.00151
27 21 567 0.00146 0.00144
28 21 588 0.00146 0.00146
29 21 609 0.00147 0.00144
30 20 600 0.00142 0.00139
31 19 589 0.00134 0.00134
32 20 640 0.00143 0.00139
33 19 627 0.00137 0.00136
34 18 612 0.00130 0.00129
35 18 630 0.00131 0.00130
36 18 648 0.00133 0.00130
37 17 629 0.00126 0.00126
38 17 646 0.00126 0.00124
39 15 585 0.00112 0.00111
43 15 645 0.00113 0.00110
44 14 616 0.00106 0.00105
45 15 675 0.00114 0.00110
46 14 644 0.00106 0.00105
48 14 672 0.00109 0.00105
49 13 637 0.00100 0.00099
58 13 754 0.00104 0.00100
59 12 708 0.00098 0.00096
69 12 828 0.00102 0.00098
70 11 770 0.00093 0.00092
77 11 847 0.00094 0.00091
78 10 780 0.00086 0.00086
79 10 790 0.00087 0.00085
So, for anyone still reading, you can see that Hans is right and Marilyn is
close:
- For longer strings and more than a few appends, join is clearly a win.
- The total number of characters at the crossover isn't quite constant, but
it grows slowly.
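The second point can be checked directly against the table, because the total-characters column is just the piece length times the number of pieces. This sketch recomputes that column for a sample of rows copied from the results above. It uses no new measurements.

```python
# (piece length, pieces at crossover) pairs copied from the table above
rows = [(1, 475), (2, 263), (5, 100), (10, 53), (20, 28),
        (30, 20), (49, 13), (59, 12), (70, 11), (79, 10)]

# Third column of the table: total characters at the crossover
totals = [unit_len * pieces for unit_len, pieces in rows]
print(totals)
# [475, 526, 500, 530, 560, 600, 637, 708, 770, 790]

# Compare the short-piece end of the range with the long-piece end
print('%d %d' % (min(totals[:4]), max(totals[:4])))    # 475 530
print('%d %d' % (min(totals[-4:]), max(totals[-4:])))  # 637 790
```

Pieces of 1 to 10 characters cross over at roughly 475 to 530 total characters. Pieces of 49 to 79 characters cross over at roughly 640 to 790. The total drifts upward slowly rather than staying fixed.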
Based on this experiment I would say that if the total number of characters
is less than 500-1000, concatenation is fine. For anything bigger, use join.
Of course the total amount of time involved in any case is pretty small.
Unless you have a lot of characters or you are building a lot of strings, I
don't think it really matters too much.
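For the larger cases where join does pay off, such as Hans's example of building part of a web page, the usual pattern is the one concatJoin uses: collect the pieces in a list and call ''.join once. Here is a minimal sketch. build_page and its (name, value) input are hypothetical, and no HTML escaping is done.

```python
def build_page(rows):
    """Return an HTML table for (name, value) pairs.

    The pieces go into a list and are joined once at the end, the same
    pattern as concatJoin. No HTML escaping is done.
    """
    parts = ['<table>\n']
    for name, value in rows:
        parts.append('<tr><td>%s</td><td>%s</td></tr>\n' % (name, value))
    parts.append('</table>\n')
    return ''.join(parts)

page = build_page([('a', 1), ('b', 2)])
print(page)
```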
----------
I started this experiment because I have been telling people on the Tutor
mailing list to use join, and I wondered how much it really mattered. Does
it make enough of a difference to bring it up to beginners? I'm not sure.
It's good to teach best practices, but maybe it's a poor use of time to
teach this to beginners. I won't be so quick to bring it up next time.
Kent