Speed of string += string
Carl Banks
imbosol-1050185957 at aerojockey.com
Sat Apr 12 18:35:37 EDT 2003
Paul Moore wrote:
> Mads Orbesen Troest <mads at troest.NEVERMORE.dk> writes:
>
>> Given that strings are immutable, is it exceedingly slow (and memory
>> spiking) to do a lot of "string += string" operations in one's code? IOW,
>> does each and every += evaluation lead to a new concatenated copy of the
>> previous string (now freed for garbage collection) and the new string, so
>> that the longer the string to which stuff is appended is, the longer times
>> each += operation takes?
>
> I don't know the underlying details too well (and I'm not sure they
> are all that important) but yes, string += string does construct a new
> string each time, and hence is relatively expensive.
>
> The normal idiom for building a string in chunks is to create a list
> of the chunks, and then join them once at the end. So you do:
>
> chunks = []
> while whatever():
> chunks.append(next_bit_of_string())
> result = ''.join(chunks)
>
> You could also use StringIO. However, all this does is basically what
> I describe above, plus extra complexity to handle all the methods you
> don't need. So you're better sticking with the above idiom.
According to my benchmark, cStringIO is a bit faster than joining a
list of strings when appending one character at a time (as opposed to,
say, one line at a time, where it appears to be slightly slower).
Also, it certainly uses less memory than a list of strings, so I'd say
sometimes it is better to use cStringIO.
Surprisingly, += doesn't appear to be a major time waster for small
(meaning about 10000 bytes) strings. I'm guessing it's because the
malloc my system (Debian Linux) is optimized for small memory
allocations.
from cStringIO import StringIO
from time import time
import sys
def benchmark(n,m):
c = 'c'*m
start = time()
s = ''
for i in xrange(n):
s += c
end = time()
print "+=: %.3f seconds" % (end-start)
start = time()
l = []
for i in xrange(n):
l.append(c)
end = time()
s = ''.join(l)
print "list: %.3f seconds" % (end-start)
start = time()
f = StringIO()
for i in xrange(n):
f.write(c)
s = f.getvalue()
end = time()
print "cStringIO: %.3f seconds" % (end-start)
benchmark(int(sys.argv[1]),int(sys.argv[2]))
On my system, with n=10000, m=1:
+=: 0.041 seconds
list: 0.023 seconds
cStringIO: 0.019 seconds
With n=100000, m=1:
+=: 8.002 seconds
list: 0.239 seconds
cStringIO: 0.187 seconds
With n=10000, m=40:
+=: 19.681 seconds
list: 0.022 seconds
cStringIO: 0.026 seconds
I'm running Python 2.1.3 on Linux.
--
CARL BANKS
More information about the Python-list
mailing list