StringType add operation seems every inefficient

Jeremy Hylton jeremy at alum.mit.edu
Tue Feb 13 10:00:14 EST 2001


>>>>> "RL" == Lee, Rick <rickylee at americasm01.nt.com> writes:

  RL> If I do something like this: 
  RL> s = '' 
  RL> for i in listOfStrings:
  RL>     s = s + i

  RL> Once listOfStrings is say 10000 members of 100 bytes, it takes
  RL> forever (ie. minutes) for the "for" loop to finish.  This is
  RL> running on a 700MHz WinNT machine, Python 2.0

  RL> Contrast with:

  RL> s = '' 
  RL> blocks = [] 
  RL> for i in listOfStrings:
  RL>     s = s + i if len(s) > 4000
  RL>         blocks.append(s); s = ''

  RL> This takes a blink of the eye to finish, so it is faster than
  RL> the first piece of code by between 100 and 1000 times.

  RL> Is this relative inefficiency in the string add operation
  RL> expected?

It is expected, because the string add version is doing something
quite different than accumulator version.  The 's = s + i' line is
making a new string for each iteration of the loop.  To make the new
string, the interpreter allocates a new string object and copies the
contents of s and i into it.

It may help you to see what is happening if you unroll the loop by
hand: 

    s0 = ''
    s1 = s0 + listOfStrings[0]
    s2 = s1 + listOfStrings[1]
#     ~= s0 + listOfStrings[0] + listOfStrings[1]
    s3 = s2 + listOfStrings[2]
#     ~= s0 + listOfStrings[0] + listOfStrings[1] + listOfStrings[2]

In the end, that's O(N) string allocations and O(N**2) string copies,
where N is the length of listOfStrings.

  RL> I eventually have to produce a big long string of approx. 10MB.
  RL> What is the fastest way to do it without having to write
  RL> C-extensions?

The second technique you show -- using a list to accumulate the
individual strings -- is a standard Python idiom.  At the end, you do
"".join(blocks) or somesuch to produce a single string.  This approach
does exactly one copy for each constituent string.

I believe someone also suggested using cStringIO, which is probably
just as fast.  

Jeremy





More information about the Python-list mailing list