O(n^2) is bad - can it be fixed?

kosh kosh at aesaeion.com
Tue May 22 08:49:04 CEST 2001


Helen Dawson wrote:

> The following script tests appending to a list and to a string.
> 
> 
> def AppendTest(listorstring):
>     for i in range(0, 100):
>         for x in range(0, 2000):
>             listorstring += " "
>         print i
> 
> AppendTest([])
> AppendTest("")
> 
> 
> The test with the list runs very swiftly and smoothly - the numbers
> are printed out at a steady rate. The test with the string runs
> slowly, and the speed that the numbers appear slows down - a lot. In
> fact, I have never seen the string test end - I always abort it out
> of boredom before it gets to twenty five.
> 
> Without even looking in the Python internals it is obvious what is
> happening. In the string case, each time a character is appended the
> entire string is copied. Therefore adding n characters requires
> n^2/2 bytes to be copied, making processing large files (where large
> is > about 10K) very inefficient - and processing files larger
> than about 100K is virtually impossible.
> 
> List does much better. Why? It probably does the same thing that the
> STL vector typically does - the allocabe space doubles whenever it
> runs out of room, and then the newly allocated space is filled in
> as needed.
> 
> So, why can't string do that?
> 
> When we could only go string = string + " " then fixing this problem
> was probably impossible, but with += it seems like it should be quite
> possible - maybe even easy!
> 
> Right now if I am producing a text file a character at a time I have
> to do it to a list and then use string.join() to convert at the end.
> That works, but seems clunky.
> 
> It's also a landmine for the naive. I recognized the problem
> immediately, but many people would simply assume that Python was slow,
> when it took an hour to process a one Mbyte file.
> 
> I'm using Python 2.1, on Windows.
> 
> Bruce Dawson.
> 
> P.S. Is this a known issue?
> 
> 
> 


The reason that is slow is that you modify the lenght of the string many 
times. The list one is far more efficient especially with a few 
modifications. For the list one when you do a .join you only put all of 
them together once insetead of adding together many times.

However also consider this case

list = []
for i in somelist:
        list.append(i)
string.join(list, "")

a faster way then that is
list = []
append = list.append
for i in somelist:
        append(i)
string.join(list, "")

That works faster become it removes a lookup operation from every iteration 
of the loop. I know this doesn't precisely answer your question but it 
should enable you to code a much faster solution.



More information about the Python-list mailing list