O(n^2) is bad - can it be fixed?

Helen Dawson helend at accessone.com
Thu May 24 02:47:58 EDT 2001


Thanks for all the replies. I learned several tricks.

However I find myself more worried than ever. I started out
thinking that list was guaranteed to be O(n) for my problem,
but now it appears that it isn't - it's dependent on good realloc
behaviour. I think that's one thing the STL guys recognized
well - incrementing by any (more or less) constant size
can produce arbitrarily bad performance. A worst case
memory penalty of 2x for doubling the size of list each time
seems a fair price to pay for avoiding that
(especially since you can make the memory penalty
arbitrarily small - any ratio above one is theoretically
equivalent). Oh well.

And, to the person who pointed out that my test script
is utterly meaningless for strings - yes it is, but that is what
happens when you simplify your code to its essence. It's
certainly better than posting the original 50 line script
and expecting random readers to sort through the chaff.

Thanks again all.

Bruce/Helen Dawson

Helen Dawson wrote:

> The following script tests appending to a list and to a string.
>
> def AppendTest(listorstring):
>     for i in range(0, 100):
>         for x in range(0, 2000):
>             listorstring += " "
>         print i
>
> AppendTest([])
> AppendTest("")
>
> The test with the list runs very swiftly and smoothly - the numbers
> are printed out at a steady rate. The test with the string runs
> slowly, and the speed that the numbers appear slows down - a lot. In
> fact, I have never seen the string test end - I always abort it out
> of boredom before it gets to twenty five.
>
> Without even looking in the Python internals it is obvious what is
> happening. In the string case, each time a character is appended the
> entire string is copied. Therefore adding n characters requires
> n^2/2 bytes to be copied, making processing large files (where large
> is > about 10K) very inefficient - and processing files larger
> than about 100K is virtually impossible.
>
> List does much better. Why? It probably does the same thing that the
> STL vector typically does - the allocabe space doubles whenever it
> runs out of room, and then the newly allocated space is filled in
> as needed.
>
> So, why can't string do that?
>
> When we could only go string = string + " " then fixing this problem
> was probably impossible, but with += it seems like it should be quite
> possible - maybe even easy!
>
> Right now if I am producing a text file a character at a time I have
> to do it to a list and then use string.join() to convert at the end.
> That works, but seems clunky.
>
> It's also a landmine for the naive. I recognized the problem
> immediately, but many people would simply assume that Python was slow,
> when it took an hour to process a one Mbyte file.
>
> I'm using Python 2.1, on Windows.
>
> Bruce Dawson.
>
> P.S. Is this a known issue?




More information about the Python-list mailing list