[Tutor] a little loop

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed May 29 18:07:06 CEST 2013


On 29 May 2013 16:38, boB Stepp <robertvstepp at gmail.com> wrote:
> On Tue, May 28, 2013 at 9:34 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>>
>> However, a word of warning: although you *can* assemble a new string character by character like that, you should not, because it risks being very slow. *Painfully* slow. If you want to hear the details, please ask, but it risks being slow for much the same reason as the infamous Shlemiel the Painter Algorithm:
>>
>> http://www.joelonsoftware.com/articles/fog0000000319.html
>
> Okay, I will come out of lurking and ask: What are the details?

Here is the code that Steven was referring to:

ham = ''
for char in 'spam':
    ham = ham + char
    print(ham)

What this code does is to create an empty string ham. Then for each
character in 'spam' is creates a new string by appending the character
to ham. However Python cannot append strings it can only create new
strings since strings are immutable*.

On the first iteration of the loop the zero length string is combined
with the character and a string of length 1 is created.
On the second a new string of length 2 is created.
On the third a new string of length 3 is created.
On the fourth a new string of length 4 is created.

So four strings are create with lengths 0, 1, 2, 3, 4.

If we did this with N characters then N strings would be created with
lengths 0, 1, 2, 3, ... , N. If cost of creating each string is
proportional to its length then the cost of the operation as a whole
is proportional to 0 + 1 + 2 + 3 + ... + N. The general formula to
compute this sum is (N**2 + N)/2. In other words the whole operation
is quadratic O(N**2) in the number of characters that are combined.

The Shlemiel the painter story describes a similar operation in which
the time taken to paint each section increases linearly as more
sections get painted. The result is that the time taken for Shlemiel
to paint a stretch of road is proportional to the square of its length
when it should just be proportional to its length.

I don't know exactly how str.join is implemented but it does not use
this quadratic algorithm. For example if str.join would first compute
the length of the resulting string first then it can allocate memory
for exactly one string of that length and copy each substring to the
appropriate place (actually I imagine it uses an exponentially
resizing buffer but this isn't important).

* CPython actually has an optimisation that can append strings in
precisely this situation. However it is an implementation detail of
CPython that may change and it does not work in other interpreters
e.g. Jython. Using this kind of code can damage portability since your
program may run fine in CPython but fail in other interpreters.


Oscar


More information about the Tutor mailing list