[Tutor] a little loop

boB Stepp robertvstepp at gmail.com
Wed May 29 18:51:12 CEST 2013


On Wed, May 29, 2013 at 11:07 AM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 29 May 2013 16:38, boB Stepp <robertvstepp at gmail.com> wrote:
>> On Tue, May 28, 2013 at 9:34 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>>>
>>> However, a word of warning: although you *can* assemble a new string character by character like that, you should not, because it risks being very slow. *Painfully* slow. If you want to hear the details, please ask, but it risks being slow for much the same reason as the infamous Shlemiel the Painter Algorithm:
>>>
>>> http://www.joelonsoftware.com/articles/fog0000000319.html
>>
>> Okay, I will come out of lurking and ask: What are the details?
>
> Here is the code that Steven was referring to:
>
> ham = ''
> for char in 'spam':
>     ham = ham + char
>     print(ham)
>
> What this code does is to create an empty string ham. Then for each
> character in 'spam' is creates a new string by appending the character
> to ham. However Python cannot append strings it can only create new
> strings since strings are immutable*.
>
> On the first iteration of the loop the zero length string is combined
> with the character and a string of length 1 is created.
> On the second a new string of length 2 is created.
> On the third a new string of length 3 is created.
> On the fourth a new string of length 4 is created.
>
> So four strings are create with lengths 0, 1, 2, 3, 4.
>
> If we did this with N characters then N strings would be created with
> lengths 0, 1, 2, 3, ... , N. If cost of creating each string is
> proportional to its length then the cost of the operation as a whole
> is proportional to 0 + 1 + 2 + 3 + ... + N. The general formula to
> compute this sum is (N**2 + N)/2. In other words the whole operation
> is quadratic O(N**2) in the number of characters that are combined.
>
> The Shlemiel the painter story describes a similar operation in which
> the time taken to paint each section increases linearly as more
> sections get painted. The result is that the time taken for Shlemiel
> to paint a stretch of road is proportional to the square of its length
> when it should just be proportional to its length.
>

After reading the link that Steven gave, I understood this, but you
have filled in some details. I was wondering how...

> I don't know exactly how str.join is implemented but it does not use
> this quadratic algorithm. For example if str.join would first compute
> the length of the resulting string first then it can allocate memory
> for exactly one string of that length and copy each substring to the
> appropriate place (actually I imagine it uses an exponentially
> resizing buffer but this isn't important).
>

...str.join gets around these issues? In the linked article it was
discussing increasing memory allocation by powers of two instead of
trying to determine the exact length of the strings involved,
mentioning that the maximum wasted memory would be 50% of what was
actually needed. Is Python more clever in its implementation?

> * CPython actually has an optimisation that can append strings in
> precisely this situation. However it is an implementation detail of
> CPython that may change and it does not work in other interpreters
> e.g. Jython. Using this kind of code can damage portability since your
> program may run fine in CPython but fail in other interpreters.
>
You are speaking of "appending" and not "concatenation" here?

I had not even considered other Python interpreters than CPython. More
complexity to consider for the future...

boB


More information about the Tutor mailing list