[Tutor] Iterate Suggestion

Steven D'Aprano steve at pearwood.info
Sun Apr 15 04:40:29 CEST 2012


Bod Soutar wrote:

> How about something like this
> 
> mylist =  ['serverA', 'serverB', 'serverC', 'serverD','serverE', 'serverF',
> 'serverG']
> tempstr = ""
> count = 0
> 
> for item in mylist:
>     count += 1
>     if count == 3:
>         tempstr += (item + "\n")
>         count = 0
>     else:
>         tempstr += (item + " ")
> 
> print tempstr


Warning: this is a good way to write HORRIBLY slow code that potentially takes 
many minutes or even hours to generate output. And even worse, the slowdown 
will occur inconsistently, making it really hard to debug.

The right way to join many strings into one is with the join method: 
accumulate the substrings into a list, and then join them in one go:

' '.join(list_of_words)
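
For the server list above, a minimal sketch of that approach might look like 
this. The name format_in_rows and its per_row parameter are made up for 
illustration; they are not part of any library. Unlike the loop above, it 
leaves no trailing space after the last item.

```python
def format_in_rows(items, per_row=3):
    """Join items with spaces, starting a new line after every per_row items."""
    rows = []
    for start in range(0, len(items), per_row):
        # Build each row with a single join instead of repeated +=.
        rows.append(' '.join(items[start:start + per_row]))
    # One final join combines the rows.
    return '\n'.join(rows)


mylist = ['serverA', 'serverB', 'serverC', 'serverD', 'serverE', 'serverF',
          'serverG']
print(format_in_rows(mylist))
```

Each string is copied a small, fixed number of times no matter how long the 
list gets, which is the point of collecting pieces and joining them at the end.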

The problem with your code is that you are doing repeated string 
concatenation, which is slow. If you understand Big Oh notation, repeated 
string concatenation is O(n**2), which means that (roughly speaking) if you 
increase the amount of data by a factor of a hundred, the time taken will 
increase by a factor of ten thousand.
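
To see where the ten thousand comes from, assume the simplest possible model: 
no optimization at all, and every concatenation copies each character of the 
new string. The sketch below only counts copies under that assumption. 
chars_copied is a hypothetical helper, and it measures nothing about a real 
interpreter.

```python
def chars_copied(n):
    """Count characters copied when a string grows one character at a time.

    Assumes every concatenation builds a brand new string, copying all of it.
    """
    total = 0
    length = 0
    for _ in range(n):
        length += 1      # the new string is one character longer
        total += length  # and every character of it has to be copied
    return total


small = chars_copied(100)      # 1 + 2 + ... + 100 = 5050
large = chars_copied(10000)    # 1 + 2 + ... + 10000 = 50005000
print(small, large, float(large) / small)
```

A hundred times more data gives a ratio of a little over 9900 in this model, 
which is the "roughly ten thousand" above.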

You can read more about why this happens here:

http://www.joelonsoftware.com/articles/fog0000000319.html


CPython (the implementation you are using) has a clever optimization that 
*sometimes* can speed up this situation, which is why you may never have 
noticed how slow it gets. But other implementations such as Jython and 
IronPython do not, and so your code will be pathologically slow on these 
implementations.

Worse, the clever optimization is easily defeated. On some operating systems 
or memory schemes, it can fail and become horribly slow -- and debugging it is 
a real pain because others will report no slowdown.

A few years ago, a similar situation was reported in the urllib or urllib2 
module in the standard library. Thanks to the clever optimization, most people 
never noticed, but one user reported that Python was taking twenty or thirty 
minutes to download a file that Internet Explorer and wget would download in 
five or ten seconds. At first nobody believed him, because they couldn't 
replicate the bug. Then they thought it was a network issue. Eventually this 
fellow persevered and tracked the bug down to repeated string concatenation in 
the standard library. The inventor of Python, Guido van Rossum, described it 
as "embarrassing".

You can search the Python-Dev mailing list archives for this.


Here's an example of how slow repeated string concatenation can be, with and 
without the clever optimization:


py> from timeit import Timer
py> t = Timer('for i in range(500): s = s + "x"', 's = ""')
py> t.timeit(300)  # repeat the test 300 times
0.038927078247070312

That's not too bad: about 0.04 seconds to do 150 thousand string 
concatenations. But see what happens when I defeat the optimizer with a small 
change to the code:


py> t = Timer('for i in range(500): s = "x" + s', 's = ""')
py> t.timeit(300)
5.8992829322814941

That's 152 times slower.
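
If you want to compare the slow pattern against join on your own machine, 
something like the following sketch should do. The function names and the 
value of n are arbitrary choices for illustration, and the timings depend on 
your Python version and platform, so no numbers are quoted here.

```python
from timeit import timeit


def build_by_prepending(n):
    """Build a string of n characters by repeated concatenation at the front."""
    s = ""
    for _ in range(n):
        s = "x" + s
    return s


def build_by_join(n):
    """Build the same string by collecting the pieces and joining them once."""
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)


n = 10000  # arbitrary size; make it larger to widen the gap
t_prepend = timeit(lambda: build_by_prepending(n), number=5)
t_join = timeit(lambda: build_by_join(n), number=5)
print("prepend: %.4f seconds" % t_prepend)
print("join:    %.4f seconds" % t_join)
```

Both functions return the same string; only the way it is assembled differs.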



-- 
Steven


