[Tutor] Iterate Suggestion
Steven D'Aprano
steve at pearwood.info
Sun Apr 15 04:40:29 CEST 2012
Bod Soutar wrote:
> How about something like this
>
> mylist = ['serverA', 'serverB', 'serverC', 'serverD', 'serverE', 'serverF',
>           'serverG']
> tempstr = ""
> count = 0
>
> for item in mylist:
>     count += 1
>     if count == 3:
>         tempstr += (item + "\n")
>         count = 0
>     else:
>         tempstr += (item + " ")
>
> print(tempstr)
Warning: this is a good way to write HORRIBLY slow code that can take many
minutes or even hours to generate output. Even worse, the slowdown occurs
inconsistently, which makes it really hard to debug.
The right way to join many strings into one is with the join method:
accumulate the substrings into a list, and then join them in one go:
' '.join(list_of_words)
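Applied to the loop quoted above, the list-and-join approach might look like the following minimal sketch. It is written for Python 3. The helper name `group_words` and its `per_line` parameter are made up for this illustration and do not come from the original post. Each line's words are collected and joined once, and the lines are then joined once more with newlines:

```python
def group_words(words, per_line=3):
    """Join words with spaces, putting per_line words on each line."""
    lines = []
    for start in range(0, len(words), per_line):
        # Join one line's worth of words in a single step.
        lines.append(' '.join(words[start:start + per_line]))
    # One final join builds the whole output string in one pass.
    return '\n'.join(lines)


mylist = ['serverA', 'serverB', 'serverC', 'serverD', 'serverE', 'serverF',
          'serverG']
print(group_words(mylist))
```

As a side effect, this version does not leave a trailing space after the last word, which the concatenation loop does.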
The problem with your code is that it does repeated string concatenation,
which is slow. Each concatenation copies the entire string built so far, so if
you understand Big Oh notation, building a string this way is O(n**2). Roughly
speaking, that means if you increase the amount of data by a factor of a
hundred, the time taken will increase by a factor of ten thousand.
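A rough model makes the arithmetic concrete. It assumes, for illustration only, that every `s = s + "x"` copies the whole string built so far. Under that assumption, building an n-character string one character at a time copies 1 + 2 + ... + n = n(n+1)/2 characters. A single join, by contrast, copies each character roughly once. The function names are made up for this sketch, and the model counts copies, not real timings:

```python
def chars_copied_by_concatenation(n):
    """Model: characters copied while building an n-character string one
    character at a time, assuming each s = s + "x" copies all of s."""
    total = 0
    for length in range(1, n + 1):
        # The new string has `length` characters, all copied (in this model).
        total += length
    return total


def chars_copied_by_join(n):
    """Model: join copies each of the n one-character pieces once."""
    return n


small, large = 100, 100 * 100  # a hundred times more data
ratio = chars_copied_by_concatenation(large) / chars_copied_by_concatenation(small)
print(chars_copied_by_concatenation(small))  # 5050
print(chars_copied_by_concatenation(large))  # 50005000
print(round(ratio))  # 9902, i.e. roughly ten thousand
print(chars_copied_by_join(large) // chars_copied_by_join(small))  # 100
```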
You can read more about why this happens here:
http://www.joelonsoftware.com/articles/fog0000000319.html
CPython (the implementation you are using) has a clever optimization that
*sometimes* can speed up this situation, by resizing the string in place when
nothing else refers to it. That is why you may never have noticed how slow it
gets. But other implementations such as Jython and IronPython do not have this
optimization, and so your code will be pathologically slow on them.
Worse, the clever optimization is easily defeated. On some operating systems
or memory schemes, it can fail and become horribly slow -- and debugging it is
a real pain because others will report no slowdown.
A few years ago, a similar situation was reported in the urllib or urllib2
module in the standard library. Thanks to the clever optimization, most people
never noticed, but one user reported that Python was taking twenty or thirty
minutes to download a file that Internet Explorer and wget would download in
five or ten seconds. At first nobody believed him, because they couldn't
replicate the bug. Then they thought it was a network issue. Eventually this
fellow persevered and tracked the bug down to repeated string concatenation in
the standard library. The inventor of Python, Guido van Rossum, described it
as "embarrassing".
You can search the Python-Dev mailing list archives for this.
Here's an example of how slow repeated string concatenation can be, with and
without the clever optimization:
py> from timeit import Timer
py> t = Timer('for i in range(500): s = s + "x"', 's = ""')
py> t.timeit(300) # repeat the test 300 times
0.038927078247070312
That's not too bad: well under a tenth of a second to do 150 thousand string
concatenations. But see what happens when I defeat the optimization with a
small change to the code:
py> t = Timer('for i in range(500): s = "x" + s', 's = ""')
py> t.timeit(300)
5.8992829322814941
That's 152 times slower.
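The session above was run on Python 2. The following is a rough Python 3 sketch of the same experiment, with a list-and-join version added for comparison. The function names and the size `n` are arbitrary choices for this illustration. Absolute timings will vary with the Python version, implementation and machine, so only the relative differences are of interest:

```python
from timeit import timeit


def append_concat(n):
    s = ""
    for _ in range(n):
        s = s + "x"  # CPython may be able to grow s in place here
    return s


def prepend_concat(n):
    s = ""
    for _ in range(n):
        s = "x" + s  # always builds a brand-new string: quadratic
    return s


def join_pieces(n):
    pieces = []
    for _ in range(n):
        pieces.append("x")  # appending to a list is cheap (amortized O(1))
    return "".join(pieces)  # build the final string in one go


if __name__ == "__main__":
    n = 50_000  # raise this to make the gap more dramatic
    for func in (append_concat, prepend_concat, join_pieces):
        seconds = timeit(lambda: func(n), number=1)
        print(f"{func.__name__:15} {seconds:.4f} s")
```

All three functions build the same string. Only `prepend_concat` is guaranteed to defeat CPython's in-place trick, and on implementations without that trick `append_concat` would be quadratic too.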
--
Steven