[Tutor] Joining all strings in stringList into one string

Jordan wolfrage8765 at gmail.com
Sat Jun 2 23:20:48 CEST 2012


Thanks for the excellent feedback and very informative. I guess I just
did not consider the memory side of things and did not think about just
how much extra addition was having to occur.
Additionally I also found this supporting link:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation


On 06/02/2012 04:29 PM, Steven D'Aprano wrote:
> Jordan wrote:
>
>> #Another version might look like this:
>>
>> def join_strings2(string_list):
>>     final_string = ''
>>     for string in string_list:
>>         final_string += string
>>     print(final_string)
>>     return final_string
>
> Please don't do that. This risks becoming slow. REALLY slow. Painfully
> slow. Like, 10 minutes versus 3 seconds slow. Seriously.
>
> The reason for this is quite technical, and the reason why you might
> not notice is even more complicated, but the short version is this:
>
> Never build up a long string by repeated concatenation of short strings.
> Always build up a list of substrings first, then use the join method to
> assemble them into one long string.
>
>
> Why repeated concatenation is slow:
>
> Suppose you want to concatenation two strings, "hello" and "world", to
> make a new string "helloworld". What happens?
>
> Firstly, Python has to count the number of characters needed, which
> fortunately is fast in Python, so we can ignore it. In this case, we
> need 5+5 = 10 characters.
>
> Secondly, Python sets aside enough memory for those 10 characters,
> plus a little bit of overhead: ----------
>
> Then it copies the characters from "hello" into the new area: hello-----
>
> followed by the characters of "world": helloworld
>
> and now it is done. Simple, right? Concatenating two strings is pretty
> fast. You can't get much faster.
>
> Ah, but what happens if you do it *repeatedly*? Suppose we have SIX
> strings we want to concatenate, in a loop:
>
> words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
> result = ''
> for word in words:
>     result = result + word
>
> How much work does Python have to do?
>
> Step one: add '' + 'hello', giving result 'hello'
> Python needs to copy 0+5 = 5 characters.
>
> Step two: add 'hello' + 'world', giving result 'helloworld'
> Python needs to copy 5+5 = 10 characters, as shown above.
>
> Step three: add 'helloworld' + 'foo', giving 'helloworldfoo'
> Python needs to copy 10+3 = 13 characters.
>
> Step four: add 'helloworldfoo' + 'bar', giving 'helloworldfoobar'
> Python needs to copy 13+3 = 16 characters.
>
> Step five: add 'helloworldfoobar' + 'spam', giving 'helloworldfoobarspam'
> Python needs to copy 16+4 = 20 characters.
>
> Step six: add 'helloworldfoobarspam' + 'ham', giving
> 'helloworldfoobarspamham'
> Python needs to copy 20+3 = 23 characters.
>
> So in total, Python has to copy 5+10+13+16+20+23 = 87 characters, just
> to build up a 23 character string. And as the number of loops
> increases, the amount of extra work needed just keeps expanding. Even
> though a single string concatenation is fast, repeated concatenation
> is painfully SLOW.
>
> In comparison, ''.join(words) one copies each substring once: it
> counts out that it needs 23 characters, allocates space for 23
> characters, then copies each substring into the right place instead of
> making a whole lot of temporary strings and redundant copying.
>
> So, join() is much faster than repeated concatenation. But you may
> never have noticed. Why not?
>
> Well, for starters, for small enough pieces of data, everything is
> fast. The difference between copying 87 characters (the slow way) and
> 23 characters (the fast way) is trivial.
>
> But more importantly, some years ago (Python 2.4, about 8 years ago?)
> the Python developers found a really neat trick that they can do to
> optimize string concatenation so it doesn't need to repeatedly copy
> characters over and over and over again. I won't go into details, but
> the thing is, this trick works well enough that repeated concatenation
> is about as fast as the join method MOST of the time.
>
> Except when it fails. Because it is a trick, it doesn't always work.
> And when it does fail, your repeated string concatenation code will
> suddenly drop from running in 0.1 milliseconds to a full second or
> two; or worse, from 20 seconds to over an hour. (Potentially; the
> actual slow-down depends on the speed of your computer, your operating
> system, how much memory you have, etc.)
>
> Because this is a cunning trick, it doesn't always work, and when it
> doesn't work, and you have slow code and no hint as to why.
>
> What can cause it to fail?
>
> - Old versions of Python, before 2.4, will be slow.
>
> - Other implementations of Python, such as Jython and IronPython, will
> not have the trick, and so will be slow.
>
> - The trick is highly-dependent on internal details of the memory
> management of Python and the way it interacts with the operating
> system. So what's fast under Linux may be slow under Windows, or the
> other way around.
>
> - The trick is highly-dependent on specific circumstances to do with
> the substrings being added. Without going into details, if those
> circumstances are violated, you will have slow code.
>
> - The trick only works when you are adding strings to the end of the
> new string, not if you are building it up from the beginning.
>
>
> So even though your function works, you can't rely on it being fast.
>
>
>
>


More information about the Tutor mailing list