On Sat, Aug 02, 2014 at 05:39:12PM +1000, Steven D'Aprano wrote:
Repeated list and str concatenation both have quadratic O(N**2) performance, but people frequently build up strings with + and rarely do the same for lists. String concatenation with + is an attractive nuisance for many people, including some who actually know better but nevertheless do it. Also, for reasons I don't understand, many people dislike or cannot remember to use ''.join.
join() isn't preferable in cases where it damages readability while simultaneously providing zero or negative performance benefit, such as when concatenating a few short strings, e.g. while adding a prefix to a filename.
Although it's true that join() is automatically the safer option, and especially when dealing with user supplied data, the net harm caused by teaching rote and ceremony seems far less desirable compared to fixing a trivial slowdown in a script, if that slowdown ever became apparent.
Another (twisted) interpretation is that since the quadratic behaviour is a CPython implementation detail, and there are alternatives where __add__ is constant time, encouraging users to code against implementation details becomes undesirable. In our twisty world, __add__ becomes *preferable* since the resulting programs more closely resemble pseudo-code.
$ cat t.py a = 'this ' b = 'is a string' c = 'as we can tell'
def x(): return a + b + c
def y(): return ''.join([a, b, c])
$ python -m timeit -s 'import t' 't.x()' 1000000 loops, best of 3: 0.477 usec per loop
$ python -m timeit -s 'import t' 't.y()' 1000000 loops, best of 3: 0.695 usec per loop