
On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:
Good point -- I was trying to make the point about .join() vs + for strings in an intro python class last year, and made the mistake of having the students test the performance.
You need to concatenate a LOT of strings to see any difference at all -- I know that O() of algorithms is unavoidable, but between efficient python optimizations and a an apparently good memory allocator, it's really a practical non-issue.
If only that were the case, but it isn't. Here's a cautionary tale for how using string concatenation can blow up in your face: Chris Withers asks for help debugging HTTP slowness: https://mail.python.org/pipermail/python-dev/2009-August/091125.html and publishes some times: https://mail.python.org/pipermail/python-dev/2009-September/091581.html (notice that Python was SIX HUNDRED times slower than wget or IE) and Simon Cross identified the problem: https://mail.python.org/pipermail/python-dev/2009-September/091582.html leading Guido to describe the offending code as an embarrassment. It shouldn't be hard to demonstrate the difference between repeated string concatenation and join, all you need do is defeat sum()'s prohibition against strings. Run this bit of code, and you'll see a significant difference in performance, even with CPython's optimized concatenation: # --- cut --- class Faker: def __add__(self, other): return other x = Faker() strings = list("Hello World!") assert ''.join(strings) == sum(strings, x) from timeit import Timer setup = "from __main__ import x, strings" t1 = Timer("''.join(strings)", setup) t2 = Timer("sum(strings, x)", setup) print (min(t1.repeat())) print (min(t2.repeat())) # --- cut --- On my computer, using Python 2.7, I find the version using sum is nearly 4.5 times slower, and with 3.3 about 4.2 times slower. That's with a mere twelve substrings, hardly "a lot". I tried running it on IronPython with a slightly larger list of substrings, but I got sick of waiting for it to finish. If you want to argue that microbenchmarks aren't important, well, I might agree with you in general, but in the specific case of string concatenation there's that pesky factor of 600 slowdown in real world code to argue with.
Blocking sum( some_strings) because it _might_ have poor performance seems awfully pedantic.
The rationale for explicitly prohibiting strings while merely implicitly discouraging other non-numeric types is that beginners, who are least likely to understand why their code occasionally and unpredictably becomes catastrophically slow, are far more likely to sum strings than sum tuples or lists. (I don't entirely agree with this rationale, I'd prefer a warning rather than an exception.) -- Steven