[Python-Dev] Usage of += on strings in loops in stdlib

Terry Reedy tjreedy at udel.edu
Wed Feb 13 02:58:53 CET 2013

On 2/12/2013 4:03 PM, Maciej Fijalkowski wrote:
> Hi
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of the "".join() that was there before.
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
> How do people feel about generally not having += on long strings in
> stdlib (since the refcount = 1 thing is a hack)?
> What about other performance improvements in stdlib that are
> problematic for pypy or others?
> Personally I would like cleaner code in stdlib vs speeding up CPython.
> Typically that also helps pypy so I'm not unbiased.

I agree. sum() refuses to sum strings specifically to encourage .join().

 >>> sum(('x', 'b'), '')
Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
     sum(('x', 'b'), '')
TypeError: sum() can't sum strings [use ''.join(seq) instead]

The doc entry for sum says the same thing.
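
For illustration, here is a minimal sketch of the two idioms under discussion. The function names and sample data are made up for this example and are not taken from the stdlib change in question. Both produce the same string; the difference is that the += loop relies on the CPython refcount trick to avoid copying, while "".join() builds the result in one pass on any implementation.

```python
def build_with_concat(parts):
    """Hypothetical helper: build a string with += in a loop.

    Without the CPython refcount == 1 optimization, each += may copy
    the whole accumulated string, so the loop can become quadratic.
    """
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    """Hypothetical helper: build the same string with a single join."""
    return "".join(parts)


# Made-up sample data, only to show that the two idioms agree.
parts = [str(i) for i in range(1000)]
assert build_with_concat(parts) == build_with_join(parts)
```

Actual timings depend on the interpreter and the input size, so none are given here.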

Terry Jan Reedy

