[Python-Dev] sum(...) limitation

Chris Barker chris.barker at noaa.gov
Fri Aug 8 17:23:51 CEST 2014


On Thu, Aug 7, 2014 at 4:01 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> I don't remember where, but I believe that cPython has an optimization
> built in for repeated string concatenation, which is probably why you
> aren't seeing big differences between the + and the sum().
>

Indeed -- clearly so.

> A little testing shows how to defeat that optimization:
>
>   blah = ''
>   for string in ['booyah'] * 100000:
>       blah = string + blah
>
> Note the reversed order of the addition.
>

thanks -- cool trick.
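
For anyone who wants to reproduce the comparison, here is a minimal sketch. The function names and the smaller list size are illustrative assumptions, not the setup behind the timings quoted in this thread. It also assumes CPython, where extending the left operand in place is an implementation detail and not a language guarantee.

```python
import timeit

# Smaller than the 100000 used above, to keep the slow case quick.
PARTS = ['booyah'] * 10000


def concat_append(parts):
    """Build left to right; CPython can often extend blah in place."""
    blah = ''
    for string in parts:
        blah = blah + string
    return blah


def concat_prepend(parts):
    """Reversed order of the addition: each step copies all of blah."""
    blah = ''
    for string in parts:
        blah = string + blah
    return blah


def time_both(parts=PARTS, repeat=3):
    """Return the best-of-repeat wall time in seconds for each variant."""
    return {
        func.__name__: min(
            timeit.repeat(lambda: func(parts), number=1, repeat=repeat)
        )
        for func in (concat_append, concat_prepend)
    }


if __name__ == "__main__":
    for name, seconds in time_both().items():
        print(f"{name}: {seconds:.6f} s")
```

The absolute numbers will vary by machine and Python version; the point is only the relative gap between the two orders.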

> Oh, and the join() timings:
> --> timeit.Timer("blah = ''.join(['booya'] * 100000)", "blah = ''").repeat(3, 1)
> [0.0014629364013671875, 0.0014190673828125, 0.0011930465698242188]
> So, + is three orders of magnitude slower than join.


only one if you use the optimized form of +, and not even that if you
need to build up the list first, which is the common use case.
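
A rough way to check the "build up the list first" case is sketched below. The helper names and the use of str(i) as the piece being produced are hypothetical choices for illustration; no timings are claimed here.

```python
import timeit


def build_with_plus(n):
    """Accumulate pieces with += as they are produced."""
    out = ''
    for i in range(n):
        out += str(i)
    return out


def build_with_join(n):
    """Collect pieces in a list first, then join once at the end."""
    pieces = []
    for i in range(n):
        pieces.append(str(i))
    return ''.join(pieces)


if __name__ == "__main__":
    n = 100000
    for func in (build_with_plus, build_with_join):
        best = min(timeit.repeat(lambda: func(n), number=1, repeat=3))
        print(f"{func.__name__}: {best:.6f} s")
```

Unlike the join() timing quoted above, this version charges join() for the cost of the append() calls as well.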

So my final question is this:

repeated string concatenation is not the "recommended" way to build up a
string -- but nevertheless, CPython has an optimization that makes it fast
and efficient, to the point that there is no practical performance reason
to prefer appending to a list and calling join() afterward.

So why not apply a similar optimization to sum() for strings?
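
For context, the built-in sum() currently refuses a str start value outright, which is the limitation in the subject line. A small sketch follows; `words` is a hypothetical example list, and the reduce() spelling is shown only for comparison, not as a recommendation.

```python
import functools
import operator

words = ['boo', 'yah']

try:
    sum(words, '')
except TypeError as exc:
    # sum() rejects strings and the message suggests ''.join(seq) instead.
    message = str(exc)
else:
    message = None

# Two spellings that do work; join() is the recommended one.
joined = ''.join(words)
reduced = functools.reduce(operator.add, words, '')

print(message)
print(joined, reduced)
```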

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov