[Python-Dev] efficient string concatenation (yep, from 2004)

Christian Tismer tismer at stackless.com
Wed Feb 13 13:39:58 CET 2013


On 13.02.13 13:10, Steven D'Aprano wrote:
> On 13/02/13 10:53, Christian Tismer wrote:
>> Hi friends,
>>
>> _efficient string concatenation_ has been a topic in 2004.
>> Armin Rigo proposed a patch with the name of the subject,
>> more precisely:
>>
>> [Patches] [ python-Patches-980695 ] efficient string concatenation
>> on sourceforge.net, on 2004-06-28.
>>
>> This patch was finally added to Python 2.4 on 2004-11-30.
>>
>> Some people might remember the larger discussion about whether such a
>> patch should be accepted at all, because it changes the programming style
>> for many of us from "don't do that, stupid" to "well, you may do it in
>> CPython", which has quite some impact on other implementations (is it
>> fast on Jython, now?).
>
> I disagree. If you look at the archives on the python-list@ and
> tutor at python.org mailing lists, you will see that whenever string
> concatenation comes up, the common advice given is to use join.
>
> The documentation for strings is also clear that you should not rely on
> this optimization:
>
> http://docs.python.org/2/library/stdtypes.html#typesseq
>
> And quadratic performance for repeated concatenation is not unique to
> Python: it applies to pretty much any language with immutable strings,
> including Java, C++, Lua and JavaScript.
>
>
>> It changed for instance my programming and teaching style a lot, of course!
>
> Why do you say, "Of course"? It should not have changed anything.

You are right, I was actually over the top with my rant, and I never recommend
string concatenation when working with real amounts of data.
The surprise was just so big.

I tend to use whatever fits best for small initialization of some modules,
where the fact that concat is cheap lets me stop thinking about big O.
Although it probably does not matter much, it makes me feel uncomfortable
to do something with potentially bad asymptotics.
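
To put a number on "potentially bad asymptotics", here is a minimal sketch
of the cost model. It assumes the worst case, where every + allocates a fresh
string and copies everything built so far, i.e. the in-place optimization
does not kick in. The function names are made up for illustration and are
not part of any library.

```python
def chars_copied_by_repeated_concat(piece_lengths):
    """Characters written if each `s = s + piece` builds a brand-new string.

    Assumption: no in-place resize, so every step copies the old contents
    plus the new piece into a freshly allocated string.
    """
    total_copied = 0
    current_length = 0
    for length in piece_lengths:
        current_length += length
        total_copied += current_length  # the whole new string is written out
    return total_copied


def chars_copied_by_join(piece_lengths):
    """Characters written by a join-style build: each piece is copied once."""
    return sum(piece_lengths)


if __name__ == "__main__":
    lengths = [1] * 1000  # 1000 one-character pieces
    print(chars_copied_by_repeated_concat(lengths))  # 1 + 2 + ... + 1000
    print(chars_copied_by_join(lengths))
```

For n pieces of equal size the first count grows like n*(n+1)/2 and the
second like n, which is the quadratic versus linear difference discussed
in this thread.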

>
> Best practice remains the same:
>
> - we should still use join for repeated concatenations;
>
> - we should still avoid + except for small cases which are not
>   performance critical;
>
> - we should still teach beginners to use join;
>
> - while this optimization is nice to have, we cannot rely on it being
>   there when it matters.

I agree that the CPython documentation does say this clearly.
Actually I was complaining about the PyPy documentation, which does not
mention this, and that matters because PyPy is already so very compatible.
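
Since the documentation may not tell you, one way to find out how a given
interpreter behaves is simply to measure it. The following is a rough sketch,
not a careful benchmark: the helper names and the default sizes are
assumptions chosen for illustration, and the numbers will differ between
CPython, PyPy, Jython and between versions.

```python
import timeit


def build_with_plus(pieces):
    """Repeated concatenation; fast only where the interpreter optimizes it."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def build_with_join(pieces):
    """The portable idiom: hand all pieces to join at once."""
    return "".join(pieces)


def time_builders(n_pieces=10000, repeats=5):
    """Return the best wall-clock time in seconds for each builder."""
    pieces = ["x"] * n_pieces
    timings = {}
    for builder in (build_with_plus, build_with_join):
        runs = timeit.repeat(lambda: builder(pieces), number=1, repeat=repeats)
        timings[builder.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    for name, seconds in sorted(time_builders().items()):
        print("%-16s %.6f s" % (name, seconds))
```

Running the same script under each interpreter of interest, with a few
different values of n_pieces, shows whether the += loop scales linearly
there or not.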

In 2004, when this stuff came up, PyPy was already quite active,
but the Psyco mindset was still around, too.
Maybe my slightly shocked reaction originates from there, and my
implicit assumption was never corrected ;-)

cheers - chris

-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/


