[Python-Dev] efficient string concatenation (yep, from 2004)

Christian Tismer tismer at stackless.com
Wed Feb 13 13:06:27 CET 2013


On 13.02.13 08:42, Lennart Regebro wrote:
>> Something is needed - a patch for PyPy or for the documentation I guess.
> Not arguing that it wouldn't be good, but I disagree that it is needed.
>
> This is only an issue when you, as in your proof, have a loop that
> does concatenation. This is usually when looping over a list of
> strings that should be concatenated together. Doing so in a loop with
> concatenation may be the natural way for people new to Python, but the
> "natural" way to do it in Python is with a ''.join() call.
>
> This:
>
>      s = ''.join(('X' for x in xrange(x)))
>
> is more than twice as fast in Python 2.7 as your example. It is in
> fact also slower in PyPy 1.9 than in Python 2.7, but only by a factor
> of two:
>
> Python 2.7:
> time for 10000000 concats = 0.887
> PyPy 1.9:
> time for 10000000 concats = 1.600
>
> (And of course s = 'X' * x takes only about a hundredth of the time,
> but that's cheating. ;-)
>
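
A minimal sketch of the comparison quoted above, for readers who want to reproduce it. It assumes Python 3 (so range replaces xrange), and the function names are hypothetical. Absolute timings will vary by machine and interpreter; on CPython the loop version may look fast only because of the implementation-specific in-place concatenation optimization this thread is about.

```python
import timeit

N = 1_000_000  # number of pieces; an arbitrary size chosen for illustration


def concat_loop(n):
    # Repeated += on an immutable str. CPython can often resize the string
    # in place here; other implementations may copy the whole string each time.
    s = ''
    for _ in range(n):
        s += 'X'
    return s


def concat_join(n):
    # Portable idiom: hand all pieces to str.join, which builds the result once.
    return ''.join('X' for _ in range(n))


def concat_repeat(n):
    # The "cheating" version: sequence repetition.
    return 'X' * n


if __name__ == '__main__':
    for fn in (concat_loop, concat_join, concat_repeat):
        t = timeit.timeit(lambda: fn(N), number=1)
        print(f'{fn.__name__:14s} {t:.3f} s')
```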

This is not about how to write efficient concatenation, and it is not
advice for me. It is also not about a constant factor, which I don't
really care about except in situations where speed matters.

This is about a possible algorithmic trap, where code written for
CPython runs in roughly O(n) time, and switching to PyPy brings the
surprise that the same code now takes O(n**2) time. Such runtime
explosions can damage trust in PyPy, especially when the code sits in
some module that you did not even write but simply "pip install"-ed.
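
To make the O(n) versus O(n**2) difference concrete without timing noise, the sketch below counts characters copied under a simple cost model. The model is an assumption, not a measurement of either interpreter: when s = s + piece cannot resize in place, it must write all len(s) + len(piece) characters into a new string, while a join that sizes its result up front copies each piece once. The function names are hypothetical.

```python
def copies_naive_concat(pieces):
    """Characters copied if every `s = s + p` allocates a fresh string."""
    copied = 0
    length = 0
    for p in pieces:
        length += len(p)
        copied += length  # the entire new string is written on each step
    return copied


def copies_join(pieces):
    """Characters copied by a join that computes the total length first."""
    return sum(len(p) for p in pieces)


if __name__ == '__main__':
    for n in (1_000, 10_000, 100_000):
        pieces = ['X'] * n
        print(n, copies_naive_concat(pieces), copies_join(pieces))
```

With n single-character pieces the naive count is n*(n+1)/2, e.g. 500500 for n = 1000 against 1000 for the join, so a tenfold larger input costs roughly a hundredfold more copying.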

So this is important to know, especially for newcomers and for the
people who give them advice.
For the sake of algorithmic compatibility, there should no longer be
a feature with such a drastic side effect if it cannot be supported by
all other Python implementations.

To avoid such hidden traps in larger code bases, we need documentation
that clearly warns "don't do that", just as CS students are taught for
most other languages.
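
A sketch of the "do this instead" pattern such documentation could recommend: collect the pieces in a list and join once, or write them into an io.StringIO buffer. The list-and-join version stays linear as long as list.append is amortized O(1), which holds for both CPython and PyPy. The function names and the output format are hypothetical, chosen only for illustration.

```python
import io


def render_lines_join(items):
    # Accumulate pieces in a list, then join once: linear in the output size.
    parts = []
    for i, item in enumerate(items):
        parts.append(f'{i}: {item}\n')
    return ''.join(parts)


def render_lines_buffer(items):
    # Alternative: write pieces to an in-memory text buffer.
    buf = io.StringIO()
    for i, item in enumerate(items):
        buf.write(f'{i}: {item}\n')
    return buf.getvalue()


if __name__ == '__main__':
    print(render_lines_join(['spam', 'eggs']), end='')
```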

cheers - chris

-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/


