[Python-Dev] efficient string concatenation (yep, from 2004)

Christian Tismer tismer at stackless.com
Wed Feb 13 18:07:22 CET 2013


Hey Nick,

On 13.02.13 15:44, Nick Coghlan wrote:
> On Wed, Feb 13, 2013 at 10:06 PM, Christian Tismer <tismer at stackless.com> wrote:
>> To avoid such hidden traps in larger code bases, documentation is
>> needed that clearly gives a warning saying "don't do that", like CS
>> students learn for most other languages.
> How much more explicit do you want us to be?
>
> """6. CPython implementation detail: If s and t are both strings, some
> Python implementations such as CPython can usually perform an in-place
> optimization for assignments of the form s = s + t or s += t. When
> applicable, this optimization makes quadratic run-time much less
> likely. This optimization is both version and implementation
> dependent. For performance sensitive code, it is preferable to use the
> str.join() method which assures consistent linear concatenation
> performance across versions and implementations."""
>
> from http://docs.python.org/2/library/stdtypes.html#typesseq
>
> So please don't blame us for people not reading a warning that is already there.

I don't, really. This was a side effect of cross-posting.
I was only using the PyPy documentation, where a lot of things
are mentioned, but this behavioral difference was missing.
Python-dev was not being addressed at all.
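
For anyone following along, here is a minimal sketch of the two idioms the
quoted note contrasts. The function names are made up for illustration;
only += and str.join() come from the docs.

```python
def concat_with_plus(pieces):
    # Repeated +=: per the quoted note, this avoids quadratic run-time
    # only where the implementation can apply its in-place optimization.
    result = ""
    for piece in pieces:
        result += piece
    return result


def concat_with_join(pieces):
    # str.join(): the documented way to get consistent linear
    # concatenation performance across versions and implementations.
    return "".join(pieces)


pieces = ["spam", "eggs", "ham"]
assert concat_with_plus(pieces) == concat_with_join(pieces) == "spameggsham"
```

Both produce the same string; the difference is only in how the cost
grows with the number of pieces.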

> ...
> Deliberately *relying* on the += hack to avoid quadratic runtime is
> just plain wrong, and our documentation already says so.
>
> If anyone really thinks it will help, I can add a CPython
> implementation note back in to the Python 3 docs as well, pointing out
> that CPython performance measurements may hide broken algorithmic
> complexity related to string concatenation, but the corresponding note
> in Python 2 doesn't seem to have done much good :P
>

Well, while we are at it:
Yes, it says so as a note at the end of
http://docs.python.org/2/library/stdtypes.html#typesseq

I doubt that many people read that far, and they do not search the
documentation on sequence types when they are just adding some strings
together. People tend to simply try something out and see if it works.
That even seems to get worse the better and bigger the Python
documentation grows. ;-)

Maybe it would be a good idea to remove that concat optimization completely?
Then people would wake up and read the docs to find out what's wrong ;-)
No, that would not help, because their test cases will not reflect reality.
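
On the point about test cases, a rough cost model (not a benchmark) may
show why small inputs hide the problem. It assumes a hypothetical
implementation without the optimization, where every s += t copies all of
s and t into a new string, and it assumes join copies each piece exactly
once. The helper names are invented for this sketch, which is written for
Python 3.

```python
def chars_copied_naive(piece_len, count):
    # Assumed model: the i-th += builds a fresh string holding all i
    # pieces accumulated so far, so it copies piece_len * i characters.
    return sum(piece_len * i for i in range(1, count + 1))


def chars_copied_join(piece_len, count):
    # Assumed model for str.join(): every piece is copied exactly once.
    return piece_len * count


for count in (10, 1000, 100000):
    naive = chars_copied_naive(10, count)
    joined = chars_copied_join(10, count)
    # The ratio grows linearly with the number of pieces: (count + 1) / 2.
    print(count, naive / joined)
```

Under this model a test with ten pieces sees a factor of 5.5, which
nobody notices, while 100000 pieces give a factor of about 50000.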

-----
Thinking a bit more about it.

If you are thinking about improving the docs, I don't believe it helps to
make the already very complete reference documentation even more complete.
Completeness is great, don't get me wrong! But what people read
is what jumps right out at them, and I think something like that could
be added.

I think that before getting people to work through long and
complete documentation, it is probably easier to spark their interest
with something like
"Hey, are you doing things this way?"
followed by a short, concise list of bad and good things, maybe
even a dynamic one as in WingWare's "Wing Tips", or any better approach.

That easily reachable tabular collection of short hints and caveats,
only a few pages long, could link to the existing, real documentation
that explains things in more detail.
Maybe that could be a way to get people to actually read.

Just an idea.

cheers - Chris


p.s.:
Other nice attempts that don't really seem to work:

Hints like
http://docs.python.org/2/howto/doanddont.html
are not bad, but that page is hidden in the HOWTO section and addresses
only a few things. Also, the sub-title "in-depth documents on specific
topics" is not what people look for first while hacking on some code.

A quick reference like
http://rgruet.free.fr/PQR27/PQR2.7.html
is very concise, but it also does _not_ mention what to avoid.
Others exist, like
http://infohost.nmt.edu/tcc/help/pubs/python/web/

By the way, the first thing I find via Google is:
http://www.python.org/doc/QuickRef.html
which is quite funny (v1.3).

-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/
