On Wed, Feb 13, 2013 at 10:06 PM, Christian Tismer firstname.lastname@example.org wrote:
To avoid such hidden traps in larger code bases, documentation is needed that clearly gives a warning saying "don't do that", like CS students learn for most other languages.
How much more explicit do you want us to be?
"""6. CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations."""
So please don't blame us for people not reading a warning that is already there.
Since my rewrite of the sequence docs, Python 3 doesn't even acknowledge the hack's existence and is quite explicit about what you need to do to get reliably linear behaviour:
"""6. Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost, you must switch to one of the alternatives below:
if concatenating str objects, you can build a list and use str.join() at the end or else write to a io.StringIO instance and retrieve its value when complete if concatenating bytes objects, you can similarly use bytes.join() or io.BytesIO, or you can do in-place concatenation with a bytearray object. bytearray objects are mutable and have an efficient overallocation mechanism if concatenating tuple objects, extend a list instead for other types, investigate the relevant class documentation"""
Deliberately *relying* on the += hack to avoid quadratic runtime is just plain wrong, and our documentation already says so.
If anyone really thinks it will help, I can add a CPython implementation note back in to the Python 3 docs as well, pointing out that CPython performance measurements may hide broken algorithmic complexity related to string concatenation, but the corresponding note in Python 2 doesn't seem to have done much good :P