Re: [Python-Dev] Optimized string concatenation

Aug. 3, 2004

      ...
The SF patch http://www.python.org/sf/980695 about making repeated string
concatenations efficient has been reviewed and is acceptable on technical
grounds.  This is about avoiding the quadratic behavior of
s = ''
for x in y:
  s += some_string(x)
FWIW, I didn't approve it, Raymond did, IMO prematurely. :-(

In the past similar schemes based on refcount were rejected because we
couldn't be sure that there wasn't C code with an uncounted
reference.  Is that ruled out in this scheme?

Also, I believe that even Java doesn't optimize this case in general
-- only the much simpler case where you concatenate a number of
strings in a single expression.
...
This leaves open the policy questions:
* first, is that an implementation detail or a published feature?
  The question is important because the difference in performance is
  enormous -- we are not talking about 2x or even 10x faster but
  roughly Nx faster where N is the size of the input data set.
Like tail recursion, it will change the way people write code.  I am
going to make up a new rule here, and state that implementation
features that affect not just the running speed but the O() rate for
certain algorithms should be considered language features, because
any implementation will be required to implement them in order to
ensure code portability.
...
* if it is a published feature, what about Jython?
* The patch would encourage a coding style that gives program that
  essentially don't scale with Jython -- nor, for that matter, with
  2.3 or older -- and worse, the programs would *appear* to work on
  Jython or 2.3 when tested with small or medium-sized data sets,
  but just appear to hang when run on larger data sets.  Obviously,
  this is problem that has always been here, but if we fix it in 2.4
  we can be sure that people will develop and test with 2.4, and
  less thoroughly on 2.3, and when they deploy on 2.3 platforms it
  will unexpectedly not scale.
* discussed on SF too is whether we should remove the 'a=a+b'
  acceleration from the patch, keeping only 'a+=b'; see the SF
  tracker.
This seems overkill, but should the acceleration be there but
disabled by default?
from __future__ import string_concatenate?
Why do people want this so badly?  Please leave well enough alone.

--Guido van Rossum (home page: http://www.python.org/~guido/)