[Python-Dev] Optimized string concatenation

Guido van Rossum guido at python.org
Tue Aug 3 16:13:30 CEST 2004


> The SF patch http://www.python.org/sf/980695 about making repeated string
> concatenations efficient has been reviewed and is acceptable on technical
> grounds.  This is about avoiding the quadratic behavior of
> 
> s = ''
> for x in y:
>   s += some_string(x)

FWIW, I didn't approve it, Raymond did, IMO prematurely. :-(

In the past similar schemes based on refcount were rejected because we
couldn't be sure that there wasn't C code with an uncounted
reference.  Is that ruled out in this scheme?

Also, I believe that even Java doesn't optimize this case in general
-- only the much simpler case where you concatenate a number of
strings in a single expression.

> This leaves open the policy questions:
> 
> * first, is that an implementation detail or a published feature?
>   The question is important because the difference in performance is
>   enormous -- we are not talking about 2x or even 10x faster but
>   roughly Nx faster where N is the size of the input data set.

Like tail recursion, it will change the way people write code.  I am
going to make up a new rule here, and state that implementation
features that affect not just the running speed but the O() rate for
certain algorithms should be considered language features, because
any implementation will be required to implement them in order to
ensure code portability.

> * if it is a published feature, what about Jython?
> 
> * The patch would encourage a coding style that gives program that
>   essentially don't scale with Jython -- nor, for that matter, with
>   2.3 or older -- and worse, the programs would *appear* to work on
>   Jython or 2.3 when tested with small or medium-sized data sets,
>   but just appear to hang when run on larger data sets.  Obviously,
>   this is problem that has always been here, but if we fix it in 2.4
>   we can be sure that people will develop and test with 2.4, and
>   less thoroughly on 2.3, and when they deploy on 2.3 platforms it
>   will unexpectedly not scale.
> 
> * discussed on SF too is whether we should remove the 'a=a+b'
>   acceleration from the patch, keeping only 'a+=b'; see the SF
>   tracker.
> 
> This seems overkill, but should the acceleration be there but
> disabled by default?
> 
> from __future__ import string_concatenate?

Why do people want this so badly?  Please leave well enough alone.

--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-Dev mailing list