[Python-Dev] Optimized string concatenation
arigo at tunes.org
Tue Aug 3 10:54:01 CEST 2004
The SF patch http://www.python.org/sf/980695 about making repeated string
concatenations efficient has been reviewed and is acceptable on technical
grounds. This is about avoiding the quadratic behavior of
    s = ''
    for x in y:
        s += some_string(x)
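To make "quadratic" concrete, here is a rough cost model, not a measurement. It assumes that without the patch every `s += piece` allocates a new string and copies the whole old `s` plus the new piece, and it ignores allocator details. The function names are made up for illustration.

```python
def chars_copied_by_loop(piece_lengths):
    # Assumed model: each `s += piece` copies the old s and the new piece
    # into a freshly allocated string.
    total = 0
    current = 0
    for n in piece_lengths:
        current += n      # length of s after this step
        total += current  # characters copied to build the new s
    return total


def chars_copied_single_pass(piece_lengths):
    # Assumed model: every character is copied exactly once.
    return sum(piece_lengths)


if __name__ == '__main__':
    lengths = [1] * 1000
    print(chars_copied_by_loop(lengths))
    print(chars_copied_single_pass(lengths))
```

Under this model, 1000 one-character pieces cost 500500 copied characters in the loop against 1000 in a single pass, so the ratio grows linearly with the number of pieces.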
Accepting the patch still leaves open some policy questions:
* first, is that an implementation detail or a published feature?
The question is important because the difference in performance is enormous
-- we are not talking about 2x or even 10x faster but roughly Nx faster
where N is the size of the input data set.
* if it is a published feature, what about Jython?
The patch would encourage a coding style that produces programs that
essentially don't scale on Jython -- nor, for that matter, on 2.3 or older --
and worse, the programs would *appear* to work on Jython or 2.3 when tested
with small or medium-sized data sets, but just appear to hang when run on
larger data sets. Obviously, this is a problem that has always been here, but
if we fix it in 2.4 we can be sure that people will develop and test with 2.4,
and less thoroughly on 2.3, and when they deploy on 2.3 platforms their code
will unexpectedly not scale.
* discussed on SF too is whether we should remove the 'a=a+b' acceleration
from the patch, keeping only 'a+=b'; see the SF tracker.
This seems overkill, but should the acceleration be there but disabled by
default, and enabled with 'from __future__ import string_concatenate'?
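For reference on the Jython and 2.3 point above, the idiom that stays linear on every implementation is to collect the pieces in a list and join them once. This is a minimal sketch: `some_string` here is only a stand-in for the function in the loop at the top, and `build_with_join` and `build_with_plus_equals` are made-up names.

```python
def some_string(x):
    # Hypothetical stand-in for the function used in the loop above.
    return str(x)


def build_with_join(y):
    # Collect the pieces first; the final join copies each piece once.
    pieces = []
    for x in y:
        pieces.append(some_string(x))
    return ''.join(pieces)


def build_with_plus_equals(y):
    # The loop under discussion; how it scales depends on the implementation.
    s = ''
    for x in y:
        s += some_string(x)
    return s
```

Both functions return the same string; only the first has a cost that does not depend on whether the interpreter optimizes `+=`.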