RE: [Python-Dev] Optimized string concatenation

Aug. 3, 2004

      ...
...
The SF patch http://www.python.org/sf/980695 about making repeated
string
concatenations efficient has been reviewed and is acceptable on
technical
grounds.  This is about avoiding the quadratic behavior of
s = ''
for x in y:
  s += some_string(x)
FWIW, I didn't approve it, Raymond did, IMO prematurely. :-(
In the past similar schemes based on refcount were rejected because we
couldn't be sure that there wasn't C code with an uncounted
reference.  Is that ruled out in this scheme?
Yes it is.  Unlike the refcount check proposed for PyList_Sort, Armin's
patch only applies to the eval loop and cannot be called directly.

I spent a *long* time reviewing and testing this patch.  The code is
clean.  The concept works.  It delivers significant, measurable
benefits.  The standard library itself has existing use cases.  Every
benchmark I tried showed results.  The patch was not approved
prematurely!

On the off chance that Armin, Michael, and I missed a case, then
including the patch in the second alpha is the best way to find it.
...
I am
going to make up a new rule here, and state that implementation
features that affect not just the running speed but the O() rate for
certain algorithms should be considered language features, because
any implementation will be required to implement them in order to
ensure code portability.
Even list.sort() does not guarantee specific O() rates.  Currently, the
fastest way to write a merge(a,b) is sort(a+b).  That relies on the
current implementation recognizing that a and b are already sorted and
doing a linear time concatenation.
...
Why do people want this so badly?
Because the a=a+b form occurs everywhere, not just in newbie code.  It
is in the standard library, it shows up naturally in SAX, and it is
often the most clear way to express an idea.

Raymond

RE: [Python-Dev] Optimized string concatenation

Raymond Hettinger