[Python-Dev] Optimized string concatenation

Raymond Hettinger python at rcn.com
Tue Aug 3 04:48:31 CEST 2004


> > The SF patch http://www.python.org/sf/980695 about making repeated
> > string concatenations efficient has been reviewed and is acceptable
> > on technical grounds.  This is about avoiding the quadratic behavior
> > of
> >
> > s = ''
> > for x in y:
> >   s += some_string(x)
> 
> FWIW, I didn't approve it, Raymond did, IMO prematurely. :-(
> 
> In the past similar schemes based on refcount were rejected because we
> couldn't be sure that there wasn't C code with an uncounted
> reference.  Is that ruled out in this scheme?

Yes, it is.  Unlike the refcount check proposed for PyList_Sort, Armin's
patch applies only within the eval loop and cannot be called directly.
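To illustrate what I mean (a minimal sketch of my own, not code from the
patch; the exact refcount rules are an implementation detail): the check
happens at the concatenation opcode itself, and any extra reference to
the string simply routes the operation back through the ordinary,
copying concatenation, so the worst case is losing the speedup, never
getting a wrong answer.

    def build(parts):
        s = ''
        for p in parts:
            s += p          # 's' is only referenced by the local slot and
                            # the stack, so the eval loop may resize it in
                            # place
        return s

    def build_aliased(parts):
        s = ''
        for p in parts:
            keep = s        # an extra reference to 's' ...
            s += p          # ... so the fast path is skipped and a normal
                            # (copying) concatenation is performed instead
        return s

Both functions return the same string; only the speed differs.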

I spent a *long* time reviewing and testing this patch.  The code is
clean.  The concept works.  It delivers significant, measurable
benefits.  The standard library itself has existing use cases.  Every
benchmark I tried showed a speedup.  The patch was not approved
prematurely!

On the off chance that Armin, Michael, and I missed a case, including
the patch in the second alpha is the best way to find it.
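The kind of micro-benchmark I have in mind looks roughly like this (a
sketch of my own, using the modern timeit interface; the function names
and sizes are arbitrary, and the numbers will depend on the interpreter
build):

    import timeit

    def concat(n):
        s = ''
        for i in range(n):
            s += 'x' * 10          # the pattern the patch targets
        return s

    def join(n):
        # the traditional workaround, for comparison
        return ''.join('x' * 10 for i in range(n))

    for n in (1000, 10000, 100000):
        t_concat = timeit.timeit(lambda: concat(n), number=5)
        t_join = timeit.timeit(lambda: join(n), number=5)
        print('n=%-7d  +=: %.3fs   join: %.3fs' % (n, t_concat, t_join))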



>  I am
> going to make up a new rule here, and state that implementation
> features that affect not just the running speed but the O() rate for
> certain algorithms should be considered language features, because
> any implementation will be required to implement them in order to
> ensure code portability.

Even list.sort() does not guarantee specific O() rates.  Currently, the
fastest way to write merge(a, b) is sorted(a + b).  That relies on the
current implementation recognizing that a and b are already sorted and
merging the two runs in linear time.
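Concretely, the idiom looks like this (my own illustration; the
linear-time behavior comes from timsort spotting the two pre-sorted
runs, which is an implementation detail rather than a guarantee):

    import random

    a = sorted(random.random() for i in range(500))   # already sorted
    b = sorted(random.random() for i in range(500))   # already sorted

    merged = a + b    # linear-time list concatenation
    merged.sort()     # timsort detects the two ascending runs and merges
                      # them with roughly len(a) + len(b) comparisons

    assert merged == sorted(a + b)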



> Why do people want this so badly? 

Because the a=a+b form occurs everywhere, not just in newbie code.  It
is in the standard library, it shows up naturally in SAX, and it is
often the clearest way to express an idea.
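For instance, a typical SAX handler accumulates character data by
concatenation, because characters() may be called several times for a
single text node.  A hypothetical handler of my own (the element name
is made up):

    import xml.sax

    class TitleGrabber(xml.sax.ContentHandler):
        def __init__(self):
            super().__init__()
            self.in_title = False
            self.title = ''

        def startElement(self, name, attrs):
            if name == 'title':
                self.in_title = True
                self.title = ''

        def characters(self, content):
            if self.in_title:
                self.title += content   # the a=a+b pattern, arising naturally

        def endElement(self, name):
            if name == 'title':
                self.in_title = False
                print('found title:', self.title)

    # e.g.:
    # xml.sax.parseString(b'<doc><title>abc</title></doc>', TitleGrabber())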



Raymond


