
The SF patch http://www.python.org/sf/980695 about making repeated string concatenations efficient has been reviewed and is acceptable on technical grounds. This is about avoiding the quadratic behavior of
s = '' for x in y: s += some_string(x)
FWIW, I didn't approve it, Raymond did, IMO prematurely. :-(
In the past similar schemes based on refcount were rejected because we couldn't be sure that there wasn't C code with an uncounted reference. Is that ruled out in this scheme?
Yes it is. Unlike the refcount check proposed for PyList_Sort, Armin's patch only applies to the eval loop and cannot be called directly. I spent a *long* time reviewing and testing this patch. The code is clean. The concept works. It delivers significant, measurable benefits. The standard library itself has existing use cases. Every benchmark I tried showed results. The patch was not approved prematurely! On the off chance that Armin, Michael, and I missed a case, then including the patch in the second alpha is the best way to find it.
I am going to make up a new rule here, and state that implementation features that affect not just the running speed but the O() rate for certain algorithms should be considered language features, because any implementation will be required to implement them in order to ensure code portability.
Even list.sort() does not guarantee specific O() rates. Currently, the fastest way to write a merge(a,b) is sort(a+b). That relies on the current implementation recognizing that a and b are already sorted and doing a linear time concatenation.
Why do people want this so badly?
Because the a=a+b form occurs everywhere, not just in newbie code. It is in the standard library, it shows up naturally in SAX, and it is often the most clear way to express an idea. Raymond