I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate.  So I've redubbed it the "lazy strings" patch.

The major new feature is that string *slices* are also represented with a lazy-evaluation placeholder for the actual string, just as concatenated strings were in my original patch.  The lazy slice object stores a reference to the original PyStringObject * it is sliced from, and the desired start and stop slice markers.  (It only supports step = 1.)  Its ob_sval is NULL until the string is rendered--but that rarely happens!  Not only does this mean string slices are faster, but I bet this generally reduces overall memory usage for slices too.

Now, one rule of the Python programming API is that "all strings are zero-terminated".  That part of makes the life of a Python extension author sane--they don't have to deal with some exotic Python string class, they can just assume C-style strings everywhere.  Ordinarily, this means a string slice couldn't simply point into the original string; if it did, and you executed
  x = "abcde"
  y = x[1:4]
internally y->ob_sval[3] would not be 0, it would be 'e', breaking the API's rule about strings.

However!  When a PyStringObject lives out its life purely within the Python VM, the only code that strenuously examines its internals is stringobject.c.  And that code almost never needs the trailing zero*.  So I've added a new static method in stringobject.c:
    char * PyString_AsUnterminatedString(PyStringObject *)
If you call it on a lazy-evaluation slice object, it gives you back a pointer into the original string's ob_sval.  The s->ob_size'th element of this *might not* be zero, but if you call this function you're saying that's a-okay, you promise not to look at it.  (If the PyStringObject * is any other variety, it calls into PyString_AsString, which renders string concatenation objects then returns ob_sval.)

Again: this behavior is *never* observed by anyone outside of stringobject.c.  External users of PyStringObjects call PyString_AS_STRING(), which renders all lazy concatenation and lazy slices so they look just like normal zero-terminated PyStringObjects.  With my patch applied, trunk still passes all expected tests.

Of course, lazy slice objects aren't just for literal slices created with [x:y].  There are lots of string methods that return what are effectively string slices, like lstrip() and split().

With this code in place, string slices that aren't examined by modules are very rarely rendered.  I ran "pybench -n 2" (two rounds, warp 10 (whatever that means)) while collecting some statistics.  When it finished, the interpreter had created a total of 640,041 lazy slices, of which only *19* were ever rendered.

Apart from lazy slices, there's only one more enhancement when compared with v1: string prepending now reuses lazy concatenation objects much more often. There was an optimization in string_concatenate (Python/ceval.c) that said: "if the left-side string has two references, and we're about to overwrite the second reference by storing this concatenation to an object, tell that object to drop its reference".  That often meant the reference on the string dropped to 1, which meant PyString_Resize could just resize the left-side string in place and append the right-side.  I modified it so it drops the reference to the right-hand operand too.  With this change, even with a reduction in the allowable stack depth for right-hand recursion (so it's less likely to blow the stack), I was able to prepend over 86k strings before it forced a render.  (Oh, for the record: I ensure depth limits are enforced when combining lazy slices and lazy concatenations, so you still won't blow your stack when you mix them together.)

Here are the highlights of a single apples-to-apples pybench run, 2.6 trunk revision 52413 ("this") versus that same revision with my patch applied ("other"):

Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
                 ConcatStrings:   204ms    76ms +168.4%   213ms    77ms +177.7%
       CreateStringsWithConcat:   159ms   138ms  +15.7%   163ms   142ms  +15.1%
                 StringSlicing:   142ms    86ms  +65.5%   145ms    88ms  +64.6%
Totals:                          7976ms  7713ms   +3.4%  8257ms  7975ms   +3.5%

I also ran this totally unfair benchmark:
    x = "abcde" * (20000) # 100k characters
    for i in xrange(10000000):
        y = x[1:-1]
and found my patched version to be 9759% faster.  (You heard that right, 98x faster.)

I'm ready to post the patch.  However, as a result of this work, the description on the original patch page is really no longer accurate:
Shall I close/delete that patch and submit a new patch with a more modern description?  After all, there's not a lot of activity on the old patch page...



* As I recall, stringobject.c needs the trailing zero in exactly *one* place: when comparing two zero-length strings.  My patch ensures that zero-length slices and concatenations still return nullstring, so this still works as expected.