[Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]

21 Oct 2006

      I've significantly enhanced my string-concatenation patch, to the point 
where that name is no longer accurate.  So I've redubbed it the "lazy 
strings" patch.

The major new feature is that string *slices* are also represented with 
a lazy-evaluation placeholder for the actual string, just as 
concatenated strings were in my original patch.  The lazy slice object 
stores a reference to the original PyStringObject * it is sliced from, 
and the desired start and stop slice markers.  (It only supports step = 
1.)  Its ob_sval is NULL until the string is rendered--but that rarely 
happens!  Not only does this mean string slices are faster, but I bet 
this generally reduces overall memory usage for slices too.

Now, one rule of the Python programming API is that "all strings are 
zero-terminated".  That part of makes the life of a Python extension 
author sane--they don't have to deal with some exotic Python string 
class, they can just assume C-style strings everywhere.  Ordinarily, 
this means a string slice couldn't simply point into the original 
string; if it did, and you executed
  x = "abcde"
  y = x[1:4]
internally y->ob_sval[3] would not be 0, it would be 'e', breaking the 
API's rule about strings.

However!  When a PyStringObject lives out its life purely within the 
Python VM, the only code that strenuously examines its internals is 
stringobject.c.  And that code almost never needs the trailing zero*.  
So I've added a new static method in stringobject.c:
    char * PyString_AsUnterminatedString(PyStringObject *)
If you call it on a lazy-evaluation slice object, it gives you back a 
pointer into the original string's ob_sval.  The s->ob_size'th element 
of this *might not* be zero, but if you call this function you're saying 
that's a-okay, you promise not to look at it.  (If the PyStringObject * 
is any other variety, it calls into PyString_AsString, which renders 
string concatenation objects then returns ob_sval.)

Again: this behavior is *never* observed by anyone outside of 
stringobject.c.  External users of PyStringObjects call 
PyString_AS_STRING(), which renders all lazy concatenation and lazy 
slices so they look just like normal zero-terminated PyStringObjects.  
With my patch applied, trunk still passes all expected tests.

Of course, lazy slice objects aren't just for literal slices created 
with [x:y].  There are lots of string methods that return what are 
effectively string slices, like lstrip() and split().

With this code in place, string slices that aren't examined by modules 
are very rarely rendered.  I ran "pybench -n 2" (two rounds, warp 10 
(whatever that means)) while collecting some statistics.  When it 
finished, the interpreter had created a total of 640,041 lazy slices, of 
which only *19* were ever rendered.

Apart from lazy slices, there's only one more enhancement when compared 
with v1: string prepending now reuses lazy concatenation objects much 
more often. There was an optimization in string_concatenate 
(Python/ceval.c) that said: "if the left-side string has two references, 
and we're about to overwrite the second reference by storing this 
concatenation to an object, tell that object to drop its reference".  
That often meant the reference on the string dropped to 1, which meant 
PyString_Resize could just resize the left-side string in place and 
append the right-side.  I modified it so it drops the reference to the 
right-hand operand too.  With this change, even with a reduction in the 
allowable stack depth for right-hand recursion (so it's less likely to 
blow the stack), I was able to prepend over 86k strings before it forced 
a render.  (Oh, for the record: I ensure depth limits are enforced when 
combining lazy slices and lazy concatenations, so you still won't blow 
your stack when you mix them together.)

Here are the highlights of a single apples-to-apples pybench run, 2.6 
trunk revision 52413 ("this") versus that same revision with my patch 
applied ("other"):

Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   
diff
-------------------------------------------------------------------------------
                 ConcatStrings:   204ms    76ms +168.4%   213ms    77ms 
+177.7%
       CreateStringsWithConcat:   159ms   138ms  +15.7%   163ms   142ms  
+15.1%
                 StringSlicing:   142ms    86ms  +65.5%   145ms    88ms  
+64.6%
-------------------------------------------------------------------------------
Totals:                          7976ms  7713ms   +3.4%  8257ms  
7975ms   +3.5%

I also ran this totally unfair benchmark:
    x = "abcde" * (20000) # 100k characters
    for i in xrange(10000000):
        y = x[1:-1]
and found my patched version to be 9759% faster.  (You heard that right, 
98x faster.)

I'm ready to post the patch.  However, as a result of this work, the 
description on the original patch page is really no longer accurate:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
Shall I close/delete that patch and submit a new patch with a more 
modern description?  After all, there's not a lot of activity on the old 
patch page...

Cheers,

/larry/

* As I recall, stringobject.c needs the trailing zero in exactly *one* 
place: when comparing two zero-length strings.  My patch ensures that 
zero-length slices and concatenations still return nullstring, so this 
still works as expected.

[Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]

Larry Hastings