[Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
Larry Hastings
larry at hastings.org
Sat Oct 21 04:29:01 CEST 2006
I've significantly enhanced my string-concatenation patch, to the point
where that name is no longer accurate. So I've redubbed it the "lazy
strings" patch.
The major new feature is that string *slices* are also represented with
a lazy-evaluation placeholder for the actual string, just as
concatenated strings were in my original patch. The lazy slice object
stores a reference to the original PyStringObject * it is sliced from,
and the desired start and stop slice markers. (It only supports step =
1.) Its ob_sval is NULL until the string is rendered--but that rarely
happens! Not only does this mean string slices are faster, but I bet
this generally reduces overall memory usage for slices too.
Now, one rule of the Python programming API is that "all strings are
zero-terminated". That part of makes the life of a Python extension
author sane--they don't have to deal with some exotic Python string
class, they can just assume C-style strings everywhere. Ordinarily,
this means a string slice couldn't simply point into the original
string; if it did, and you executed
x = "abcde"
y = x[1:4]
internally y->ob_sval[3] would not be 0, it would be 'e', breaking the
API's rule about strings.
However! When a PyStringObject lives out its life purely within the
Python VM, the only code that strenuously examines its internals is
stringobject.c. And that code almost never needs the trailing zero*.
So I've added a new static method in stringobject.c:
char * PyString_AsUnterminatedString(PyStringObject *)
If you call it on a lazy-evaluation slice object, it gives you back a
pointer into the original string's ob_sval. The s->ob_size'th element
of this *might not* be zero, but if you call this function you're saying
that's a-okay, you promise not to look at it. (If the PyStringObject *
is any other variety, it calls into PyString_AsString, which renders
string concatenation objects then returns ob_sval.)
Again: this behavior is *never* observed by anyone outside of
stringobject.c. External users of PyStringObjects call
PyString_AS_STRING(), which renders all lazy concatenation and lazy
slices so they look just like normal zero-terminated PyStringObjects.
With my patch applied, trunk still passes all expected tests.
Of course, lazy slice objects aren't just for literal slices created
with [x:y]. There are lots of string methods that return what are
effectively string slices, like lstrip() and split().
With this code in place, string slices that aren't examined by modules
are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10
(whatever that means)) while collecting some statistics. When it
finished, the interpreter had created a total of 640,041 lazy slices, of
which only *19* were ever rendered.
Apart from lazy slices, there's only one more enhancement when compared
with v1: string prepending now reuses lazy concatenation objects much
more often. There was an optimization in string_concatenate
(Python/ceval.c) that said: "if the left-side string has two references,
and we're about to overwrite the second reference by storing this
concatenation to an object, tell that object to drop its reference".
That often meant the reference on the string dropped to 1, which meant
PyString_Resize could just resize the left-side string in place and
append the right-side. I modified it so it drops the reference to the
right-hand operand too. With this change, even with a reduction in the
allowable stack depth for right-hand recursion (so it's less likely to
blow the stack), I was able to prepend over 86k strings before it forced
a render. (Oh, for the record: I ensure depth limits are enforced when
combining lazy slices and lazy concatenations, so you still won't blow
your stack when you mix them together.)
Here are the highlights of a single apples-to-apples pybench run, 2.6
trunk revision 52413 ("this") versus that same revision with my patch
applied ("other"):
Test minimum run-time average run-time
this other diff this other
diff
-------------------------------------------------------------------------------
ConcatStrings: 204ms 76ms +168.4% 213ms 77ms
+177.7%
CreateStringsWithConcat: 159ms 138ms +15.7% 163ms 142ms
+15.1%
StringSlicing: 142ms 86ms +65.5% 145ms 88ms
+64.6%
-------------------------------------------------------------------------------
Totals: 7976ms 7713ms +3.4% 8257ms
7975ms +3.5%
I also ran this totally unfair benchmark:
x = "abcde" * (20000) # 100k characters
for i in xrange(10000000):
y = x[1:-1]
and found my patched version to be 9759% faster. (You heard that right,
98x faster.)
I'm ready to post the patch. However, as a result of this work, the
description on the original patch page is really no longer accurate:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
Shall I close/delete that patch and submit a new patch with a more
modern description? After all, there's not a lot of activity on the old
patch page...
Cheers,
/larry/
* As I recall, stringobject.c needs the trailing zero in exactly *one*
place: when comparing two zero-length strings. My patch ensures that
zero-length slices and concatenations still return nullstring, so this
still works as expected.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20061020/0a891b49/attachment.html
More information about the Python-Dev
mailing list