[Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
larry at hastings.org
Wed Oct 4 20:08:16 CEST 2006
I've never liked the "".join() idiom for string concatenation; in my
opinion it violates the principles "Beautiful is better than ugly." and
"There should be one-- and preferably only one --obvious way to do it.".
(And perhaps several others.) To that end I've submitted patch #1569040.
This patch speeds up using + for string concatenation. It's been under
discussion on c.l.p for about a week.
I'm not a Python guru, and my initial benchmark had many mistakes. With
help from the community, correct benchmarks emerged: + for string
concatenation is now roughly as fast as the usual "".join() idiom when
appending. (It appears to be *much* faster for prepending.) The
patched Python passes all the tests in regrtest.py for which I have
source; I didn't install external packages such as bsddb and sqlite3.
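
For readers who want to try the comparison themselves, here is a minimal
timing sketch. It is not the benchmark from the c.l.p thread; the function
names, piece sizes, and repeat counts are all made up for illustration, and
it is written for a current Python 3 rather than the interpreter the patch
targets. Results depend heavily on the interpreter version, so treat any
numbers it prints as local observations only.

```python
import timeit


def build_with_plus(pieces):
    # Append each piece with +, the pattern the patch aims to speed up.
    result = ""
    for piece in pieces:
        result = result + piece
    return result


def build_with_join(pieces):
    # The usual idiom: hand all the pieces to "".join() in one call.
    return "".join(pieces)


def build_by_prepending(pieces):
    # Prepend each piece, so the final string holds the pieces in reverse.
    result = ""
    for piece in pieces:
        result = piece + result
    return result


# Hypothetical workload: 1000 short pieces.
pieces = ["x" * 10 for _ in range(1000)]

if __name__ == "__main__":
    for func in (build_with_plus, build_with_join, build_by_prepending):
        seconds = timeit.timeit(lambda: func(pieces), number=100)
        print(f"{func.__name__}: {seconds:.4f}s")
```

Note that prepending has no direct "".join() equivalent without first
reversing the list, which is one reason the two cases are worth timing
separately.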
My approach was to add a "string concatenation" object; I have since
learned this is also called a "rope". Internally, a
PyStringConcatationObject is exactly like a PyStringObject but with a
few extra members taking an additional thirty-six bytes of storage.
When you add two PyStringObjects together, string_concat() returns a
PyStringConcatationObject which contains references to the two strings.
Concatenating any mixture of PyStringObjects and
PyStringConcatationObjects works similarly, though there are some
These changes are almost entirely contained within
Objects/stringobject.c and Include/stringobject.h. There is one major
externally-visible change in this patch: PyStringObject.ob_sval is no
longer a char array, but a char *. Happily, this only requires a
recompile, because the CPython source is *marvelously* consistent about
using the macro PyString_AS_STRING(). (One hopes extension authors are
as consistent.) I only had to touch two other files (Python/ceval.c and
Objects/codeobject.c) and those were one-line changes. There is one
remaining place that still needs fixing: the self-described "hack" in
Mac/Modules/MacOS.c. Fixing that is beyond my pay grade.
I changed the representation of ob_sval for two reasons: first, it is
initially NULL for a string concatenation object, and second, because it
may point to separately-allocated memory. That's where the speedup came
from--it doesn't render the string until someone asks for the string's
value. It is telling to see my new implementation of
PyString_AS_STRING, as follows (casts and extra parentheses removed for
#define PyString_AS_STRING(x) ( x->ob_sval ? x->ob_sval :
This adds a layer of indirection for the string and a branch, adding a
tiny (but measurable) slowdown to the general case. Again, because the
changes to PyStringObject are hidden by this macro, external users of
these objects don't notice the difference.
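
The lazy-rendering idea can be modeled in a few lines of Python. This is a
toy sketch, not the patch's C code: the class LazyConcat, its attribute
names, and the as_string() method are hypothetical stand-ins. Here the
attribute "rendered" plays the role of ob_sval (None standing in for NULL),
and as_string() plays the role of asking for the string's value. The
iterative walk over the tree is my own choice for the sketch and says
nothing about how the patch renders.

```python
class LazyConcat:
    """Toy model of a lazily rendered concatenation (a simple rope)."""

    def __init__(self, left, right):
        # Only references are stored; no character data is copied here.
        self.left = left
        self.right = right
        self.length = len(left) + len(right)
        self.rendered = None  # stands in for a NULL ob_sval

    def __len__(self):
        return self.length

    def __add__(self, other):
        # LazyConcat + anything: just build another node.
        return LazyConcat(self, other)

    def __radd__(self, other):
        # str + LazyConcat: str.__add__ declines, so Python calls this.
        return LazyConcat(other, self)

    def as_string(self):
        # Render on first request, then reuse the cached result.
        if self.rendered is None:
            parts = []
            stack = [self]
            while stack:
                node = stack.pop()
                if isinstance(node, LazyConcat):
                    if node.rendered is not None:
                        parts.append(node.rendered)
                    else:
                        # Push right first so the left side is visited first.
                        stack.append(node.right)
                        stack.append(node.left)
                else:
                    parts.append(node)
            self.rendered = "".join(parts)
        return self.rendered


text = LazyConcat("Hello, ", "world")
text = text + "!"       # appending
text = ">> " + text     # prepending costs the same as appending
print(len(text))        # length is known before anything is rendered
print(text.as_string())
```

In this model both appending and prepending only allocate a small node, and
the single copy of the character data happens inside as_string(), the first
time the value is actually needed.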
The patch is posted, and I have donned the thickest skin I have handy.
I look forward to your feedback.