[Patches] [ python-Patches-1569040 ] Speed up using + for string concatenation
noreply at sourceforge.net
Mon Oct 2 06:04:17 CEST 2006
Patches item #1569040, was opened at 2006-10-02 04:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: Speed up using + for string concatenation
The core concept: adding two strings together no longer returns a pure
"string" object. Instead, it returns a "string concatenation" object
which holds references to the two strings but does not actually
concatenate them... yet. The strings are concatenated only when someone
requests the string's value, at which point it allocates all the space
it needs and renders the concatenated string all at once.
More to the point, if you add multiple strings together (a + b + c),
it *doesn't* compute the intermediate strings (a + b).
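To make the idea concrete, here is a minimal pure-Python sketch of a lazy concatenation object. It is not the patch itself, which changes PyStringObject in C. The class name LazyConcat and its attributes are hypothetical, invented only for this illustration. The sketch assumes every leaf is an ordinary str, and it uses "".join() internally as a stand-in for "allocate all the space and copy once".

```python
class LazyConcat:
    """Hypothetical stand-in for the "string concatenation" object."""

    def __init__(self, left, right):
        # Hold references to both operands; nothing is copied yet.
        self._left = left
        self._right = right
        self._value = None  # filled in on first render

    def __add__(self, other):
        # (a + b) + c builds a tree node, not an intermediate string.
        return LazyConcat(self, other)

    def __radd__(self, other):
        # Prepending a plain str also just builds a node.
        return LazyConcat(other, self)

    def __len__(self):
        return len(str(self))

    def __str__(self):
        # Render on first request, then cache and drop the operands.
        if self._value is None:
            self._value = self._render()
            self._left = self._right = None
        return self._value

    def _render(self):
        # Walk the tree left to right with an explicit stack so that
        # very long chains do not hit the recursion limit.
        pieces = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, LazyConcat):
                if node._value is not None:
                    pieces.append(node._value)  # already rendered
                else:
                    stack.append(node._right)
                    stack.append(node._left)
            else:
                pieces.append(node)  # assumed to be a plain str
        return "".join(pieces)


s = LazyConcat("Hello", ", ") + "world"
print(s._value is None)  # True: nothing has been concatenated yet
print(str(s))            # Hello, world
```

In this sketch the intermediate "Hello, " string is never built; the only copy happens inside the single render.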
Upsides to this approach:
* String concatenation using + is now the fastest way to
concatenate strings (that I know of).
* In particular, prepending is *way* faster than it used to be.
It used to be a pathological case, n! or something. Now it's
* Throw off the shackles of "".join(), you don't need it
* Did I mention it was faster?
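As a rough illustration of why repeated prepending is costly, here is a simple cost model. It assumes that every eager prepend writes out the entire new string, and it counts only characters copied. Under that assumption the growth is quadratic in the number of pieces rather than factorial, while a single deferred render copies each character once. The function names are hypothetical, and no real timings are implied.

```python
def eager_prepend_copies(piece_lengths):
    """Characters copied if each prepend builds a brand-new string."""
    total = 0   # length of the string built so far
    copied = 0  # running count of characters written
    for n in piece_lengths:
        total += n
        copied += total  # the whole new string is written each time
    return copied


def lazy_render_copies(piece_lengths):
    """Characters copied if everything is rendered once at the end."""
    return sum(piece_lengths)


pieces = [1] * 1000  # one thousand single-character strings
print(eager_prepend_copies(pieces))  # 500500
print(lazy_render_copies(pieces))    # 1000
```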
Downsides to this approach:
* Changes how PyStringObjects are stored internally; ob_sval is
no longer a char[1] array, but a char *. This makes each StringObject
four bytes larger.
* Adds another memory dereference in order to get the value of
a string, which is a teensy-weensy slowdown.
* Would force a recompile of all C modules that deal directly
with string objects (which I imagine is most of them).
* Also, *requires* that C modules use the PyString_AS_STRING()
macro, rather than casting the object and grabbing ob_sval
directly. (I was pleased to see that the Python source
was very good about using this macro; if all Python C
modules are this well-behaved, this point is happily moot.)
* On a related note, the file Mac/Modules/MacOS.c implies
that there are Mac-specific Python scripts that peer
directly into string objects. These would have to be
changed to understand the new semantics.
* String concatenation objects are 36 bytes larger than
string objects, and this space will often go unreclaimed
after the string is rendered.
* When rendered, string concatenation objects storing long
strings will allocate a second buffer from the heap to
store the string. So this adds some minor allocation
overhead (though this is offset by the speed gain from
the approach overall).
* Will definitely need some heavy review before it could
go in; in particular, I worry I got the semantics surrounding
"interned" strings wrong.