[Patches] [ python-Patches-1569040 ] Speed up using + for string concatenation

SourceForge.net noreply at sourceforge.net
Mon Oct 2 06:04:17 CEST 2006


Patches item #1569040, was opened at 2006-10-02 04:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1569040&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: Speed up using + for string concatenation

Initial Comment:
The core concept: adding two strings together no longer returns a pure
"string" object.  Instead, it returns a "string concatenation" object
which holds references to the two strings but does not actually
concatenate them... yet.  The strings are concatenated only when someone
requests the string's value, at which point it allocates all the space
it needs and renders the concatenated string all at once.

More to the point, if you add multiple strings together (a + b + c),
it *doesn't* compute the intermediate strings (a + b).

Upsides to this approach:
        * String concatenation using + is now the fastest way to
          concatenate strings (that I know of).

        * In particular, prepending is *way* faster than it used to be.
          It used to be a pathological case, n! or something.  Now it's
          linear.

        * Throw off the shackles of "".join([]), you don't need it
          anymore.

        * Did I mention it was faster?

Downsides to this approach:

        * Changes how PyStringObjects are stored internally; ob_sval is
          no longer a char[1], but a char *.  This makes each StringObject
          four bytes larger.

        * Adds another memory dereference in order to get the value of
          a string, which is a teensy-weensy slowdown.

        * Would force a recompile of all C modules that deal directly
          with string objects (which I imagine is most of them).

        * Also, *requires* that C modules use the PyString_AS_STRING()
          macro, rather than casting the object and grabbing ob_sval
          directly.  (I was pleased to see that the Python source
          was very good about using this macro; if all Python C
          modules are this well-behaved, this point is happily moot.)

        * On a related note, the file Mac/Modules/MacOS.c implies
          that there are Mac-specific Python scripts that peer
          directly into string objects.  These would have to be
          changed to understand the new semantics.

        * String concatenation objects are 36 bytes larger than
          string objects, and this space will often go unreclaimed
          after the string is rendered.

        * When rendered, string concatenation objects storing long
          strings will allocate a second buffer from the heap to
          store the string.  So this adds some minor allocation
          overhead (though this is offset by the speed gain from
          the approach overall).

        * Will definitely need some heavy review before it could
          go in, in particular I worry I got the semantics surrounding
          "interned" strings wrong.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1569040&group_id=5470


More information about the Patches mailing list