[Patches] [ python-Patches-980695 ] efficient string concatenation

SourceForge.net noreply at sourceforge.net
Tue Aug 3 16:13:18 CEST 2004


Patches item #980695, was opened at 2004-06-27 09:39
Message generated for change (Comment added) made by gvanrossum
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=980695&group_id=5470

Category: Core (C code)
Group: None
Status: Open
>Resolution: None
Priority: 5
Submitted By: Armin Rigo (arigo)
Assigned to: Armin Rigo (arigo)
Summary: efficient string concatenation

Initial Comment:
A wild idea that makes repeated string concatenations efficient without changing stringobject.c.

If we assume we don't want to change the representation of strings, then the problem is that string_concat(v,w) doesn't know whether v will soon be released, so it cannot resize it in-place even if the refcnt is 1.

But with some hacks ceval.c can know that.  Statements like s=s+expr or s+=expr both compile to a BINARY_ADD or INPLACE_ADD followed by a STORE_FAST or STORE_NAME.  So in the attached patch ceval.c special-cases addition of two strings (in the same way as it already special-cases addition of two integers).  If, moreover, the addition is followed by a STORE that is about to overwrite the addition's left argument, and if the refcnt is right, then the left argument can be resized in-place (plus some obscure magic to ensure that everything stays valid even if the resize moves the string in memory).
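
For reference, the bytecode pattern in question can be seen directly with dis (a minimal sketch; the opcode names are those of this era's CPython, so treat the exact output as illustrative):

    import dis

    def f(s, expr):
        s = s + expr   # ...LOAD_FAST s; LOAD_FAST expr; BINARY_ADD; STORE_FAST s
        s += expr      # ...LOAD_FAST s; LOAD_FAST expr; INPLACE_ADD; STORE_FAST s
        return s

    dis.dis(f)
    # In both statements the ADD opcode is immediately followed by a STORE
    # that overwrites the left operand, which is exactly the shape the
    # special case in ceval.c looks for.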

With Python's good memory manager, repeated resizes perform nicely even without manual over-allocation.

As a side effect, other constructions like a+b+c+d+e+f also work in-place now.

The patch could do with a lot more comments.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2004-08-03 10:13

Message:
Logged In: YES 
user_id=6380

Not so fast, see python-dev.

----------------------------------------------------------------------

Comment By: Michael Chermside (mcherm)
Date: 2004-08-03 08:37

Message:
Logged In: YES 
user_id=99874

While I think CPython should get the patch, I *am* worried
about Jython (and IronPython, I suppose). In particular,
Samuele's plea to keep the std library from depending on
this is likely to be followed for a few months, then
completely forgotten. Perhaps we need some way to check,
when running the test suite, just how much dependence there
is on this feature... then there'd at least be a way to
*find* the bits where people who came along after this
change failed to use the obscure ''.join() idiom that we
oldtimers keep nagging about <wink>.
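
(A crude illustration of that idea, nothing more: a static scan that
flags every '+=' in a source file so a human can check whether the
left-hand side is a string being built up in a loop. The helper below
is hypothetical, not something from the patch or the test suite.)

    import sys, token, tokenize

    def flag_augmented_adds(path):
        # Tokenize the file and report each '+=' with its line, so the
        # surrounding code can be inspected by hand; a static scan cannot
        # tell string accumulation from list accumulation by itself.
        f = open(path)
        try:
            for tok_type, tok_str, start, end, line in \
                    tokenize.generate_tokens(f.readline):
                if tok_type == token.OP and tok_str == '+=':
                    print("%s:%d: %s" % (path, start[0], line.strip()))
        finally:
            f.close()

    for path in sys.argv[1:]:
        flag_augmented_adds(path)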

----------------------------------------------------------------------

Comment By: Samuele Pedroni (pedronis)
Date: 2004-08-03 07:19

Message:
Logged In: YES 
user_id=61408

Well, given the refcnt dependence and other issues, this cannot
be reproduced in Jython (and similar implementations of Python).

The issue, then, is that the std library itself cannot depend
on the optimisation, because the std library is shared among
implementations.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-02 11:09

Message:
Logged In: YES 
user_id=80475

I recommend doing both.  When and where the optimization
applies is an implementation detail (one that probably won't
be replicated in Jython, for example).

To do only the in-place add is to miss much of the existing
code that is in need of help.  Several library modules and
benchmarks benefit greatly from doing both.  This keeps
common coding styles from resulting in disastrous performance.

The explanation can read something like:
"With binary additions of the form a+=b or a=a+b, CPython is
able to concatenate in place and achieve linear running time.
However, with chained additions like a = a + b + c, the
current implementation does not recognize the opportunity to
concatenate in place.  Instead, a new string is constructed
at each step, resulting in quadratic running time."
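
(A rough way to see the difference described above; this is not the
benchmark used for the patch, and the numbers depend entirely on
whether the interpreter running it has the fast path:)

    import time

    def plain_add(n, b):
        a = ''
        for i in range(n):
            a = a + b        # the ADD is immediately followed by a STORE of a
        return a

    def chained_add(n, b, c):
        a = ''
        for i in range(n):
            a = a + b + c    # the first ADD only builds a temporary
        return a

    t0 = time.time(); plain_add(200000, 'x'); t1 = time.time()
    chained_add(200000, 'x', 'y'); t2 = time.time()
    print("a = a + b     : %.3fs" % (t1 - t0))
    print("a = a + b + c : %.3fs" % (t2 - t1))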

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-08-02 07:10

Message:
Logged In: YES 
user_id=4771

Optimizing BINARY_ADD was what I did in the first patch too, but it has the drawback of being harder to explain.  The main trouble I see is that a=a+b+c will NOT be accelerated, while a=a+b or a+=b+c will.

The rationale for only tampering with INPLACE_ADD was to make the result more predictable.  Moreover, the behavior would be superficially similar to what occurs with lists, where a=a+b and a+=b already behave differently.

In-place operators in Python are a dark corner of the language.  I think it is OK for people to think of them as possibly optimized (in-place-modifying) versions of the standard operators.  Hence the idea of only tampering with INPLACE_ADD for the optimization here.

Well, maybe you can find a convincingly clear explanation of why a=a+b works but a=a+b+c doesn't.
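
The bytecode makes the distinction concrete (again, opcode names are those of this era's CPython, shown here only as an illustration):

    import dis

    def f(a, b, c):
        a = a + b + c
        return a

    dis.dis(f)
    # Roughly:  LOAD_FAST a; LOAD_FAST b; BINARY_ADD;
    #           LOAD_FAST c; BINARY_ADD; STORE_FAST a
    # The first BINARY_ADD, the one whose left operand is the accumulated
    # string a, is not followed by a STORE, so the trick of dropping the
    # variable's reference early cannot apply there and a full copy of a
    # is made on every pass.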

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-01 14:42

Message:
Logged In: YES 
user_id=80475

Further timings show it is advantageous to keep BINARY_ADD
and INPLACE_ADD separate.  Revised patch attached.  Now all
benchmark tests show improvements.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-01 12:34

Message:
Logged In: YES 
user_id=80475

FWIW, I think the patch ought to also apply to BINARY_ADD. 
There is tons of code that writes a=a+b rather than a+=b.  
It's important that there not be a substantial performance
difference between the two; otherwise, we're introducing
another bit of arcana.

Attached is a revised patch that combines BINARY_ADD with
INPLACE_ADD for a net savings of instructions in ceval.c. 
The timings improve on all my benchmarks (PyBench for
example runs 4% faster).

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-01 11:29

Message:
Logged In: YES 
user_id=80475

I've run with this for a couple of days and have not been
able to break it.  My benchmarks run slightly faster, except
for PyBench, which is 0.1% slower.  Regression tests pass
without exception.

The code itself looks fine and the approach makes sense.  It
does introduce some extra complexity and code duplication,
but the patch is still unified and comprehensible.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-07-25 06:18

Message:
Logged In: YES 
user_id=4771

Raymond, do you have time to review it?

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-07-25 06:17

Message:
Logged In: YES 
user_id=4771

Here is another patch.  This one focuses on simplicity, both
implementation-wise and from the user's point of view:

1) It only makes repeated  variable += expr  faster. It
doesn't touch the '+'.
2) It doesn't mess with the internals of strings and dicts
any more.  It is just one well-documented function now.

The goal of this new patch is to be reviewable and
maintainable, so that it can get into the core and stop
people from being bitten by the performance of += (I just
saw yet another example yesterday).
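
For reference, the pattern in question: the accumulation loop is what
the patch speeds up on CPython, while the ''.join() idiom stays linear
on any implementation. The data below is just illustrative.

    # The loop people keep getting bitten by; with the patch, CPython can
    # extend 'out' in place instead of copying it on every iteration.
    out = ''
    for piece in ['spam\n', 'eggs\n', 'ham\n']:
        out += piece

    # The portable idiom: linear time on CPython, Jython, and the rest.
    out = ''.join(['spam\n', 'eggs\n', 'ham\n'])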

----------------------------------------------------------------------

Comment By: Michael Chermside (mcherm)
Date: 2004-06-28 07:38

Message:
Logged In: YES 
user_id=99874

Hmmm... Interesting. I kinda like it.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-06-28 05:41

Message:
Logged In: YES 
user_id=4771

Another patch, with support for STORE_DEREF (thanks to Phillip for pointing it out).

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-06-27 16:39

Message:
Logged In: YES 
user_id=4771

Resubmitted as a context diff.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=980695&group_id=5470


More information about the Patches mailing list