RE: [Python-Dev] Optimized string concatenation
PLEASE IGNORE PREVIOUS MESSAGE. (I really ought to stop using a mailer which is one accidental click away from sending an unfinished email.)
Armin writes:
The SF patch http://www.python.org/sf/980695 about making repeated string concatenations efficient has been reviewed and is acceptable on technical grounds. [...] This leaves open the policy questions:
* first, is that an implementation detail or a published feature?
IMHO, an implementation detail. Specifically, a published feature of CPython, but an implementation detail of the language Python.
* if it is a published feature, what about Jython?
And I guess we should ask about Iron Python too. The answer is that they add this optimization if and only if the maintainers of those versions get around to it. Whether it is easily done given the underlying platform is another important consideration. But if we leave it OUT of the language spec then we avoid unnecessarily breaking other implementations. That being said, I'll note that both Jython and Iron Python already use unicode as the basic string type, which is a much bigger change (except to those who manage to use only 7-bit ASCII).
* The patch would encourage a coding style that gives programs that essentially don't scale with Jython -- nor, for that matter, with 2.3 or older
Yes, but we ALREADY see lots of programs that use that coding style, even though every web page talking about Python optimization lists that as the #1 issue. Whether we like it or not, it seems that especially for novice programmers, the "s = s + x" idiom for accumulating a string is the "obvious" way to do it. Encouraging that may not be good, but going along rather than fighting it seems like a wise idea.
* discussed on SF too is whether we should remove the 'a=a+b' acceleration from the patch, keeping only 'a+=b'; see the SF tracker.
Hmm... I couldn't think of any reason to limit the optimization to += until I actually went and read the comments in the SF tracker. What I took away from that discussion was that it's possible to optimize "a=a+b", but NOT possible to optimize "a=a+b+c". This is a subtle distinction that is harder to explain to people than simply saying "it only works with +=, not with +". That's a fairly convincing point, so I guess I'm on the fence on this one.
This seems overkill, but should the acceleration be there but disabled by default?
from __future__ import string_concatenate?
Absolutely, unconditionally NOT. I'd rather just leave it out.
-- Michael Chermside
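For readers following along, the two accumulation styles discussed above can be sketched as follows. This is an illustrative example only, not code from the SF patch; the function names `build_with_concat` and `build_with_join` are made up for this sketch. Both functions return the same string; the difference is that the first relies on the implementation making repeated concatenation cheap, while the second does not.

```python
def build_with_concat(parts):
    # The "obvious" idiom from the thread: accumulate with s = s + x.
    # Without an in-place optimization each step may copy the whole string,
    # so the total work can grow quadratically with the output size.
    s = ""
    for x in parts:
        s = s + x
    return s


def build_with_join(parts):
    # The idiom usually recommended in optimization guides: collect the
    # pieces and join them once. This does not depend on any
    # implementation-specific concatenation optimization.
    return "".join(parts)


if __name__ == "__main__":
    pieces = [str(i) for i in range(1000)]
    assert build_with_concat(pieces) == build_with_join(pieces)
```

Because the results are identical, code that must also run well on implementations without the optimization (Jython or older CPython releases, as mentioned above) can use the join form without changing behavior.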
* discussed on SF too is whether we should remove the 'a=a+b' acceleration from the patch, keeping only 'a+=b'; see the SF tracker.
Hmm... I couldn't think of any reason to limit the optimization to += until I actually went and read the comments in the SF tracker. What I took away from that discussion was that it's possible to optimize "a=a+b", but NOT possible to optimize "a=a+b+c". This is a subtle distinction that is harder to explain to people than simply saying "it only works with +=, not with +".
That's a fairly convincing point, so I guess I'm on the fence on this one.
I'm not. Skipping a=a+b breaks symmetry with a+=b. More importantly, skipping a=a+b misses most of the potential benefits (see sre_parse.py for an example). PyBench, ParrotBench, and my other benchmarks all show gains when a=a+b is done inplace. The explanation is not hard. The CPython implementation can concatenate inplace two term expressions of the form a=a+b or a+=b. Expressions with more terms are not eligible for inplace concatenation.
Raymond
Hello Raymond,
On Mon, Aug 02, 2004 at 10:24:03PM -0400, Raymond Hettinger wrote:
The explanation is not hard. The CPython implementation can concatenate inplace two term expressions of the form a=a+b or a+=b. Expressions with more terms are not eligible for inplace concatenation.
No: a+=b+c is eligible. That's my problem.
Armin
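The statement forms being debated in this thread can be laid out side by side. The sketch below is illustrative only; the function names are invented here, and the eligibility notes in the comments simply restate what the participants say about the patch rather than anything verified against it. All four forms produce the same string, so the distinction is purely about which ones the implementation may be able to perform in place.

```python
def two_term_plus(a, b):
    # a = a + b: two-term form, described in the thread as eligible.
    a = a + b
    return a


def augmented(a, b):
    # a += b: also described as eligible.
    a += b
    return a


def augmented_with_sum(a, b, c):
    # a += b + c: the right-hand side b + c is evaluated first, and the
    # result is then appended to a. Armin points out this form is eligible
    # even though three terms appear in the statement.
    a += b + c
    return a


def three_term_plus(a, b, c):
    # a = a + b + c: a + b is evaluated first into a temporary, which is
    # then concatenated with c. Described in the thread as not eligible.
    a = a + b + c
    return a


if __name__ == "__main__":
    assert augmented_with_sum("x", "y", "z") == three_term_plus("x", "y", "z")
```

This is why the rule "only two-term expressions are eligible" does not quite cover the augmented-assignment case, and why explaining the rule to users is harder than it first appears.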
participants (3)
- Armin Rigo
- Michael Chermside
- Raymond Hettinger