[Python-Dev] Optimized string concatenation

Michael Chermside mcherm at mcherm.com
Tue Aug 3 14:30:04 CEST 2004


Armin writes:
> The SF patch http://www.python.org/sf/980695 about making repeated
> string concatenations efficient has been reviewed and is acceptable
> on technical grounds.
    [...]
> This leaves open the policy questions:
>
> * first, is that an implementation detail or a published feature?

IMHO, an implementation detail. Specifically, a published feature
of CPython, but an implementation detail of the language Python.

> * if it is a published feature, what about Jython?

And I guess we should ask about IronPython too. The answer is
that they add this optimization if and only if the maintainers of
those versions get around to it. Whether it is easily done given
the underlying platform is another important consideration. But
if we leave it OUT of the language spec then we avoid unnecessarily
breaking other implementations.

That being said, I'll note that both Jython and IronPython already
use unicode as the basic string type, which is a much bigger
change (except to those who manage to use only 7-bit ASCII).

> * The patch would encourage a coding style that gives programs that
>   essentially don't scale with Jython -- nor, for that matter, with
>   2.3 or older

Yes, but we ALREADY see lots of programs that use that coding
style, even though every web page talking about Python optimization
lists that as the #1 issue. Whether we like it or not, it seems
that especially for novice programmers, the "s = s + x" idiom for
accumulating a string is the "obvious" way to do it. Encouraging
that may not be good, but going along rather than fighting it
seems like a wise idea.
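
For concreteness, the two styles at issue look roughly like this. It
is only a sketch: the function names are made up for illustration,
and nothing here measures speed.

```python
def concat_loop(pieces):
    # The "obvious" idiom: rebind the accumulator on every iteration.
    # Without an in-place optimization, each step may copy all of s.
    s = ""
    for x in pieces:
        s = s + x
    return s


def concat_join(pieces):
    # The idiom the optimization guides recommend: join once at the end.
    return "".join(pieces)


pieces = ["spam", "eggs", "ham"]
# Both produce the same string; only the cost of building it differs.
assert concat_loop(pieces) == concat_join(pieces)
```

The join version behaves the same on any implementation, which is why
it stays the portable choice whether or not the patch goes in.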

> * discussed on SF too is whether we should remove the 'a=a+b'
>   acceleration from the patch, keeping only 'a+=b'; see the SF
>   tracker.

Hmm... I couldn't think of any reason to limit the optimization
to += until I actually went and read the comments in the SF
tracker. What I took away from that discussion was that it's
possible to optimize "a=a+b", but NOT possible to optimize
"a=a+b+c". This is a subtle distinction that is harder to
explain to people than simply saying "it only works with +=, not
with +".
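
One way to see the difference, as far as I understand it, is to look
at how the two statements group. The sketch below uses the standard
ast module; the helper name is invented for illustration, and this
shows only the grammar-level grouping, not whatever check the patch
itself performs.

```python
import ast


def left_operand_of_outer_add(stmt):
    # Parse one assignment statement and report the node type of the
    # left operand of its outermost "+".
    assign = ast.parse(stmt).body[0]
    return type(assign.value.left).__name__


# In "a = a + b" the left operand is the name a itself.
print(left_operand_of_outer_add("a = a + b"))      # Name
# In "a = a + b + c" it is the intermediate result of (a + b),
# so the final addition no longer has the variable a on its left.
print(left_operand_of_outer_add("a = a + b + c"))  # BinOp
```

In the second form, the addition that touches a is not the one whose
result gets stored back into a, which I take to be the heart of why
that case resists the same treatment.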


