string concatenation optimizations [from python-dev Summary]

Phil Frost indigo at bitglue.com
Tue Aug 24 18:53:35 EDT 2004


Has adding a stringish object that supports efficient slicing,
concatenation, and mutation been considered? The C++ STL rope comes to
mind. Essentially what I have in mind is a type that's a list of byte
arrays. The value is defined as the concatenation of these arrays.

This would allow efficient implementations of things such as
s[31:35] = 'replacing a small substring with a larger one'

I think a reasonable implementation could be done using existing python
types, and if it's useful, an opmitized C implementation could be done.

This sort of thing is already on my stack of things to find on google,
write, or get someone else to write, just need the time. So what do you
think? Useful idea? Does this already exist?

On Mon, Aug 23, 2004 at 09:50:48PM -0700, Brett Cannon wrote:
> python-dev Summary for 2004-08-01 through 2004-08-15
>
> [snip]
>
> -------------------------------------------------------------------------------------
> Changing the Big-O complexity for something in the language is now a 
> language feature
> -------------------------------------------------------------------------------------
> language evolution
> 
> Armin Rigo came up with a way to have string concatenation in a loop 
> (think ``for thing in iter_of_strings: concat_str += thing``) not be a 
> quadratic algorithm thanks to some trickery for ``a = a + b`` and ``a += 
> b`` conditions for strings.  The hope was to remove the (commonly 
> considered) wart of having ``"".join(iter_of_strings)`` be the suggested 
> way to concatenate a bunch of strings.
> 
> But Guido didn't like the patch.  His reasoning was that changing 
> something that led to a change in the Big-O complexity of certain 
> algorithms would inherently hurt other implementations of Python when 
> people would start to code specifically for that performance gain.  For 
> instance, having Jython be able to pull this trick off is, I believe, 
> near impossible.  So, in order to make sure changes like this are 
> considered before applying them, Guido instated a new rule that 
> "implementation features that affect not just the running speed but the 
> O() rate for certain algorithms should be considered language features, 
> because any implementation will be required to implement them in order 
> to ensure code portability" between implementations of Python.
> 
> In the end, though, this went in with a warning that the speed 
> performance is not portable.  It is not to be used in the stdlib ever.
> 
> Contributing threads:
>   - `Optimized string concatenation 
> <http://mail.python.org/pipermail/python-dev/2004-August/046686.html>`__
>   - `PEP 0008 confusion - here it is, but don't use it? 
> <http://mail.python.org/pipermail/python-dev/2004-August/047194.html>`__
>
> [snip]



More information about the Python-list mailing list