[Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

Ron Adam rrr at ronadam.com
Fri Oct 6 13:37:09 CEST 2006


Gregory P. Smith wrote:
>> I've never liked the "".join([]) idiom for string concatenation; in my 
>> opinion it violates the principles "Beautiful is better than ugly." and 
>> "There should be one-- and preferably only one --obvious way to do it.". 
>> (And perhaps several others.)  To that end I've submitted patch #1569040 
>> to SourceForge:
>>     
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
> 
> yay!  i'm glad to see this.  i hate the "".join syntax.  i still write
> that as string.join() because thats at least readable).  it also fixes
> the python idiom for fast string concatenation as intended; anyone
> whos ever written code that builds a large string value by pushing
> substrings into a list only to call join later should agree.

Well I always like things to run faster, but I disagree that this idiom is broken.

I like using lists to store substrings, and I think it's just a matter of 
changing your frame of reference in how you think about them.  For example, it 
doesn't bother me to have a numeric type with many digits, and to have lists of 
many multi-digit numbers, and work with those.  Working with lists of 
multi-character strings is not that different.  I've even come to the conclusion 
(just my opinion) that mutable lists of strings would probably work better than 
a long mutable string of characters in most situations.
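
As a small illustration of that frame of reference (a sketch only, not taken 
from the patch; the name `parts` is made up for the example), a list of 
substrings can be edited in place and turned into a real string once at the end:

```python
# Illustrative only: a list of substrings used as a "mutable string".
parts = ["Hello", ", ", "world"]

parts.append("!")          # add a piece at the end; existing strings are not copied
parts[2] = "Python-Dev"    # replace one piece in place
parts.insert(0, ">>> ")    # add a piece at the front

result = "".join(parts)    # build the final immutable string once
print(result)
```

Each edit touches only the list; the character data is copied just once, in the 
final join.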

What I've found is that there seems to be an optimum string length depending on 
what you are doing.  Too long (hundreds or thousands of characters) and repeating 
some string operations (not just concatenations) can be slow (relative to short 
strings), while using many short (single-character) strings would use more memory 
than is needed.  So a list of medium-length strings is actually a very nice 
compromise.  I'm not sure what the optimal string length is, but lines of about 
80 columns seem to work very well for most things.

I think what may be missing is a larger set of higher level string functions 
that will work with lists of strings directly.  Then lists of strings can be 
thought of as a mutable string type by its use, and then working with substrings 
in lists and using ''.join() will not seem as out of place.  So maybe instead of 
splitting, modifying, then joining, (and again, etc ...), just pass the whole 
list around and have operations that work directly on the list of strings and 
return a list of strings as the result.  That is pretty much what the patch does 
under the covers, but it only works with concatenation.  Having more functions 
that work with lists of strings directly would reduce the need for concatenation 
as well.

Some operations that could work well with whole lists of lines may be 
indent_lines, dedent_lines, prepend_lines, wrap_lines, and of course join_lines 
as in '\n'.join(L), the inverse of s.splitlines(), and there are also readlines() 
and writelines().  Also possibly find_line or find_in_lines().  These really 
shouldn't seem any more out of place than numeric operations that work with lists 
such as sum, max, and min.  So to me...  "".join(L) as a string operation that 
works on a list of strings seems perfectly natural. :-)
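
To make the idea a bit more concrete, here is a rough sketch of what a few of 
those helpers might look like.  None of these functions exist in the standard 
library; the names come from the list above, and the signatures and behaviour 
are only assumptions for illustration.  Each one takes a list of strings and 
(except for the find and join helpers) returns a new list of strings:

```python
import textwrap

# Hypothetical helpers: names from the discussion, signatures assumed.

def indent_lines(lines, prefix="    "):
    """Return a new list with prefix added to every non-blank line."""
    return [prefix + line if line.strip() else line for line in lines]

def dedent_lines(lines):
    """Return a new list with the common leading whitespace removed."""
    # textwrap.dedent works on a single string, so join and split around it.
    return textwrap.dedent("\n".join(lines)).split("\n")

def find_in_lines(lines, sub):
    """Return (line_index, column) of the first occurrence of sub, or None."""
    for i, line in enumerate(lines):
        col = line.find(sub)
        if col != -1:
            return (i, col)
    return None

def join_lines(lines):
    """The inverse of s.splitlines() for lines without trailing newlines."""
    return "\n".join(lines)

# The list is passed from one operation to the next and joined only once.
L = ["    def f():", "        return 1"]
text = join_lines(indent_lines(dedent_lines(L), "> "))
print(text)
```

The list only becomes a single string at the very end, when it is needed for 
output.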

Cheers,
    Ron



