[Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Ron Adam
rrr at ronadam.com
Fri Oct 6 13:37:09 CEST 2006
Gregory P. Smith wrote:
>> I've never liked the "".join([]) idiom for string concatenation; in my
>> opinion it violates the principles "Beautiful is better than ugly." and
>> "There should be one-- and preferably only one --obvious way to do it.".
>> (And perhaps several others.) To that end I've submitted patch #1569040
>> to SourceForge:
>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
>
> yay! i'm glad to see this. i hate the "".join syntax. i still write
> that as string.join() (because that's at least readable). it also fixes
> the python idiom for fast string concatenation as intended; anyone
> who's ever written code that builds a large string value by pushing
> substrings into a list only to call join later should agree.
Well, I always like things to run faster, but I disagree that this idiom is broken.
I like using lists to store substrings, and I think it's just a matter of
changing your frame of reference in how you think about them. For example, it
doesn't bother me to have a numeric type with many digits, or to have lists of
many-digit numbers and work with those. Working with lists of multi-character
strings is not that different. I've even come to the conclusion (just
my opinion) that mutable lists of strings would probably work better than one long
mutable string of characters in most situations.
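A minimal sketch of the two approaches being debated, for comparison. The function names are hypothetical and chosen only for illustration. Both build the same string. The difference is that repeated += may copy the accumulated string on each step (depending on the interpreter and on optimizations like the one in the patch), while the list version copies each piece roughly once, when join() runs. No timings are claimed here.

```python
# Hypothetical helpers contrasting += accumulation with list-then-join.

def build_with_plus(pieces):
    """Accumulate with +=; each step may copy the growing string."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def build_with_join(pieces):
    """Collect substrings in a list and join them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


pieces = ["line %d\n" % i for i in range(1000)]
# Both strategies produce identical output; only their cost differs.
assert build_with_plus(pieces) == build_with_join(pieces)
```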
What I've found is that there seems to be an optimum string length depending on what
you are doing. If strings are too long (hundreds or thousands of characters), repeating
some string operations (not just concatenation) can be slow relative to short
strings, while using many very short (single-character) strings uses more memory
than needed. So a list of medium-length strings is actually a very nice
compromise. I'm not sure what the optimal string length is, but lines of about
80 columns seem to work very well for most things.
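To make the "list of medium-length strings" idea concrete, here is a sketch that cuts one long string into fixed-width chunks. The 80-character width is only the rule of thumb mentioned above, and to_chunks is a hypothetical helper. For splitting at word boundaries instead, the standard library's textwrap.wrap() could be used.

```python
# Hypothetical helper: split one long string into a list of chunks
# of at most `width` characters (the last chunk may be shorter).
def to_chunks(text, width=80):
    return [text[i:i + width] for i in range(0, len(text), width)]


text = "x" * 250
chunks = to_chunks(text)
print([len(c) for c in chunks])  # [80, 80, 80, 10]

# Joining the chunks restores the original string exactly.
assert "".join(chunks) == text
```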
I think what may be missing is a larger set of higher-level string functions
that work with lists of strings directly. Then a list of strings could be
thought of as a mutable string type by the way it is used, and working with substrings
in lists and calling ''.join() would not seem so out of place. So instead of
splitting, modifying, and joining (and then doing it all over again), just pass the whole
list around and have operations that work directly on the list of strings and
return a list of strings as the result. That is pretty much what the patch does under
the covers, but it only works for concatenation. Having more functions that
work with lists of strings directly would reduce the need for concatenation as well.
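A sketch of treating a list of lines as a mutable string: edit one element in place, pass the whole list through list-in/list-out operations, and join only once at the end. The helpers strip_trailing and number_lines are hypothetical examples of such operations.

```python
# Hypothetical list-in, list-out helpers; each returns a new list of strings.
def strip_trailing(lines):
    """Remove trailing whitespace from every line."""
    return [line.rstrip() for line in lines]


def number_lines(lines):
    """Prefix each line with its 1-based line number."""
    return ["%d: %s" % (n, line) for n, line in enumerate(lines, 1)]


doc = ["first line  ", "second line", "third line   "]

# Change one "substring" without rebuilding the whole text.
doc[1] = "second line (edited)"

# Pass the whole list through the operations, then join exactly once.
text = "\n".join(number_lines(strip_trailing(doc)))
print(text)
```

Only the final join produces a single long string; every intermediate step works on the list.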
Some operations that could work well on whole lists of line strings might be
indent_lines, dedent_lines, prepend_lines, wrap_lines, and of course join_lines,
as in '\n'.join(L), the inverse of s.splitlines(); there are also readlines()
and writelines(). Possibly find_line or find_in_lines() as well. These really
shouldn't seem any more out of place than numeric operations that work on lists,
such as sum, max, and min. So to me, "".join(L) as a string operation that
works on a list of strings seems perfectly natural. :-)
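Possible implementations of some of the operations named above, written as plain functions over a list of strings. These are hypothetical sketches, not an existing library API. In particular, prepend_lines is assumed here to mean "add an arbitrary prefix to every line" (e.g. "> " for quoting), and dedent_lines only handles space indentation, not tabs.

```python
import textwrap

# Hypothetical helpers operating on a list of lines (strings without "\n").

def prepend_lines(lines, prefix):
    """Add an arbitrary prefix to every line, e.g. "> " for quoting."""
    return [prefix + line for line in lines]


def indent_lines(lines, width=4):
    """Indent every line by `width` spaces."""
    return prepend_lines(lines, " " * width)


def dedent_lines(lines):
    """Strip the smallest run of leading spaces shared by all non-blank lines."""
    widths = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip()]
    cut = min(widths) if widths else 0
    return [line[cut:] for line in lines]


def wrap_lines(lines, width=80):
    """Re-wrap each line to at most `width` columns; blank lines are kept."""
    wrapped = []
    for line in lines:
        # textwrap.wrap("") returns [], so keep blank lines explicitly.
        wrapped.extend(textwrap.wrap(line, width) or [""])
    return wrapped


def join_lines(lines):
    """Inverse of str.splitlines() for text without a trailing newline."""
    return "\n".join(lines)


def find_in_lines(lines, sub):
    """Return (line_index, column) of the first occurrence of sub, or None."""
    for index, line in enumerate(lines):
        column = line.find(sub)
        if column != -1:
            return index, column
    return None


source = ["    def f():", "        return 1"]
print(join_lines(indent_lines(dedent_lines(source), 2)))
```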
Cheers,
Ron