Gregory P. Smith wrote:
I've never liked the "".join() idiom for string concatenation; in my opinion it violates the principles "Beautiful is better than ugly." and "There should be one-- and preferably only one --obvious way to do it.". (And perhaps several others.) To that end I've submitted patch #1569040 to SourceForge: http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&gro... This patch speeds up using + for string concatenation.
yay! i'm glad to see this. i hate the "".join syntax. (i still write that as string.join() because that's at least readable.) it also fixes the python idiom for fast string concatenation as intended; anyone who's ever written code that builds a large string value by pushing substrings into a list only to call join later should agree.
Well I always like things to run faster, but I disagree that this idiom is broken.
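For anyone skimming the thread, here is a rough sketch of the two spellings being compared. The function names are made up for illustration, and nothing here says anything about which one is faster on a given interpreter; it only shows that both build the same string.

```python
def build_with_join(pieces):
    """Collect substrings in a list, then join once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def build_with_plus(pieces):
    """Build the same string by repeated + concatenation."""
    result = ""
    for piece in pieces:
        result = result + piece
    return result


words = ["spam", "eggs", "ham"]
# Both spellings produce an identical string.
assert build_with_join(words) == build_with_plus(words)
```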
I like using lists to store substrings, and I think it's just a matter of changing your frame of reference in how you think about them. For example, it doesn't bother me to have a numeric type with many digits, and to have lists of many multi-digit numbers, and work with those. Working with lists of multi-character strings is not that different. I've even come to the conclusion (just my opinion) that mutable lists of strings would probably work better than a long mutable string of characters in most situations.
What I've found is that there seems to be an optimum string length depending on what you are doing. Too long (hundreds or thousands of characters) and repeating some string operations (not just concatenations) can be slow (relative to short strings), and using many short (single-character) strings would use more memory than is needed. So a list of medium-length strings is actually a very nice compromise. I'm not sure what the optimal string length is, but lines of about 80 columns seem to work very well for most things.
I think what may be missing is a larger set of higher-level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by their use, and working with substrings in lists and using ''.join() will not seem as out of place. So maybe instead of splitting, modifying, then joining (and again, etc ...), just pass the whole list around and have operations that work directly on the list of strings and return a list of strings as the result. That is pretty much what the patch does under the covers, but it only works with concatenation. Having more functions that work with lists of strings directly will reduce the need for concatenation as well.
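A small sketch of what I mean by passing the list around. The two helper functions are hypothetical (nothing like them is in the standard library under these names); the point is only that each step takes a list of strings and returns a list of strings, so the join happens once, at the very end.

```python
def strip_trailing(lines):
    """Return a new list with trailing whitespace removed from each line."""
    return [line.rstrip() for line in lines]


def replace_in_lines(lines, old, new):
    """Return a new list with `old` replaced by `new` in every line."""
    return [line.replace(old, new) for line in lines]


doc = ["first line  ", "second line", "third line "]
doc = strip_trailing(doc)
doc = replace_in_lines(doc, "line", "row")
doc[1] = "2nd row"       # edit one piece in place, like a mutable string
text = "\n".join(doc)    # join only when a single string is finally needed
```

No intermediate split or join is needed between the steps, and only the piece being edited gets replaced.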
Some operations that could work well with whole lists of strings or lines may be indent_lines, dedent_lines, prepend_lines, wrap_lines, and of course join_lines as in '\n'.join(L), the inverse of s.splitlines(), and there are also readlines() and writelines(). Also possibly find_line or find_in_lines(). These really shouldn't seem any more out of place than numeric operations that work with lists such as sum, max, and min. So to me... "".join(L) as a string operation that works on a list of strings seems perfectly natural. :-)
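To make that a bit more concrete, here is one way a few of those could look. These are just my guesses at reasonable behaviour for the names above, not existing functions; only textwrap.wrap() is a real standard library call. dedent_lines here counts spaces and tabs alike, which is a simplification.

```python
import textwrap


def prepend_lines(lines, prefix):
    """Put `prefix` in front of every line (indent_lines is the same
    thing with a prefix made of spaces)."""
    return [prefix + line for line in lines]


def dedent_lines(lines):
    """Remove the leading whitespace common to all non-blank lines."""
    indents = [len(line) - len(line.lstrip()) for line in lines if line.strip()]
    margin = min(indents) if indents else 0
    return [line[margin:] for line in lines]


def wrap_lines(lines, width=80):
    """Re-wrap each line to at most `width` columns, keeping blank lines."""
    wrapped = []
    for line in lines:
        wrapped.extend(textwrap.wrap(line, width) or [""])
    return wrapped


def find_in_lines(lines, sub):
    """Return the index of the first line containing `sub`, or -1."""
    for index, line in enumerate(lines):
        if sub in line:
            return index
    return -1


def join_lines(lines):
    """The inverse of s.splitlines() for text without a trailing newline."""
    return "\n".join(lines)


code = ["  def f():", "      return 1"]
quoted = prepend_lines(dedent_lines(code), "> ")
```

Each function takes a list and hands back a list (or an index), so they chain together the same way sum, max, and min do for lists of numbers.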