Create a StringBuilder class and use it everywhere

Hi! There's a common problem in Python right now: when people need to build a string from pieces, they very often do something like this::

    def main_pure():
        b = u"initial value"
        for i in xrange(30000):
            b += u"more data"
        return b

The bad thing about it is that a new string is created every time you do +=, so it performs badly on CPython (and horribly on PyPy). If people used, for example, a list of strings instead, performance would be much better::

    def main_list_append():
        b = [u"initial value"]
        for i in xrange(3000000):
            b.append(u"more data")
        return u"".join(b)

The results are::

    kost@kost-laptop:~/tmp$ time python string_bucket_pure.py
    real 0m7.194s
    user 0m3.590s
    sys 0m3.580s
    kost@kost-laptop:~/tmp$ time python string_bucket_append.py
    real 0m0.417s
    user 0m0.330s
    sys 0m0.080s

Fantastic, isn't it?

Also, let's forget about speed for a moment and think about semantics: your task is "build a string from its pieces", or in other words "build a string from a list of pieces", so from this point of view using [] and u"".join is semantically better too.

Java has had its StringBuilder class for a long time (I'm not really into Java, I've just been told about it), and I think Python should have its own StringBuilder::

    class StringBuilder(object):
        """Use it instead of doing += for building unicode strings from pieces"""

        def __init__(self, val=u""):
            self.val = val
            self.appended = []

        def __iadd__(self, other):
            self.appended.append(other)
            return self

        def __unicode__(self):
            self.val = u"".join((self.val, u"".join(self.appended)))
            self.appended = []
            return self.val

Why a StringBuilder class, and not just [] + u''.join? I have two reasons:

1. It has caching.
2. You can document it: when a programmer looks at the [] + u"" method, he doesn't see _WHY_ it is done that way, while when he sees a StringBuilder class he can go ahead and read its help().
Performance of StringBuilder is OK compared to [] + u"" (I've increased the number of += from 30000 to 30000000)::

    def main_bucket():
        b = StringBuilder(u"initial value ")
        for i in xrange(30000000):
            b += u"more data"
        return unicode(b)

For CPython::

    kost@kost-laptop:~/tmp$ time python string_bucket_bucket.py
    real 0m12.944s
    user 0m11.670s
    sys 0m1.260s
    kost@kost-laptop:~/tmp$ time python string_bucket_append.py
    real 0m3.540s
    user 0m2.830s
    sys 0m0.690s

For PyPy 1.6::

    (pypy)kost@kost-laptop:~/tmp$ time python string_bucket_bucket.py
    real 0m18.593s
    user 0m12.930s
    sys 0m5.600s
    (pypy)kost@kost-laptop:~/tmp$ time python string_bucket_append.py
    real 0m16.214s
    user 0m11.750s
    sys 0m4.280s

Of course, a C implementation could make things faster for CPython, I guess, but really, compared to the += method the overhead doesn't matter now. It's done to be explicit.

P.S.: also, why not use cStringIO?

1. It's not semantically right to create a file-like object just to join multiple string pieces into one.
2. If you talk about using it in your code right away, you can see that no one uses it anyway, because people want += (and with StringBuilder you give them +=).
3. It's somewhat slow on PyPy right now :-)

Thanks.
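For readers on Python 3, the comparison above might be re-run with a sketch like the following. The function names, the shared `PIECE` constant and the iteration count are illustrative choices, not from the original post. Note that CPython's in-place `+=` optimization (discussed later in the thread) can make the first function far faster than the Python 2 `unicode` numbers above suggest, while other interpreters may not have it, so timings will vary.

```python
import functools
import timeit

PIECE = "more data"


def build_with_concat(n):
    # Repeated +=: each step may allocate a new string and copy the old one.
    s = "initial value"
    for _ in range(n):
        s += PIECE
    return s


def build_with_join(n):
    # Collect the pieces in a list, then copy them once with str.join().
    parts = ["initial value"]
    for _ in range(n):
        parts.append(PIECE)
    return "".join(parts)


if __name__ == "__main__":
    n = 20_000
    assert build_with_concat(n) == build_with_join(n)
    for func in (build_with_concat, build_with_join):
        seconds = timeit.timeit(functools.partial(func, n), number=5)
        print(f"{func.__name__}: {seconds / 5 * 1e3:.1f} ms per run")
```

Both functions return the same string; only the way the intermediate copies are made differs.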

k_bx wrote:
I think you should use cStringIO in your class implementation. The list + join idiom is nice, but it has the disadvantage of creating and keeping alive many small string objects (with all the memory overhead and fragmentation that goes along with it). AFAIR, the most efficient approach is using arrays:
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 25 2011)
2011-10-04: PyCon DE 2011, Leipzig, Germany 40 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

25.08.2011, 12:45, "M.-A. Lemburg" <mal@egenix.com>:
I'm perfectly OK with a different implementation of StringBuilder, but the main idea and proposal here is to get it into the standard library somehow and to enforce (and promote) its use everywhere, maybe with an FAQ entry. So that when you see some new += code, all you need is to go and fix it without worrying about complaints :-D

k_bx wrote:
I'm perfectly OK with a different implementation of StringBuilder, but the main idea and proposal here is to get it into the standard library somehow and to enforce (and promote) its use everywhere, maybe with an FAQ entry. So that when you see some new += code, all you need is to go and fix it without worrying about complaints :-D
I guess adding something like this to string.py would be worth exploring. It's a very common use case, and the list idiom doesn't read well in practice.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 25 2011)

On 08/25/2011 03:19 AM, M.-A. Lemburg wrote:
I think the right place to do this is inside Python itself. I proposed something to do that several years ago, and I've been meaning to revive it. http://bugs.python.org/issue1569040 /larry/

On Thu, Aug 25, 2011 at 11:45, M.-A. Lemburg <mal@egenix.com> wrote:
AFAIK using cStringIO just for string building is much slower than using list.append() + join(). IIRC we tested some micro-benchmarks on this for Mercurial output (where it was a significant part of the profile for some commands). That was on Python 2, of course, it may be better in io.StringIO and/or Python 3. Cheers, Dirkjan

Dirkjan Ochtman wrote:
Turns out you're right (list.append must have gotten a lot faster since I last tested this years ago, or I simply misremembered the results).
Here's the Python 2 code:

    TIMEIT_N = 10
    N = 1000000
    SIZES = (2, 10, 23, 30, 33, 22, 15, 16, 27)
    N_STRINGS = len(SIZES)
    STRINGS = ['x' * SIZES[i] for i in range(N_STRINGS)]
    REFERENCE = ''.join(STRINGS[i % N_STRINGS] for i in xrange(N))

    def cstringio():
        import cStringIO
        s = cStringIO.StringIO()
        write = s.write
        for i in xrange(N):
            write(STRINGS[i % N_STRINGS])
        result = s.getvalue()
        assert result == REFERENCE

    def array():
        import array
        s = array.array('c')
        write = s.fromstring
        for i in xrange(N):
            write(STRINGS[i % N_STRINGS])
        result = s.tostring()
        assert result == REFERENCE

    def listappend():
        l = []
        append = l.append
        for i in xrange(N):
            append(STRINGS[i % N_STRINGS])
        result = ''.join(l)
        assert result == REFERENCE

    if __name__ == '__main__':
        import sys, timeit
        for test in sys.argv[1:]:
            print 'Running test %s ...' % test
            t = timeit.timeit('%s()' % test,
                              'from __main__ import %s' % test,
                              number=TIMEIT_N)
            print ' %.2f ms' % (t / TIMEIT_N * 1e3)

Aside: For some reason cStringIO and array got slower in Python 2.7.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 29 2011)
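A Python 3 port of this benchmark might look roughly like the sketch below. It is an approximation, not the code anyone in the thread actually ran: `cStringIO` becomes `io.StringIO`, `xrange` becomes `range`, the `array` variant is dropped (the `'c'` typecode does not exist in Python 3), and each test is looked up with `globals()` instead of the `from __main__ import` setup string.

```python
import io
import sys
import timeit

TIMEIT_N = 10
N = 1_000_000
SIZES = (2, 10, 23, 30, 33, 22, 15, 16, 27)
N_STRINGS = len(SIZES)
STRINGS = ["x" * size for size in SIZES]
REFERENCE = "".join(STRINGS[i % N_STRINGS] for i in range(N))


def stringio():
    # Python 3 replacement for the cStringIO variant.
    s = io.StringIO()
    write = s.write  # bind the method once to avoid repeated attribute lookups
    for i in range(N):
        write(STRINGS[i % N_STRINGS])
    assert s.getvalue() == REFERENCE


def listappend():
    parts = []
    append = parts.append
    for i in range(N):
        append(STRINGS[i % N_STRINGS])
    assert "".join(parts) == REFERENCE


if __name__ == "__main__":
    # Run the tests named on the command line, e.g. "stringio listappend".
    for test in sys.argv[1:]:
        print(f"Running test {test} ...")
        t = timeit.timeit(globals()[test], number=TIMEIT_N)
        print(f" {t / TIMEIT_N * 1e3:.2f} ms")
```

Invoked as, for example, `python3 timetest.py stringio listappend` (the file name follows Masklinn's run in the next message).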

On 2011-08-29, at 11:27 , M.-A. Lemburg wrote:
Converting your code straight to bytes (so array still works) yields this on Python 3.2.1:

    > python3.2 timetest.py io array listappend
    Running test io ...
     334.03 ms
    Running test array ...
     776.66 ms
    Running test listappend ...
     314.90 ms

For strings (excluding array):

    > python3.2 timetest.py io listappend
    Running test io ...
     451.45 ms
    Running test listappend ...
     356.39 ms

Masklinn wrote:
Unicode works with the array module as well. Just use 'u' as the array typecode and replace fromstring/tostring with fromunicode/tounicode. In any case, the array module approach appears to be the slowest of all three tests.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 29 2011)

On Mon, 29 Aug 2011 11:27:23 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
The join() idiom only does one big copy at the end, while the StringIO/BytesIO idiom copies at every resize (unless the memory allocator is very smart). Both are O(N), but the join() version does fewer copies and (re)allocations. (There are also the list resizings, but that object is much smaller.) Regards Antoine.
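Antoine's comparison can be made concrete with a rough count of characters copied. Assume, purely for illustration, n pieces of k characters each (so the result has N = nk characters), that each naive `+=` builds a fresh string, and that a growable buffer starts at capacity c_0 and multiplies its capacity by a factor g > 1 whenever it fills, resizing m times in total (so c_0 g^{m-1} < N):

$$
\begin{aligned}
\text{repeated } s \mathrel{+}= t:&\quad \sum_{i=1}^{n} i\,k = \frac{k\,n(n+1)}{2} = O(n^2)\\
\text{list} + \text{join}:&\quad N = nk \quad \text{(one copy into the final string)}\\
\text{growable buffer}:&\quad N + \sum_{j=0}^{m-1} c_0\,g^{j} = N + c_0\,\frac{g^{m}-1}{g-1} < N + \frac{g}{g-1}\,N = O(N)
\end{aligned}
$$

With g = 2 the resize term stays below 2N, so the buffer copies fewer than 3N characters in total against N for join: both linear, with join doing fewer copies. An allocator that can extend a block in place shrinks the resize term, which is the "very smart" case mentioned above.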

29.08.11, 15:43, "Antoine Pitrou" <solipsis@pitrou.net>:
OK, so I think the best approach would be to implement it via join + [], but flush every 1000 ops, since that can save memory. As for the whole idea, I still think that creating something like this and adding it to the stdlib (with an __iadd__ and .append() API, so that refactoring only needs a one-line change, like doing StringBuilder(u"Foo")), and documenting it, would be super-cool. So who has the last word on this?

On Monday, 29 August 2011 at 19:04 +0300, k.bx@ya.ru wrote:
OK, so I think the best approach would be to implement it via join + [], but flush every 1000 ops, since that can save memory.
That approach (or a similar one) could actually be integrated into StringIO and BytesIO. As long as you only write() at the end of the in-memory object, there's no need to actually concatenate. And it would be much easier (and have less impact on C extension code) to implement that approach in the StringIO and BytesIO objects than in the bytes and str types, as Larry did. Regards Antoine.
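A minimal sketch of the API k_bx asks for (`+=` plus `.append()`, so converting existing code mostly means changing the initializer), backed by `io.StringIO` as Marc-Andre suggested. The class and its methods are hypothetical; nothing like this exists in the standard library, and whether StringIO avoids intermediate copies is an interpreter implementation detail, as discussed above.

```python
import io


class StringBuilder:
    """Hypothetical builder with the += / append() API proposed in the thread.

    Pieces are written to one io.StringIO buffer and only turned into a
    single str when str() is called.
    """

    def __init__(self, initial=""):
        # io.StringIO(initial) would leave the stream positioned at 0, so the
        # next write() would overwrite the initial value; writing it instead
        # leaves the position at the end.
        self._buf = io.StringIO()
        self._buf.write(initial)

    def append(self, piece):
        self._buf.write(piece)
        return self

    def __iadd__(self, piece):
        # b += "x" appends and rebinds b to the same builder object.
        return self.append(piece)

    def __str__(self):
        return self._buf.getvalue()


b = StringBuilder("Foo")
b += "bar"
b.append("baz")
print(str(b))
```

Returning `self` from `append()` is a design choice that lets `__iadd__` reuse it; a real proposal might prefer `append()` to return `None` like `list.append`.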

Interesting semantics… What version of Python were you using? The current documentation has this to say:

• CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.

Changed in version 2.4: Formerly, string concatenation never occurred in-place.

<http://docs.python.org/library/stdtypes.html>

It's my understanding that the naïve approach should now have performance comparable to the "proper" list append technique as long as you use CPython 2.4 or later. -- Carl Johnson

Carl Matthew Johnson wrote:
Relying on that is a bad idea. It is not portable from CPython to any other Python (none of IronPython, Jython or PyPy can include that optimization), it also depends on details of the memory manager used by your operating system (what is fast on one computer can be slow on another), and it doesn't even work under all circumstances (it relies on the string having exactly one reference as well as the exact form of the concatenation). Here's a real-world example of how the idiom of repeated string concatenation goes bad: http://www.mail-archive.com/pypy-dev@python.org/msg00682.html Here's another example, from a few years back, where part of the standard library using string concatenation was *extremely* slow under Windows. Linux users saw no slowdown and it was very hard to diagnose the problem: http://www.mail-archive.com/python-dev@python.org/msg40692.html -- Steven
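To see how fragile the optimization is, here is a sketch of two loops that differ only in whether a second reference to the accumulated string exists at the moment of `+=`. The function names are made up for illustration. On CPython the second version is typically much slower, because the extra reference prevents resizing the string in place; on other implementations, or other CPython versions, both may behave alike.

```python
import functools
import timeit


def concat_single_ref(n):
    s = ""
    for _ in range(n):
        # Only the name `s` refers to the string here, so CPython may be
        # able to resize it in place instead of copying it.
        s += "x"
    return s


def concat_extra_ref(n):
    s = ""
    for _ in range(n):
        alias = s  # a second reference to the same string object
        s += "x"  # CPython then typically allocates a new string and copies
    return s


if __name__ == "__main__":
    n = 30_000
    for func in (concat_single_ref, concat_extra_ref):
        seconds = timeit.timeit(functools.partial(func, n), number=3)
        print(f"{func.__name__}: {seconds / 3 * 1e3:.1f} ms per run")
```

Nothing in the source code hints that the second loop is quadratic, which is what makes this class of slowdown hard to diagnose.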

On 25.08.2011 12:38, k_bx wrote:
Oh, and also, I really like how Python had its MutableString class forever, but it was deprecated in Python 3.
You do realize that MutableString's __iadd__ just performs += on str operands? Georg

On 8/25/2011 6:38 AM, k_bx wrote:
Oh, and also, I really like how Python had its MutableString class forever, but it was deprecated in Python 3.
(Removed, I presume you mean...) and added bytearray. I have no idea whether += on such an object is any better than O(n*n). -- Terry Jan Reedy
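Part of Terry's question can be checked directly: `bytearray` is mutable, so `+=` extends the existing object rather than building a new one. Whether the growth is amortized linear is an implementation detail; CPython over-allocates `bytearray` buffers on resize, so repeated `+=` there should not be quadratic, but that is not a language guarantee. A minimal sketch (the function name is hypothetical):

```python
def build_bytes(pieces):
    """Concatenate byte strings by extending a single mutable bytearray."""
    buf = bytearray()
    for piece in pieces:
        buf += piece  # bytearray.__iadd__ extends buf in place; no new object
    return bytes(buf)  # one final copy into an immutable bytes object


buf = bytearray(b"ab")
same = buf
buf += b"cd"
assert buf is same  # += mutated the existing bytearray
print(build_bytes([b"more ", b"data"]))
```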

On Thu, 25 Aug 2011 12:28:14 +0300 k_bx <k.bx@ya.ru> wrote:
And Python has io.StringIO. I don't think we need to reinvent the wheel under another name. http://docs.python.org/library/io.html#io.StringIO By the way, when prototyping snippets for the purpose of demonstrating new features, you should really use Python 3, because Python 2 is in bugfix-only mode. (same applies to benchmark results, actually) Regards Antoine.

If the join idiom really bothers you...

    import io

    def build_str(iterable):
        # Essentially ''.join, just with str() coercion
        # and less memory fragmentation
        target = io.StringIO()
        for item in iterable:
            target.write(str(item))
        return target.getvalue()

    # Caution: decorator abuse ahead
    # I'd prefer this to a StringBuilder class, though :)
    def gen_str(g):
        return build_str(g())
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
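The "decorator abuse" is easy to miss: applying `gen_str` to a generator function runs the generator once, at definition time, and binds the function's name to the resulting string. The sketch below repeats Nick's two helpers so it runs on its own; the `report` generator is a made-up example.

```python
import io


def build_str(iterable):
    # Write every item, coerced with str(), into one StringIO buffer.
    target = io.StringIO()
    for item in iterable:
        target.write(str(item))
    return target.getvalue()


def gen_str(g):
    # Used as a decorator, this replaces the generator function with its output.
    return build_str(g())


@gen_str
def report():
    yield "Total: "
    yield 42  # non-str items are coerced by build_str
    yield "\n"


# After decoration, `report` is no longer a function but the built string.
print(repr(report))
```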

On Thu, Aug 25, 2011 at 5:28 AM, k_bx <k.bx@ya.ru> wrote:
This doesn't seem nicer to read and write to me than the list form. I also do not see any reason to believe it will stop people from doing it the quadratic way if the ubiquitous make-a-list-then-join idiom does not. Mike

Mike Graham wrote:
Agreed. Just because the Java idiom is StringBuilder doesn't mean Python should ape it. Python already has a "build strings efficiently" idiom: ''.join(iterable_of_strings) If people can't, or won't, learn this idiom, why would they learn to use StringBuilder instead? -- Steven

Steven D'Aprano, 25.08.2011 16:00:
Plus, StringBuilder is only a special case. Joining a string around other delimiters is straightforward once you've learned about ''.join(). Doing the same with StringBuilder is non-trivial (as the Java example nicely shows). Stefan
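Stefan's point about delimiters, sketched in Python: `sep.join()` places the separator only between items, while an append-only builder has to special-case the first item to avoid a stray leading or trailing separator. Both function names are hypothetical.

```python
def join_with_separator(items, sep=", "):
    # str.join puts sep only *between* items: nothing to trim afterwards.
    return sep.join(str(item) for item in items)


def build_with_separator(items, sep=", "):
    # Builder style: the separator logic has to be written by hand.
    parts = []
    for i, item in enumerate(items):
        if i:  # skip the separator before the first item
            parts.append(sep)
        parts.append(str(item))
    return "".join(parts)


print(join_with_separator([1, 2, 3]))
print(build_with_separator([1, 2, 3]))
```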

On 8/25/2011 5:28 AM, k_bx wrote:
I do not see the need to keep the initial piece separate and do the double join. For Py3:

    class StringBuilder(object):
        """Use it instead of doing += for building unicode strings from pieces"""
        def __init__(self, val=""):
            self.pieces = [val]
        def __iadd__(self, item):
            self.pieces.append(item)
            return self
        def __str__(self):
            val = "".join(self.pieces)
            self.pieces = [val]
            return val

    s = StringBuilder('a')
    s += 'b'
    s += 'c'
    print(s)
    s += 'd'
    print(s)

Output:

    abc
    abcd
I am personally happy enough with [].append, but I can see the attraction of += if doing many separate lines rather than .append within a loop. -- Terry Jan Reedy

On Thu, 25 Aug 2011 12:28:14 +0300 k_bx <k.bx@ya.ru> wrote:
For the record, the "".join() idiom also has its downsides. If you build a list of many tiny strings, memory consumption can grow beyond what is reasonable (in one case, building a 600MB JSON string outgrew the RAM of an 8GB machine). One solution is to regularly accumulate the primary list into a secondary accumulation list, as done in http://hg.python.org/cpython/rev/47176e8d7060 Regards Antoine.
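A pure-Python sketch of the two-level idea, assuming a chunk size of 1000 (the figure k_bx suggested earlier). The class and its methods are hypothetical, and the linked changeset may differ in its details (thresholds, C implementation). Small pieces go into a short primary list; whenever it fills up, its contents are joined into one medium-sized string kept in a secondary list, so the tiny string objects can be freed early instead of all staying alive until the final join.

```python
class Accumulator:
    """Hypothetical two-level string accumulator."""

    def __init__(self, chunk_size=1000):
        self.chunk_size = chunk_size
        self._small = []  # primary list: recent tiny pieces
        self._large = []  # secondary list: joined chunks

    def append(self, piece):
        self._small.append(piece)
        if len(self._small) >= self.chunk_size:
            self._flush()

    def _flush(self):
        # Join the pending pieces into one string; the tiny pieces can now be
        # freed unless something else still references them.
        if self._small:
            self._large.append("".join(self._small))
            self._small = []

    def finish(self):
        self._flush()
        return "".join(self._large)


acc = Accumulator(chunk_size=3)
for i in range(10):
    acc.append(str(i))
print(acc.finish())
```

Keeping the chunks in a list, rather than joining each chunk onto one growing string, avoids re-copying the accumulated result on every flush.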

participants (15)

- Antoine Pitrou
- Arnaud Delobelle
- Carl Matthew Johnson
- Dirkjan Ochtman
- Georg Brandl
- k.bx@ya.ru
- k_bx
- Larry Hastings
- M.-A. Lemburg
- Masklinn
- Mike Graham
- Nick Coghlan
- Stefan Behnel
- Steven D'Aprano
- Terry Reedy