[Python-ideas] Create a StringBuilder class and use it everywhere

k.bx at ya.ru k.bx at ya.ru
Mon Aug 29 18:04:21 CEST 2011


29.08.11, 15:43, "Antoine Pitrou" <solipsis at pitrou.net>:
> 
> On Mon, 29 Aug 2011 11:27:23 +0200
> "M.-A. Lemburg" <mal at egenix.com> wrote:
> > Dirkjan Ochtman wrote:
> > > On Thu, Aug 25, 2011 at 11:45, M.-A. Lemburg <mal at egenix.com> wrote:
> > >> I think you should use cStringIO in your class implementation.
> > >> The list + join idiom is nice, but it has the disadvantage of
> > >> creating and keeping alive many small string objects (with all
> > >> the memory overhead and fragmentation that goes along with it).
> > > 
> > > AFAIK using cStringIO just for string building is much slower than
> > > using list.append() + join(). IIRC we tested some micro-benchmarks on
> > > this for Mercurial output (where it was a significant part of the
> > > profile for some commands). That was on Python 2, of course, it may be
> > > better in io.StringIO and/or Python 3.
> > 
> > Turns out you're right (list.append must have gotten a lot faster
> > since I last tested this years ago, or I simply misremembered
> > the results).
> 
> The join() idiom only does one big copy at the end, while the
> StringIO/BytesIO idiom copies at every resize (unless the memory
> allocator is very smart). Both are O(N) but the join() version
> does fewer copies and (re)allocations.
> 
> (there are also the list resizings but that object is much smaller)
> 
> Regards
> 
> Antoine.
> 
> 


OK, so I think the best approach would be to implement it with a list plus join(), but flush (join the accumulated pieces into a single string) every 1000 operations, since that can save memory.

As for the whole idea: I still think that creating something like this, adding it to the stdlib, and documenting it would be super-cool. With an __iadd__ and .append() API, refactoring existing code would only mean changing one line, the one that initializes the string, e.g. to StringBuilder(u"Foo").
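
A rough sketch of that idea, just to make the proposal concrete. Everything here is an assumption for illustration: no StringBuilder class exists in the stdlib, and the flush_every parameter and its default of 1000 are placeholders taken from the "flush every 1000 ops" suggestion above.

```python
class StringBuilder:
    """Hypothetical string builder based on the list + join() idiom."""

    def __init__(self, initial=u"", flush_every=1000):
        # Pieces appended so far; joined lazily.
        self._parts = [initial] if initial else []
        # Assumed threshold: collapse the list once it holds this many pieces.
        self._flush_every = flush_every

    def _flush(self):
        # Replace the many small strings with one joined string so the
        # small objects can be freed.
        self._parts = [u"".join(self._parts)]

    def append(self, s):
        self._parts.append(s)
        if len(self._parts) >= self._flush_every:
            self._flush()
        return self

    def __iadd__(self, s):
        # Lets existing "buf += piece" code keep working unchanged.
        return self.append(s)

    def __str__(self):
        self._flush()
        return self._parts[0]


# Only the initialization line differs from plain string concatenation.
sb = StringBuilder(u"Foo")
sb += u"Bar"
sb.append(u"Baz")
result = str(sb)
```

The periodic flush trades a few extra copies for fewer live small objects; whether 1000 is a good threshold would need measuring.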

So who has the last word on this?


